hansenlab / minfi

Devel repository for minfi

readGEORawFile() should return a RGChannelSet object #172

Open jordimartorell opened 6 years ago

jordimartorell commented 6 years ago

Hello and thank you for this great package.

I am working with raw 450k data from GEO. The readGEORawFile() function is very useful for reading the signal intensity files that GEO provides as raw data. However, I wonder why readGEORawFile() returns a GenomicMethylSet object instead of an RGChannelSet object.

If I understood the documentation correctly, GenomicMethylSet is a class designed to store processed data, but here we are reading raw GEO data. This is a problem because I can't use functions like preprocessNoob() or detectionP() to process these raw data. In my opinion, it would be much more convenient if readGEORawFile() returned an RGChannelSet object.

Thanks in advance! Jordi

kasperdanielhansen commented 6 years ago

But the raw signal intensities provided by GEO are not the same as the (even lower level) data contained in the IDAT files. If the IDAT files are available you should download those and parse them. It might be possible to automate this process.

Best, Kasper

jordimartorell commented 6 years ago

Thanks for your response @kasperdanielhansen. Unfortunately, IDAT files are usually not provided, so we have to start our analysis from the signal intensity files, which I think are the mandatory raw files when uploading 450k data to GEO.

kasperdanielhansen commented 6 years ago

Yeah, so the issue is that these signal intensities are not all the data. They don't include

Best, Kasper

jordimartorell commented 6 years ago

Then I suppose nothing can be done. I don't understand why GEO doesn't require IDAT files to be uploaded, since they are the data needed to reproduce any analysis. Anyway, thank you very much for your explanations, Kasper.

Best. Jordi

oleksii-nikolaienko commented 4 years ago

Hi, I'd also like to say thanks for a great package and ask a similar question. I'm starting my analysis from a text file containing raw values, because IDAT files are not available (GSE40279). After reading it with readGEORawFile(), running preprocessQuantile() fails with:

geo.raw <- readGEORawFile(filename="GSE40279_signal_intensities.txt", sep="\t", Uname="SignalA", Mname="SignalB", row.names=2)
genomic.ratio.set   <- preprocessQuantile(geo.raw, mergeManifest=TRUE, fixOutliers=FALSE)
[preprocessQuantile] Mapping to genome.
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In preprocessQuantile(idat.data, mergeManifest = TRUE, fixOutliers = FALSE) :
  preprocessQuantile has only been tested with 'preprocessRaw'

Is it because preprocessQuantile requires IDAT files or because something is wrong with that text file? Of note, there are no NA values in either Meth or Unmeth, but maybe preprocessQuantile somewhere computes beta values without offset, which results in NaNs?

Best, Oleksii
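
One way to probe the NaN hypothesis raised above is to look for probes where both signals are zero. The following is only a sketch in base R on made-up toy matrices. With the real object, the two matrices would presumably come from getMeth(geo.raw) and getUnmeth(geo.raw). The offset of 100 is chosen purely for illustration, and whether zero intensities are what actually triggers the error in this case is an assumption, not something established in this thread.

```r
# Toy methylated / unmethylated intensities (made-up values, 3 probes x 2 samples).
# With real data these would be the Meth and Unmeth matrices of the object.
meth <- matrix(c(1200, 0, 300,
                  900, 0, 450),
               ncol = 2,
               dimnames = list(c("cg_a", "cg_b", "cg_c"), c("s1", "s2")))
unmeth <- matrix(c(800, 0, 0,
                   600, 0, 50),
                 ncol = 2, dimnames = dimnames(meth))

# Neither matrix contains NA values ...
has_na <- anyNA(meth) || anyNA(unmeth)

# ... yet beta without an offset is 0/0 = NaN wherever both signals are zero,
# while a positive offset (100 here, for illustration) keeps it finite.
beta_no_offset <- meth / (meth + unmeth)
beta_offset    <- meth / (meth + unmeth + 100)

# log2 of the total intensity is -Inf at the same probes.
log2_total <- log2(meth + unmeth)

# Probes with zero total intensity in at least one sample.
zero_total <- rowSums((meth + unmeth) == 0) > 0
print(names(which(zero_total)))
```

If such probes turn up in the real data, any step that works on unadjusted betas or on log2 total intensity would receive NaN or -Inf values for them, which would be consistent with an "NA/NaN/Inf in foreign function call" error even though the input contains no NA.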