perishky / meffil

Efficient algorithms for analyzing DNA methylation data.
Artistic License 2.0

Extracting genotypes from Methylation data #28

CornelisB closed 4 years ago

CornelisB commented 4 years ago

Thanks for having such a clear package and workflow. I was wondering whether it is also possible to extract genotypes from the methylation data alone, without having to input genotypes derived from another source.

perishky commented 4 years ago

I'm not sure what you mean by "input genotypes derived from another source". The microarrays have probes that measure signal for just over 60 SNPs. These signals can be retrieved from the QC objects using the meffil.snp.betas() function:

snp.betas <- meffil.snp.betas(qc.objects)

You could then use our internal function to estimate genotypes from those signals:

genotypes <- meffil:::calculate.beta.genotypes(snp.betas)

Is this what you are looking for?
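To illustrate the idea of turning SNP-probe signals into genotype calls, here is a minimal base-R sketch that does not depend on meffil. It assumes SNP-probe betas cluster near 0, 0.5 and 1 and simply applies fixed cut-offs. The function name call.genotypes.by.threshold, the cut-offs 0.25 and 0.75, and the toy probe and sample names are all assumptions made for this example; this is not necessarily how meffil:::calculate.beta.genotypes() works internally.

```r
# Toy SNP-probe beta matrix: rows are SNP probes, columns are samples.
# Probe and sample names are made up for illustration.
snp.betas <- matrix(
    c(0.03, 0.51, 0.97,
      0.48, 0.95, 0.05),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("rsA", "rsB"), c("S1", "S2", "S3")))

# Hypothetical helper: code each beta as 0, 1 or 2 using fixed cut-offs.
# The cut-offs are assumptions, not values taken from meffil.
call.genotypes.by.threshold <- function(betas, lower = 0.25, upper = 0.75) {
    # Start with every call set to 1 (intermediate signal).
    genotypes <- matrix(1L, nrow(betas), ncol(betas),
                        dimnames = dimnames(betas))
    genotypes[which(betas < lower)] <- 0L      # low signal
    genotypes[which(betas > upper)] <- 2L      # high signal
    genotypes[which(is.na(betas))] <- NA_integer_  # keep missing values missing
    genotypes
}

genotypes <- call.genotypes.by.threshold(snp.betas)
print(genotypes)
```

With real data, snp.betas would instead come from meffil.snp.betas(qc.objects) as described above. Fixed cut-offs are the simplest possible choice; a clustering approach may behave better when probe signals are shifted away from 0, 0.5 and 1.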

CornelisB commented 4 years ago

Yes!!! This is awesome, thanks!

samplesheet <- meffil.read.samplesheet("/path/to/folder/
[read.450k.sheet] Found the following CSV files:
[1] "/path/to/folder/sheet_meffil.csv"
Warning message:
In FUN(X[[i]], ...) :
  Could not infer array name for file: /path/to/folder/METH/sheet_meffil.csv

Any idea what I'm doing wrong here? I have EPIC data, so perhaps I need to specify this?

Header of sheet_meffil.csv looks like this:

Sample_Name,Sex,Slide,sentrix_col,Basename
SAMPLE1,M,203952880001,01,/path/to/folder/203952880001//203952880001_R01C01
SAMPLE2,M,203952880001,01,/path/to/folder/203952880001//203952880001_R02C01

perishky commented 4 years ago

Note that this is a warning, not an error. The function uses the 'array name' to construct sample names. However, because you have already defined sample names (see the Sample_Name column), they don't need to be generated, so nothing has gone wrong. You can proceed without making any changes.
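The difference between a warning and an error in R can be seen in a small base-R sketch. The function read.with.warning below is a hypothetical stand-in, not part of meffil: it raises a warning and still returns its value, which is the behaviour described above.

```r
# Hypothetical stand-in for a function that warns but still completes.
read.with.warning <- function() {
    warning("Could not infer array name")
    "samplesheet"  # the return value is still produced after the warning
}

captured <- character(0)

# withCallingHandlers lets us record the warning and then resume execution.
result <- withCallingHandlers(
    read.with.warning(),
    warning = function(w) {
        captured <<- c(captured, conditionMessage(w))
        invokeRestart("muffleWarning")
    })

print(result)
print(captured)
```

An error raised with stop() would instead abort the call, and no value would be returned.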

In case you are curious, the function was looking for a column named ArrayID or Sentrix_Position to provide the position of the sample on the slide, e.g. "R04C02" indicating that the sample is on row 4 column 2. The function generates a unique sample name for each sample by combining the slide identifier and this position, e.g. "203952880001_R04C02".
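As a sketch of the naming scheme just described, the position can be recovered from the Basename column of the samplesheet shown earlier and combined with the slide identifier in base R. This assumes every Basename ends in an RxxCyy suffix; the variable generated.names is made up for this example, and the code only mimics the described behaviour rather than reproducing meffil's own implementation.

```r
# Rebuild the two example rows from the samplesheet in this thread.
samplesheet <- data.frame(
    Sample_Name = c("SAMPLE1", "SAMPLE2"),
    Slide = c("203952880001", "203952880001"),
    Basename = c("/path/to/folder/203952880001//203952880001_R01C01",
                 "/path/to/folder/203952880001//203952880001_R02C01"),
    stringsAsFactors = FALSE)

# Assumption: the file name ends in "_RxxCyy"; extract that suffix.
samplesheet$Sentrix_Position <- sub("^.*_(R[0-9]+C[0-9]+)$", "\\1",
                                    basename(samplesheet$Basename))

# Combine slide and position into a unique per-sample name.
generated.names <- paste(samplesheet$Slide,
                         samplesheet$Sentrix_Position, sep = "_")
print(generated.names)
```

Adding a Sentrix_Position column like this to the CSV file should give the function the position it was looking for, although, as noted above, that is not required when Sample_Name is already defined.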