isglobal-brge / MEAL

Methylation and Expression AnaLysis
MIT License
1 stars 2 forks source link

correlationMethExprs error #1

Open JFsanchezherrero opened 3 years ago

JFsanchezherrero commented 3 years ago

Hi there,

I have tried MEAL for the analysis and integration of EPIC data and RNAseq but I have encountered an issue I am not able to sort it out.

I analyzed the methylation data using RnBeads and I generated beta tables with beta values for all samples and probes non filtered out. Then, using the function minfi::makeGenomicRatioSetFromMatrix and appropiate annotation (array = "IlluminaHumanMethylationEPIC", annotation = "ilm10b4.hg19") I create a GenomicRatio set.

> GRset_EPICs
class: GenomicRatioSet 
dim: 705657 25 
metadata(0):
assays(1): Beta
rownames(705657): cg14817997 cg26928153 ... cg07660283 cg09226288
rowData names(0):
colnames(25): AT102_VAT AT180_VAT ... CAT84_CF CAT86_CF
colData names(1): X1
Annotation
  array: IlluminaHumanMethylationEPIC
  annotation: ilm10b4.hg19
Preprocessing
  Method: Matrix converted with makeGenomicRatioSetFromMatrix
  minfi version: NA
  Manifest version: NA

On the other hand, I analyzed the RNAseq using DESeq2 and from the dds object generated I retrieved normalized counts and using BioMart I retrieved genome coordinates.

> expr_set
ExpressionSet (storageMode: lockedEnvironment)
assayData: 16255 features, 28 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: AT102_VAT AT180_VAT ... CAT86_CF (28 total)
  varLabels: Individual Gender ... agegroup (14 total)
  varMetadata: labelDescription
featureData
  featureNames: ENSG00000000003 ENSG00000000005 ... ENSG00000278845
    (16255 total)
  fvarLabels: chromosome start end strand
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:

I create a multidataset object:

multi <- createMultiDataSet()
multi <- add_genexp(multi, expr_set)
multi <- add_methy(multi, GRset_EPICs)

and check it out:

> commonSamples(multi)
Object of class 'MultiDataSet'
 . assayData: 2 elements
    . expression: 16255 features, 25 samples 
    . methylation: 705657 features, 25 samples 
 . featureData:
    . expression: 16255 rows, 4 cols (chromosome, ..., end)
    . methylation: 705657 rows, 5 cols (seqnames, ..., width)
 . rowRanges:
    . expression: YES
    . methylation: YES
 . phenoData:
    . expression: 25 samples, 15 cols (Individual, ..., agegroup)
    . methylation: 25 samples, 2 cols (X1, id)

When I tried to do the expression methylation correlation I failed and the error produced said:

> methExprs <- correlationMethExprs(multi)
Warning messages:
1: In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
2: In correlationMethExprs(multi.filt) :
  There are no expression probes in the range of the cpgs. An empty data.frame will be returned.

I wonder where the error is or what should I do to fix it. Might it be due to the annotation of the EPIC data? I wonder if you have any experience in this type of data. All examples I have seen from MEAL use 450K microarrays but I hope it also work with this new version of microarray data.

Thanks in advance, Jose

Regards from IGTP

JFsanchezherrero commented 3 years ago

Hi there,

Talking via email with one of the authors he mentioned the possibility that annotation of probes and genes (chromosomes: chr1 vs 1) might be causing the problems between sets. He advised me to use seqlevelsStyle prior to multidataset creation and it really fixed it!

seqlevelsStyle(GRset_EPICs)
[1] "UCSC"
seqlevelsStyle(GRset_EPICs) <- 'NCBI'

So now If i create the multitadaset again and execute correlationMethExprs it works but another issue arises:

multi <- createMultiDataSet()
multi <- add_genexp(multi, expr_set)
multi <- add_methy(multi, GRset_EPICs)

methExprs <- correlationMethExprs(multi)
Computing residuals
Computing correlation Methylation-Expression
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

I guess it is due to missing data in some of the expression sets and lm.fit does not tolerate it.

On the other hand, it does work If I try for a candidate region we know there is data:

> targetRange <- GRanges('chr4:151501623-151510000')
> multi.filt <- multi[, , targetRange]
> methExprs <- correlationMethExprs(multi.filt)

> targetRange <- GRanges('4:151501623-151510000')
Computing residuals
Computing correlation Methylation-Expression
> methExprs
          cpg           exprs       Beta       se   P.Value adj.P.Val
1  cg14904733 ENSG00000164142  -8.396940 24.10656 0.7309066 0.8264738
2  cg20481837 ENSG00000164142  -5.332191 15.02719 0.7260913 0.8264738
3  cg17433772 ENSG00000164142  -8.509145 19.62912 0.6686932 0.8264738
...
19 cg11363513 ENSG00000164142  -7.939236 16.19467 0.6286099 0.8264738

Ideally, I would like to do the correlation for all the dataset. I am not sure if there is any advice on any previous filtering step on the data to avoid this issue.

I have read in this thread in Stackoverflow (https://stackoverflow.com/questions/43677853/error-in-lm-fitx-y-offset-offset-singular-ok-0-non-na-cases-with-boxcox/43678700) that developers can set lm.fit to avoid or exlude NAs.

Thanks in advance