timpeters82 / DMRcate-devel

devel git for DMRcate

Dealing with a data set including both 'EPICv2' and 'EPICv1' #9

Open karimi81 opened 1 month ago

karimi81 commented 1 month ago

Hi, I have a data set whose samples were run on both 'EPICv2' and 'EPICv1' arrays (10 of the 16 samples are EPICv2). When I run the package I get the following error, which suggests that all samples must come from a single array type:

Error in cpg.annotate("array", as.matrix(m.sig), what = "M", arraytype = arraytype, : Please specify either 'EPICv2' or 'EPICv1' for arraytype. EPICv2 probe IDs have 15 characters, e.g. cg00000029_TC21. EPICv1 probe IDs have 10 characters, e.g. cg00000029.

Do you know of a solution for this error? I would appreciate it if you could share it.

timpeters82 commented 1 month ago

Hello,

DMRcate can't take a dataset with both EPICv1 and EPICv2 arrays; it has to be all one or all the other. What you can conceivably do is "pretend" your EPICv2 samples are actually EPICv1 samples: truncate your EPICv2 dataset with rmPosReps() so that replicate probes are collapsed, strip the last 5 characters (the suffix, e.g. _TC21) from the rownames of the EPICv2 matrix, and then column-bind it to the EPICv1 matrix using the probes common to both. Then run cpg.annotate() with arraytype="EPICv1". However, I'd advise running a PCA on the combined data matrix first to check for batch effects between the two array types.
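The steps above can be sketched roughly as follows. This is a minimal illustration in base R, not a tested DMRcate workflow: the matrices `m.v1` and `m.v2`, their probe IDs, and the sample names are hypothetical toy data, and the sketch assumes replicate EPICv2 probes have already been collapsed (e.g. with rmPosReps()) so that the stripped IDs are unique.

```r
# Hypothetical toy M-value matrices: 6 EPICv1 samples and 10 EPICv2 samples.
set.seed(1)
v1.ids <- c("cg00000029", "cg00000108", "cg00000109", "cg00000165", "cg00000236")
v2.ids <- c("cg00000029_TC21", "cg00000109_TC21", "cg00000165_TC21",
            "cg00000236_TC21", "cg00000289_TC21")
m.v1 <- matrix(rnorm(5 * 6), nrow = 5,
               dimnames = list(v1.ids, paste0("v1_", 1:6)))
m.v2 <- matrix(rnorm(5 * 10), nrow = 5,
               dimnames = list(v2.ids, paste0("v2_", 1:10)))

# Sanity check on ID lengths: 10 characters for EPICv1, 15 for EPICv2.
stopifnot(all(nchar(rownames(m.v1)) == 10), all(nchar(rownames(m.v2)) == 15))

# Strip the last 5 characters (the "_TC21"-style suffix) from the EPICv2 IDs.
stripped <- substr(rownames(m.v2), 1, nchar(rownames(m.v2)) - 5)
# Assumption: replicates were collapsed beforehand, so no duplicates remain.
stopifnot(anyDuplicated(stripped) == 0)
rownames(m.v2) <- stripped

# Keep only probes present on both arrays, then bind samples column-wise.
common <- intersect(rownames(m.v1), rownames(m.v2))
m.combined <- cbind(m.v1[common, , drop = FALSE],
                    m.v2[common, , drop = FALSE])

# PCA on samples (rows of the transposed matrix) to look for a batch effect.
pca <- prcomp(t(m.combined))
pcs <- data.frame(pca$x[, 1:2],
                  array = rep(c("EPICv1", "EPICv2"), c(ncol(m.v1), ncol(m.v2))))
```

If the samples in `pcs` separate by `array` along the first components, that would point to a batch effect worth handling before passing `m.combined` to cpg.annotate() with arraytype="EPICv1".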

Also, have a look at sesame::mLiftOver() (https://github.com/zwdzwd/sesame/blob/devel/R/mLiftOver.R); this might help too.

Cheers, Tim