zwdzwd / sesame

🍪 SEnsible Step-wise Analysis of DNA MEthylation BeadChips

Mismatch between the "probe_strand" annotation in EPICv2 and the previous EPICv1 manifest #161

Open rauldiul opened 6 months ago

rauldiul commented 6 months ago

Hi,

Thanks for your invaluable software and resources. I'm working with some EPICv2 data and used your gene annotations, located here: https://github.com/zhou-lab/InfiniumAnnotationV1/raw/main/Anno/EPICv2/EPICv2.hg38.manifest.gencode.v41.tsv.gz

This table has a probe_strand column. I wanted to compare these strand annotations with those of the probes that were already covered by EPICv1. The labels appear to be opposite: most probes labelled + in EPICv1 are labelled - in EPICv2, and vice versa. See this example code:

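library(minfi)       # getAnnotation()
library(data.table)  # fread()
library(stringr)     # str_split_fixed()
library(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)

# EPICv1 annotation; row names are the probe names
annotationEPICv1 <- as.data.frame(getAnnotation(IlluminaHumanMethylationEPICanno.ilm10b4.hg19))
# EPICv2 annotation; dir_methfilters is a local directory defined elsewhere
annotationEPICv2 <- fread(file.path(dir_methfilters,"epicv2/sesame/EPICv2.hg38.manifest.gencode.v41.tsv.gz"))
# Strip the EPICv2 suffix (everything after the first "_") to get the EPICv1-style probe name
annotationEPICv2$ID2 <- str_split_fixed(annotationEPICv2$probeID,"_",2)[,1]

# Probes present in both arrays
intersecting_probes <- intersect(annotationEPICv1$Name,annotationEPICv2$ID2)

# Do the strand labels differ? (match() takes the first EPICv2 entry for each probe name)
table(annotationEPICv1[intersecting_probes,]$strand != annotationEPICv2$probe_strand[match(intersecting_probes,annotationEPICv2$ID2)])

 FALSE   TRUE 
  2236 719509 
# Cross-tabulation: rows are EPICv1 strand, columns are EPICv2 probe_strand
table(annotationEPICv1[intersecting_probes,]$strand, annotationEPICv2$probe_strand[match(intersecting_probes,annotationEPICv2$ID2)])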

         -      +
  -    411 358863
  + 360646   1825

Am I missing something? Do you know what could be the issue here?

Also, what is the best way to determine the strand of the EPICv2 probes? I took it from EPICv2.hg38.manifest.gencode.v41.tsv.gz because I did not find a strand annotation in the manifest table EPICv2.hg38.manifest.tsv.gz.

Thanks a lot for the help,

Raúl