zwdzwd / sesame

🍪 SEnsible Step-wise Analysis of DNA MEthylation BeadChips

bisConversionControl - platform naming mismatch "EPICv2" vs "EPICplus" #141

Open · bethan-mallabar-rimmer opened this issue 10 months ago

bethan-mallabar-rimmer commented 10 months ago

I'm flagging a problem, for which I found a workaround, that might be caused by inconsistent platform naming in the package.

First, I imported some EPIC version 2 data using sesame 1.20.0, which automatically detected the platform as "EPICv2":

library(sesame)
sapply(c("sesame","sesameData","ExperimentHub"), function(x) as.character(packageVersion(x)))
#versions: 
#sesame    sesameData ExperimentHub 
#"1.20.0"      "1.20.0"      "2.10.0" 

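## import data (idat_dir is the directory containing the IDAT files);
## prep="" skips preprocessing and func=NULL returns the SigDF objects:
sdfs_raw <- openSesame(idat_dir, prep="", func=NULL)
# just use the first sample as an example:
sdf <- sdfs_raw[[1]]

## check which platform was detected: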
attr(sdf, "platform")
#output: [1] "EPICv2"

Then I tried to run bisConversionControl(sdf), which failed with the error:

Error in bisConversionControl(sdf) : 
  platform %in% c("EPICplus", "EPIC", "HM450") is not TRUE

Is EPICplus the same as EPICv2? If so, this error seems to arise because openSesame labels EPIC version 2 data as "EPICv2", whereas bisConversionControl() expects the label "EPICplus".
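To illustrate why the label alone triggers the error, here is a minimal base-R sketch that reproduces the failing check. It assumes, based only on the wording of the error message, that bisConversionControl() asserts the platform with stopifnot(); check_platform() is a hypothetical stand-in, not the sesame source.

```r
# Hypothetical stand-in for the platform assertion implied by the error
# message; this is a reconstruction, NOT the actual sesame implementation.
check_platform <- function(platform) {
  stopifnot(platform %in% c("EPICplus", "EPIC", "HM450"))
  invisible(TRUE)
}

# "EPICplus" passes the assertion...
check_platform("EPICplus")

# ...but the label openSesame assigns to EPIC v2 data does not,
# so stopifnot() raises an "... is not TRUE" error.
res <- tryCatch(check_platform("EPICv2"),
                error = function(e) conditionMessage(e))
print(res)
```

Because `%in%` is an exact string match, "EPICv2" and "EPICplus" are treated as unrelated platforms even if they describe the same array.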

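My workaround was to make a copy of the data and change its platform label from "EPICv2" to "EPICplus". For this to work, I also needed zwdzwd's solution from #103, which builds the lists of probes whose next base is R (extR) or A (extA) from the EPICv2 manifest and passes them to bisConversionControl() explicitly: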

## make a copy then change platform label
sdf_2 <- sdf
attr(sdf_2, "platform") <- "EPICplus"

## use zwdzwd's solution from #103: build the EPICv2 manifest with the
## nextBase column, then collect the probes whose next base is R or A
mft <- sesameAnno_buildManifestGRanges(
    sesameAnno_download("EPICv2.hg38.manifest.tsv.gz"),
    columns = "nextBase")
extR <- names(mft)[!is.na(mft$nextBase) & mft$nextBase == "R"]
extA <- names(mft)[!is.na(mft$nextBase) & mft$nextBase == "A"]

bisConversionControl(sdf_2, extR, extA)
# output: [1] 1.087418 (i.e., the function ran successfully)
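The relabeling step can be wrapped so that it only ever touches a copy and refuses to relabel anything that was not detected as "EPICv2". This is a minimal sketch: relabel_platform() and the dummy object are hypothetical and not part of sesame. It relies only on R's copy-on-modify semantics and the "platform" attribute shown above, and it does not answer whether the relabeled result is accurate.

```r
# Hypothetical helper (not part of sesame): return a copy of a SigDF-like
# object with its "platform" attribute relabeled; the input is left unchanged.
relabel_platform <- function(x, from = "EPICv2", to = "EPICplus") {
  current <- attr(x, "platform")
  if (!identical(current, from)) {
    stop(sprintf("expected platform '%s', found '%s'",
                 from, paste(current, collapse = ",")))
  }
  # modifying x inside the function changes only the local copy
  attr(x, "platform") <- to
  x
}

# Usage with a dummy data frame standing in for a SigDF:
dummy <- data.frame(Probe_ID = c("cg0001", "cg0002"))
attr(dummy, "platform") <- "EPICv2"
dummy_2 <- relabel_platform(dummy)
attr(dummy_2, "platform")  # [1] "EPICplus"
attr(dummy, "platform")    # [1] "EPICv2"
```

On real data, the equivalent would presumably be `sdf_2 <- relabel_platform(sdf)` followed by `bisConversionControl(sdf_2, extR, extA)`, though this has not been checked against sesame itself.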

However, I'm not sure whether this workaround is sound, or whether it affects the accuracy of the output.

In summary: in bisConversionControl(), should the check platform %in% c("EPICplus", "EPIC", "HM450") be changed to platform %in% c("EPICv2", "EPIC", "HM450"), or is there a reason for the difference in naming?