immunomethylomics / FlowSorted.Blood.EPIC

This package includes a new cell reference for adult peripheral blood deconvolution arrayed using Illumina HumanMethylationEPIC
7 stars 5 forks source link

Error while running estimateCellCounts2 for cord blood samples (EPIC) #4

Closed samulituominen closed 3 years ago

samulituominen commented 4 years ago

I'm getting a weird error while running FlowSorted.Blood.EPIC::estimateCellCounts2(). My data are from cord blood measured with the EPIC chip. This is my setup:

library(FlowSorted.Blood.EPIC)
library(FlowSorted.CordBloodCombined.450k) 

data(IDOLOptimizedCpGsCordBlood)

RGSet <- readRDS('rgset.rds')

RGSet

class: RGChannelSet dim: 1052641 288 metadata(0): assays(2): Green Red rownames(1052641): 1600101 1600111 ... 99810990 99810992 rowData names(0): colnames(288): ... (removed these) colData names(12): X ID ... project filenames Annotation array: IlluminaHumanMethylationEPIC annotation: ilm10b4.hg19

Then, I try to run:

countsEPIC <- estimateCellCounts2(RGSet, 
                                  compositeCellType = "CordBloodCombined", 
                                  processMethod = "preprocessNoob",
                                  probeSelect = "IDOL", 
                                  cellTypes = c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Neu", "nRBC"), 
                                  referencePlatform = "IlluminaHumanMethylation450k",
                                  referenceset = "FlowSorted.CordBloodCombined.450k",
                                  IDOLOptimizedCpGs = IDOLOptimizedCpGsCordBlood, 
                                  returnAll = FALSE)

which I hope is appropriate given the data. However, I'm getting this error:

[convertArray] Casting as IlluminaHumanMethylation450k Error in h(simpleError(msg, call)) : error in evaluating the argument 'table' in selecting a method for function '%in%': unable to find an inherited method for function ‘colData’ for signature ‘"function"’

Do you have any idea what is causing this ? I also noticed that if I try the same, but with different (inappropriate) arguments, it finishes:

countsEPIC <- FlowSorted.Blood.EPIC::estimateCellCounts2(
  RGSet,
  compositeCellType = "Blood",
  cellTypes = c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Neu"),
  processMethod = "preprocessNoob",
  probeSelect = "IDOL",
  referencePlatform = "IlluminaHumanMethylationEPIC",
  IDOLOptimizedCpGs = IDOLOptimizedCpGs,
  returnAll = FALSE)

[estimateCellCounts2] Combining user data with reference (flow sorted) data. [estimateCellCounts2] Processing user and reference data together. [estimateCellCounts2] Using IDOL L-DMR probes for composition estimation. [estimateCellCounts2] Estimating composition.

Warning message: In asMethod(object) : NAs introduced by coercion

Below is my sessionInfo(). Im using R 4.0.2 with all the packages freshly installed.

R version 4.0.2 (2020-06-22) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale: [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages: [1] stats4 parallel stats graphics grDevices datasets utils methods base

other attached packages: [1] IlluminaHumanMethylationEPICmanifest_0.3.0
[2] IlluminaHumanMethylation450kmanifest_0.4.0
[3] FlowSorted.CordBloodCombined.450k_1.4.1
[4] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0 [5] FlowSorted.Blood.EPIC_1.6.1
[6] ExperimentHub_1.14.2
[7] AnnotationHub_2.20.2
[8] BiocFileCache_1.12.1
[9] dbplyr_1.4.4
[10] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0 [11] nlme_3.1-149
[12] quadprog_1.5-8
[13] genefilter_1.70.0
[14] minfi_1.34.0
[15] bumphunter_1.30.0
[16] locfit_1.5-9.4
[17] iterators_1.0.12
[18] foreach_1.5.0
[19] Biostrings_2.56.0
[20] XVector_0.28.0
[21] SummarizedExperiment_1.18.2
[22] DelayedArray_0.14.1
[23] matrixStats_0.56.0
[24] Biobase_2.48.0
[25] GenomicRanges_1.40.0
[26] GenomeInfoDb_1.24.2
[27] IRanges_2.22.2
[28] S4Vectors_0.26.1
[29] BiocGenerics_0.34.0

loaded via a namespace (and not attached): [1] ellipsis_0.3.1 siggenes_1.62.0 mclust_5.4.6
[4] base64_2.0 bit64_4.0.5 interactiveDisplayBase_1.26.3 [7] AnnotationDbi_1.50.3 xml2_1.3.2 codetools_0.2-16
[10] splines_4.0.2 scrime_1.3.5 Rsamtools_2.4.0
[13] annotate_1.66.0 shiny_1.5.0 HDF5Array_1.16.1
[16] BiocManager_1.30.10 readr_1.3.1 compiler_4.0.2
[19] httr_1.4.2 assertthat_0.2.1 Matrix_1.2-18
[22] fastmap_1.0.1 limma_3.44.3 later_1.1.0.1
[25] htmltools_0.5.0 prettyunits_1.1.1 tools_4.0.2
[28] glue_1.4.2 GenomeInfoDbData_1.2.3 dplyr_1.0.2
[31] rappdirs_0.3.1 doRNG_1.8.2 Rcpp_1.0.5
[34] vctrs_0.3.4 multtest_2.44.0 preprocessCore_1.50.0
[37] rtracklayer_1.48.0 DelayedMatrixStats_1.10.1 stringr_1.4.0
[40] mime_0.9 lifecycle_0.2.0 renv_0.12.0
[43] rngtools_1.5 XML_3.99-0.5 beanplot_1.2
[46] zlibbioc_1.34.0 MASS_7.3-53 hms_0.5.3
[49] promises_1.1.1 rhdf5_2.32.2 GEOquery_2.56.0
[52] RColorBrewer_1.1-2 yaml_2.2.1 curl_4.3
[55] memoise_1.1.0 biomaRt_2.44.1 reshape_0.8.8
[58] stringi_1.5.3 RSQLite_2.2.0 BiocVersion_3.11.1
[61] GenomicFeatures_1.40.1 BiocParallel_1.22.0 rlang_0.4.7
[64] pkgconfig_2.0.3 bitops_1.0-6 nor1mix_1.3-0
[67] lattice_0.20-41 purrr_0.3.4 Rhdf5lib_1.10.1
[70] GenomicAlignments_1.24.0 bit_4.0.4 tidyselect_1.1.0
[73] plyr_1.8.6 magrittr_1.5 R6_2.4.1
[76] generics_0.0.2 DBI_1.1.0 pillar_1.4.6
[79] survival_3.2-3 RCurl_1.98-1.2 tibble_3.0.3
[82] crayon_1.3.4 progress_1.2.2 grid_4.0.2
[85] data.table_1.13.0 blob_1.2.1 digest_0.6.25
[88] xtable_1.8-4 tidyr_1.1.2 httpuv_1.5.4
[91] illuminaio_0.30.0 openssl_1.4.2 askpass_1.1

lucassalas commented 4 years ago

Hi @samulituominen ,

The error stems from a difference in the cells that you are estimating. In your code you must replace "Neu" for "Gran" for CordBloodCombined. See pages 6 and 7 from the manual . I hope this solves the issue. I will close this issue, but please let me know whether you still have any problems after that.

samulituominen commented 4 years ago

Hi @lucassalas and thanks for the quick response ! I did that and unfortunately the same error persists:

library(FlowSorted.Blood.EPIC)
library(FlowSorted.CordBloodCombined.450k) 

data(IDOLOptimizedCpGsCordBlood)

RGSet <- readRDS('rgset.rds')

countsEPIC <- estimateCellCounts2(RGSet, 
                                  compositeCellType = "CordBloodCombined", 
                                  processMethod = "preprocessNoob",
                                  probeSelect = "IDOL", 
                                  cellTypes = c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Gran", "nRBC"), 
                                  referencePlatform = "IlluminaHumanMethylation450k",
                                  referenceset = "FlowSorted.CordBloodCombined.450k",
                                  IDOLOptimizedCpGs = IDOLOptimizedCpGsCordBlood, 
                                  returnAll = FALSE)

[convertArray] Casting as IlluminaHumanMethylation450k Loading required package: IlluminaHumanMethylation450kmanifest Loading required package: IlluminaHumanMethylationEPICmanifest Error in h(simpleError(msg, call)) : error in evaluating the argument 'table' in selecting a method for function '%in%': unable to find an inherited method for function ‘colData’ for signature ‘"function"’

lucassalas commented 4 years ago

Hi,

I want to diagnose the problem as I do not see the line where the function is broken.

My best guess is that one of your colData is either logical or factor and this is breaking the combination by the combineArrays function from minfi.

Please take a look using this: head(pData(RGset))

If any logical or factor, manually set those to as.character.

Also run this two lines to confirm that colData is being called properly:

`!"CellType" %in% names(colData(FlowSorted.CordBloodCombined.450k))

FALSE

!"CellType" %in% names(colData(RGSet))

TRUE

`

What is strange here is that the minfi function combineArrays does not call a match table for RGChannelSet objects. Was this object stored with an old minfi version?

The functions are just these lines:

`setMethod( "combineArrays", signature(object1 = "RGChannelSet", object2 = "RGChannelSet"), function(object1, object2, outType = c("IlluminaHumanMethylation450k", "IlluminaHumanMethylationEPIC"), verbose = TRUE) {

    outType <- match.arg(outType)
    array1 <- annotation(object1)[["array"]]
    array2 <- annotation(object2)[["array"]]
    if (array1 == array2) outType <- array1
    if (array1 == "IlluminaHumanMethylation27k" ||
        array2 == "IlluminaHumanMethylation27k") {
        stop("27k arrays cannot be combined at the RGChannelSet level.")
    }
    object1 <- convertArray(object1, outType = outType, verbose = verbose)
    object2 <- convertArray(object2, outType = outType, verbose = verbose)
    features1 <- rownames(object1)
    features2 <- rownames(object2)
    features  <- intersect(features1, features2)
    object1  <- object1[features, ]
    object2  <- object2[features, ]
    rgSet <- combine(object1, object2)
    rgSet$ArrayTypes <- rep(
        x = c(array1, array2),
        times = c(ncol(object1), ncol(object2)))
    rgSet
}

) setMethod( "convertArray", signature(object = "RGChannelSet"), function(object, outType = c("IlluminaHumanMethylation450k", "IlluminaHumanMethylationEPIC"), verbose = TRUE) {

    outType <- match.arg(outType)
    array <- annotation(object)[["array"]]
    if (array == outType) return(object)
    if (verbose) message(sprintf("[convertArray] Casting as %s", outType))
    .convertArray_450k_epic(
        rgSet = object,
        outType = outType,
        verbose = verbose)
}

) .convertArray_450k_epic<-function (rgSet, outType = c("IlluminaHumanMethylation450k", "IlluminaHumanMethylationEPIC"), verbose = verbose) { outType <- match.arg(outType) .isRGOrStop(rgSet) stopifnot(.is450k(rgSet) || .isEPIC(rgSet)) array <- annotation(rgSet)["array"] if (array == outType) stop("'rgSet' already in the 'outType' array type.") manifest1 <- getManifest(outType) manifest2 <- getManifest(rgSet) keepAddresses <- list(I = NULL, II = NULL, SnpI = NULL, SnpII = NULL, Control = NULL) probes1 <- getProbeInfo(manifest1, type = "I") probes2 <- getProbeInfo(manifest2, type = "I") commonNames <- intersect(probes1$Name, probes2$Name) probes1 <- probes1[match(commonNames, probes1$Name), ] probes2 <- probes2[match(commonNames, probes2$Name), ] stopifnot(all(probes1$Color == probes2$Color)) stopifnot(all(probes1$ProbeSeqA == probes2$ProbeSeqA)) stopifnot(all(probes1$ProbeSeqB == probes2$ProbeSeqB)) translate <- c(probes1$AddressA, probes1$AddressB) names(translate) <- c(probes2$AddressA, probes2$AddressB) wh <- which(rownames(rgSet) %in% names(translate)) rownames(rgSet)[wh] <- translate[rownames(rgSet)[wh]] keepAddresses$I <- unname(translate) probes1 <- getProbeInfo(manifest1, type = "II") probes2 <- getProbeInfo(manifest2, type = "II") commonNames <- intersect(probes1$Name, probes2$Name) probes1 <- probes1[match(commonNames, probes1$Name), ] probes2 <- probes2[match(commonNames, probes2$Name), ] stopifnot(all(probes1$ProbeSeqA == probes2$ProbeSeqA)) translate <- probes1$AddressA names(translate) <- probes2$AddressA wh <- which(rownames(rgSet) %in% names(translate)) rownames(rgSet)[wh] <- translate[rownames(rgSet)[wh]] keepAddresses$II <- unname(translate) probes1 <- getProbeInfo(manifest1, type = "SnpI") probes2 <- getProbeInfo(manifest2, type = "SnpI") commonNames <- intersect(probes1$Name, probes2$Name) probes1 <- probes1[match(commonNames, probes1$Name), ] probes2 <- probes2[match(commonNames, probes2$Name), ] stopifnot(all(probes1$ProbeSeqA == probes2$ProbeSeqB)) stopifnot(all(probes1$ProbeSeqB == probes2$ProbeSeqA)) translate <- c(probes1$AddressA, probes1$AddressB) names(translate) <- c(probes2$AddressA, probes2$AddressB) wh <- which(rownames(rgSet) %in% names(translate)) rownames(rgSet)[wh] <- translate[rownames(rgSet)[wh]] keepAddresses$SnpI <- unname(translate) probes1 <- getProbeInfo(manifest1, type = "SnpII") probes2 <- getProbeInfo(manifest2, type = "SnpII") commonNames <- intersect(probes1$Name, probes2$Name) probes1 <- probes1[match(commonNames, probes1$Name), ] probes2 <- probes2[match(commonNames, probes2$Name), ] stopifnot(all(probes1$ProbeSeqA == probes2$ProbeSeqA)) translate <- probes1$AddressA names(translate) <- probes2$AddressA wh <- which(rownames(rgSet) %in% names(translate)) rownames(rgSet)[wh] <- translate[rownames(rgSet)[wh]] keepAddresses$SnpII <- unname(translate) probes1 <- getProbeInfo(manifest1, type = "Control") probes2 <- getProbeInfo(manifest2, type = "Control") commonAddress <- intersect(probes1$Address, probes2$Address) probes1 <- probes1[match(commonAddress, probes1$Address), ] probes2 <- probes2[match(commonAddress, probes2$Address), ] keepAddresses$Control <- unname(probes1$Address) keepAddresses <- do.call("c", keepAddresses) keepAddresses <- keepAddresses[keepAddresses %in% rownames(rgSet)] rgSet <- rgSet[keepAddresses, ] annotation(rgSet) <- .getAnnotationFromOutType(outType) rgSet }`

table is not called here (but it is for other combineArrays minfi objects).

Look carefully to your colData, if nothing of the solutions above work, I would try to redo the RGChannelSet (maybe a small subset of 10 samples instead of the 288) and check again the combineArrays. Please let me know if it works.

samulituominen commented 3 years ago

I ran the script line by line and noticed that this line throws the error. From that and preceding lines I understood that I was missing the reference set (FlowSorted.CordBloodCombined.450k) from my workspace.

The error message was a bit cryptic, but in the end it was my bad. Downloading the reference set fixed the problem, here's the complete working version:

library(FlowSorted.Blood.EPIC)
library(FlowSorted.CordBloodCombined.450k) 

data(IDOLOptimizedCpGsCordBlood)

library(ExperimentHub)
hub <- ExperimentHub()
myfiles <- query(hub, "FlowSorted.CordBloodCombined.450k")
FlowSorted.CordBloodCombined.450k <- myfiles[[1]]

RGSet <- readRDS('rgset.rds')

memory.limit(60000)

countsEPIC <- estimateCellCounts2(RGSet, 
                                  compositeCellType = "CordBloodCombined", 
                                  processMethod = "preprocessNoob",
                                  probeSelect = "IDOL", 
                                  cellTypes = c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Gran", "nRBC"), 
                                  referencePlatform = "IlluminaHumanMethylation450k",
                                  referenceset = "FlowSorted.CordBloodCombined.450k",
                                  IDOLOptimizedCpGs = IDOLOptimizedCpGsCordBlood, 
                                  returnAll = FALSE)

[convertArray] Casting as IlluminaHumanMethylation450k [estimateCellCounts2] Combining user data with reference (flow sorted) data. [estimateCellCounts2] Processing user and reference data together. [estimateCellCounts2] Using IDOL L-DMR probes for composition estimation. [estimateCellCounts2] Estimating composition.

Warning message: In asMethod(object) : NAs introduced by coercion