immunomethylomics / FlowSorted.Blood.EPIC

This package includes a new cell reference for adult peripheral blood deconvolution arrayed using Illumina HumanMethylationEPIC

estimateCellCounts2 error in match.arg(referencePlatform) #10

Open blaverty opened 1 year ago

blaverty commented 1 year ago

I am trying to use estimateCellCounts2 on an RGset of 450k probes, but I am getting the error: Error in match.arg(referencePlatform): 'arg' should be “IlluminaHumanMethylationEPIC”. My RGset is 450k, though. Do you know why this is occurring?

RGsetTargets <- combineArrays(RGset_breast, RGset_colon, outType = "IlluminaHumanMethylation450k", verbose = TRUE)

RGsetTargets
class: RGChannelSet 
dim: 622399 2081 
metadata(0):
assays(2): Green Red
rownames(622399): 10600313 10600322 ... 74810490 74810492
rowData names(0):
colnames(2081): GSM1235534_6969568099_R02C02
  GSM1235535_6969568052_R02C01 ... GSM1052212_5730053048_R05C02
  GSM1052213_5730053048_R06C02
colData names(1): ArrayTypes
Annotation
  array: IlluminaHumanMethylation450k
  annotation: ilmn12.hg19

estimateCellCounts2(RGsetTargets, compositeCellType = "Blood", processMethod = "preprocessNoob", probeSelect = "IDOL", cellTypes = c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Neu"), referencePlatform="IlluminaHumanMethylation450k", referenceset=NULL, CustomCpGs=IDOLOptimizedCpGs450klegacy)

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /hpf/tools/centos7/R/4.2.0/lib64/R/lib/libRblas.so
LAPACK: /hpf/tools/centos7/R/4.2.0/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] IlluminaHumanMethylation450kmanifest_0.4.0
 [2] FlowSorted.Blood.EPIC_2.2.0               
 [3] ExperimentHub_2.6.0                       
 [4] AnnotationHub_3.6.0                       
 [5] BiocFileCache_2.6.0                       
 [6] dbplyr_2.2.1                              
 [7] dplyr_1.0.10                              
 [8] minfi_1.43.1                              
 [9] bumphunter_1.40.0                         
[10] locfit_1.5-9.7                            
[11] iterators_1.0.14                          
[12] foreach_1.5.2                             
[13] Biostrings_2.64.0                         
[14] XVector_0.38.0                            
[15] SummarizedExperiment_1.28.0               
[16] Biobase_2.58.0                            
[17] MatrixGenerics_1.10.0                     
[18] matrixStats_0.63.0                        
[19] GenomicRanges_1.48.0                      
[20] GenomeInfoDb_1.34.9                       
[21] IRanges_2.32.0                            
[22] S4Vectors_0.36.2                          
[23] BiocGenerics_0.44.0                       

loaded via a namespace (and not attached):
  [1] rjson_0.2.21                  ellipsis_0.3.2               
  [3] siggenes_1.72.0               mclust_6.0.0                 
  [5] base64_2.0.1                  bit64_4.0.5                  
  [7] interactiveDisplayBase_1.36.0 AnnotationDbi_1.60.0         
  [9] fansi_1.0.4                   xml2_1.3.3                   
 [11] codetools_0.2-18              splines_4.2.0                
 [13] sparseMatrixStats_1.10.0      cachem_1.0.7                 
 [15] scrime_1.3.5                  Rsamtools_2.12.0             
 [17] annotate_1.76.0               png_0.1-8                    
 [19] shiny_1.7.3                   HDF5Array_1.26.0             
 [21] BiocManager_1.30.20           readr_2.1.3                  
 [23] compiler_4.2.0                httr_1.4.5                   
 [25] assertthat_0.2.1              Matrix_1.5-1                 
 [27] fastmap_1.1.1                 limma_3.54.2                 
 [29] cli_3.6.0                     later_1.3.0                  
 [31] htmltools_0.5.3               prettyunits_1.1.1            
 [33] tools_4.2.0                   glue_1.6.2                   
 [35] GenomeInfoDbData_1.2.9        rappdirs_0.3.3               
 [37] doRNG_1.8.6                   Rcpp_1.0.10                  
 [39] vctrs_0.5.2                   rhdf5filters_1.10.0          
 [41] multtest_2.54.0               preprocessCore_1.60.2        
 [43] nlme_3.1-157                  rtracklayer_1.56.0           
 [45] DelayedMatrixStats_1.12.3     stringr_1.5.0                
 [47] mime_0.12                     lifecycle_1.0.3              
 [49] restfulr_0.0.15               rngtools_1.5.2               
 [51] XML_3.99-0.13                 beanplot_1.3.1               
 [53] zlibbioc_1.44.0               MASS_7.3-57                  
 [55] promises_1.2.0.1              hms_1.1.2                    
 [57] rhdf5_2.42.0                  GEOquery_2.66.0              
 [59] RColorBrewer_1.1-3            yaml_2.3.7                   
 [61] curl_5.0.0                    memoise_2.0.1                
 [63] biomaRt_2.54.0                reshape_0.8.9                
 [65] stringi_1.7.12                RSQLite_2.3.0                
 [67] BiocVersion_3.16.0            genefilter_1.80.3            
 [69] BiocIO_1.8.0                  GenomicFeatures_1.50.2       
 [71] filelock_1.0.2                BiocParallel_1.33.9          
 [73] rlang_1.0.6                   pkgconfig_2.0.3              
 [75] bitops_1.0-7                  nor1mix_1.3-0                
 [77] lattice_0.20-45               purrr_1.0.1                  
 [79] Rhdf5lib_1.20.0               GenomicAlignments_1.34.0     
 [81] bit_4.0.5                     tidyselect_1.2.0             
 [83] plyr_1.8.8                    magrittr_2.0.3               
 [85] R6_2.5.1                      generics_0.1.3               
 [87] DelayedArray_0.22.0           DBI_1.1.3                    
 [89] pillar_1.8.1                  survival_3.4-0               
 [91] KEGGREST_1.38.0               RCurl_1.98-1.10              
 [93] tibble_3.1.8                  crayon_1.5.2                 
 [95] utf8_1.2.3                    tzdb_0.3.0                   
 [97] progress_1.2.2                grid_4.2.0                   
 [99] data.table_1.14.8             blob_1.2.3                   
[101] digest_0.6.31                 xtable_1.8-4                 
[103] httpuv_1.6.6                  tidyr_1.2.1                  
[105] illuminaio_0.40.0             openssl_2.0.5                
[107] askpass_1.1                   quadprog_1.5-8   

lucassalas commented 1 year ago

Hi Brianne,

These are options that we now usually leave under the hood. The error arises because the function is asking for the platform of the reference (i.e., FlowSorted.Blood.EPIC (our reference) vs. FlowSorted.Blood.450k (the original Reinius et al. reference)), not the platform of your data.

If you are using the latest version of FlowSorted.Blood.EPIC (>=v.2.0) you do not need to specify any of those options.
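
For context on where the message comes from: base R's match.arg() checks a supplied argument against the choices listed in the function signature. The following is a minimal sketch only. demo_reference is a hypothetical stand-in (not the package's real code), and it assumes, as the error text suggests, that the installed version accepts a single value.

```r
# Hypothetical stand-in for illustration; NOT the package's real code.
# Assumption: only one value is allowed, as the error message suggests.
demo_reference <- function(referencePlatform = c("IlluminaHumanMethylationEPIC")) {
  match.arg(referencePlatform)
}

# Leaving the argument out falls back to the single allowed value.
default_value <- demo_reference()

# Passing the platform of the samples instead is rejected by match.arg().
err <- tryCatch(
  demo_reference("IlluminaHumanMethylation450k"),
  error = function(e) e
)

print(default_value)
print(conditionMessage(err))
```

This is why omitting referencePlatform, as suggested above, avoids the error regardless of the array type of the samples.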

See the following example:

library(FlowSorted.Blood.EPIC)
library(FlowSorted.Blood.450k)
# Use the whole-blood (WBC) samples from the 450k package as example targets
RGsetTargets2 <- FlowSorted.Blood.450k[, FlowSorted.Blood.450k$CellType == "WBC"]
sampleNames(RGsetTargets2) <- paste(RGsetTargets2$CellType, seq_len(dim(RGsetTargets2)[2]), sep = "")
RGsetTargets2
# No referencePlatform, referenceset or CustomCpGs arguments are needed
propEPIC2 <- estimateCellCounts2(RGsetTargets2, compositeCellType = "Blood", processMethod = "preprocessNoob", probeSelect = "IDOL", cellTypes = c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Neu"))
head(propEPIC2$prop)
percEPIC2 <- round(propEPIC2$prop * 100, 1)

Here a 450k dataset is processed using IDOL; the program deals with these details under the hood.

I hope that helps.

Best,

Lucas

cdelacalle commented 1 year ago

Dear Lucas,

I am facing a similar problem.

I am not able to estimate the cell types with an RGset from the Illumina EPICv2 array.

rgset@annotation

                           array                       annotation
"IlluminaHumanMethylationEPICv2"                      "20a1.hg38"

estimatecellsEPIC <- FlowSorted.Blood.EPIC::estimateCellCounts2(rgset)

snapshotDate(): 2023-04-24
snapshotDate(): 2023-04-24
see ?FlowSorted.Blood.EPIC and browseVignettes('FlowSorted.Blood.EPIC') for documentation
loading from cache
[convertArray] Casting as IlluminaHumanMethylationEPIC
Error in .convertArray_450k_epic(rgSet = object, outType = outType, verbose = verbose) :
  .is450k(rgSet) || .isEPIC(rgSet) is not TRUE

I have tried to force-convert the annotation version like this (not sure this is correct):

rgset@annotation <- c(array = "IlluminaHumanMethylationEPIC", annotation = 'ilm10b4.hg19')

However, the results are almost the same for all samples, although they should come from highly dissimilar original cell types, so I do not really trust them:

estimatecells450k[["prop"]]

                      CD8T   CD4T NK  Bcell   Mono    Neu
207107860100_R01C01 0.0283 0.2932  0 0.1244 0.0644 0.2112
207107860100_R02C01 0.0317 0.2899  0 0.1242 0.0608 0.2138
207107860100_R03C01 0.0308 0.2907  0 0.1243 0.0610 0.2140
207107860100_R04C01 0.0304 0.2908  0 0.1244 0.0611 0.2142

Is there any way to circumvent this issue?

Thanks a lot in advance!

Best Carlos

lucassalas commented 1 year ago

Hi @cdelacalle,

I will provide some code I have used to circumvent these problems. It is not official; I will ask some students to review the code and see whether I can provide a longer-term solution.

library(devtools)
install_github("mwsill/IlluminaHumanMethylationEPICv2manifest")
install_github("mwsill/minfi") # This is not the official minfi, BE CAREFUL!!!

library(minfi)
wddir <- "//yourdatapath/ "
sheet <- read.metharray.sheet(wddir, pattern = "yourmanifest.csv")
RGset <- read.metharray.exp(targets = sheet, extended = TRUE)
Mset <- preprocessIllumina(RGset) # Noob is not working with this annotation, or my own annotation.
Mset
library(FlowSorted.Blood.EPIC)
# Keep only the IDOL CpGs that are present in the data
IDOLOptimizedCpGsBloodv2 <- IDOLOptimizedCpGs[which(IDOLOptimizedCpGs %in% rownames(getBeta(RGset)))]
# Should return TRUE: the reference rows match the selected CpGs, in the same order
identical(rownames(IDOLOptimizedCpGs.compTable[IDOLOptimizedCpGsBloodv2, ]), IDOLOptimizedCpGsBloodv2)
# Project the samples onto the reference using the shared CpGs only
propEPIC <- projectCellType_CP(
    getBeta(Mset)[IDOLOptimizedCpGsBloodv2, ],
    IDOLOptimizedCpGs.compTable[IDOLOptimizedCpGsBloodv2, ],
    contrastWBC = NULL, nonnegative = TRUE,
    lessThanOne = FALSE
)

I've also heard of this solution, but I have not tried it myself. Ideally you should use Noob.

Let me know if that works for you.

cdelacalle commented 1 year ago

Thanks a lot @lucassalas for your extremely fast reply.

The code you provided solved the problem right away. Results now make much more sense!

The only detail was that IDOLOptimizedCpGsBlood object was not found, and instead, something was found by the name of IDOLOptimizedCpGs. The description was the same so I assumed it was that one instead.

Thank you again. Best Carlos

cdelacalle commented 1 year ago

On a side note:

I see that the .comptable is not available for the cord blood dataset. Is there any way to derive it from the data.frame (IDOLOptimizedCpGsCordBlood) ?

And also: is there any plan to include the brain DNAm reference included in previous versions of estimatecellcounts?

Thanks a lot Best Carlos

lucassalas commented 1 year ago

Quick answers: If you are using FlowSorted.CordBloodCombined.450k

library(FlowSorted.CordBloodCombined.450k)
FlowSorted.CordBloodCombined.450k.compTable

If you want to use the Guintivano et al. reference, you can use the method with library(FlowSorted.DLPFC.450k). However, we published HiBED recently. You should explore that alternative instead, as it is much more comprehensive. The paper is here.

Good luck.

lucassalas commented 10 months ago

We will add this solution to the package soon. In the meantime please use the following

devtools::install_github("jokergoo/IlluminaHumanMethylationEPICv2manifest") 
devtools::install_github("jokergoo/IlluminaHumanMethylationEPICv2anno.20a1.hg38")

library(minfi)
library(sesame)
library(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
library(IlluminaHumanMethylationEPICv2anno.20a1.hg38)
library(IlluminaHumanMethylationEPICv2manifest)

##############################
# Load RGset
RGset = read.metharray.exp(workdir,recursive = TRUE)

annotation(RGset)["array"] = "IlluminaHumanMethylationEPICv2"
annotation(RGset)["annotation"] = "20a1.hg38"

MSet <-preprocessNoob(RGset)

Betas<-getBeta(MSet)
Betas<- sesame::betasCollapseToPfx(Betas) #you can also use ENmix::rm.cgsuffix(Betas) or other function to remove replicates 

library(FlowSorted.Blood.EPIC)
IDOLOptimizedCpGsBloodv2<- IDOLOptimizedCpGs[which(IDOLOptimizedCpGs%in%rownames(Betas))]
identical(rownames(IDOLOptimizedCpGs.compTable[IDOLOptimizedCpGsBloodv2,]), IDOLOptimizedCpGsBloodv2)
propEPIC <- projectCellType_CP(
    Betas[IDOLOptimizedCpGsBloodv2, ],
    IDOLOptimizedCpGs.compTable[IDOLOptimizedCpGsBloodv2,],
    contrastWBC = NULL, nonnegative = TRUE,
    lessThanOne = FALSE
)

hguigui123 commented 8 months ago

Hello,

I am also trying to use estimateCellCounts2 with methylation data from an EPICv2 array. I've tried the code you provided, but I still have issues: when I run it, the exact error message is "Error in getBeta(MSet)[IDOLOptimizedCpGsBloodv2, ] : subscript out of bounds". I am not sure I understand what the exact problem is now.

Thanks in advance if you have any advice for this.

lucassalas commented 8 months ago

Hi @hguigui123,

As the code above shows, you should not use getBeta(MSet) directly. The EPICv2 has a different set of probe names due to the presence of technical replicates (cgXXXXXX_TCXX or cgXXXXXX_BCXX). You need to collapse the matrix to a single value per CpG using the sesame (sesame::betasCollapseToPfx) or ENmix (ENmix::rm.cgsuffix) functions, and then you can use those values to project the cell types.
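
To make the collapsing step concrete, here is a minimal base R sketch on a toy matrix. collapse_to_prefix is a hypothetical helper, not the sesame or ENmix implementation, and averaging the replicates is an assumption made for illustration; in practice use the package functions named above.

```r
# Toy beta matrix with EPICv2-style probe names (made-up values).
betas <- matrix(
  c(0.10, 0.20, 0.80,   # sample S1
    0.30, 0.40, 0.60),  # sample S2
  nrow = 3,
  dimnames = list(
    c("cg00000029_TC21", "cg00000029_TC22", "cg00000103_BC21"),
    c("S1", "S2")
  )
)

# Hypothetical helper: strip the suffix and average replicate probes.
collapse_to_prefix <- function(b) {
  prefix <- sub("_.*$", "", rownames(b))
  sums <- rowsum(b, group = prefix, reorder = FALSE)
  counts <- as.vector(table(factor(prefix, levels = unique(prefix))))
  sums / counts  # each row is divided by its number of replicates
}

collapsed <- collapse_to_prefix(betas)
print(collapsed)
```

After this step the row names are plain cg identifiers, so they can be matched against the IDOL CpG list.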

I hope that answers your question.

Good luck.

lucassalas commented 8 months ago

Please check that this returns TRUE in the code above:

IDOLOptimizedCpGsBloodv2 <- IDOLOptimizedCpGs[which(IDOLOptimizedCpGs %in% rownames(Betas))]
identical(rownames(IDOLOptimizedCpGs.compTable[IDOLOptimizedCpGsBloodv2, ]), IDOLOptimizedCpGsBloodv2)

You should have two matrices with the same rows (the same CpGs, in the same order). This is at the beta value level, as you need the product to be positive definite to progress in the calculation. The "Betas" object is the matrix after collapsing the cg names.
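
The check above, and the "subscript out of bounds" error reported earlier, can be reproduced on toy data. This is a sketch only: every object name here is a hypothetical stand-in, and collapsing is simulated by renaming rows rather than by calling sesame or ENmix.

```r
# Toy objects; all names below are hypothetical stand-ins.
idol_cpgs <- c("cg01", "cg02", "cg03")
comp_table <- matrix(
  c(0.9, 0.2, 0.5,
    0.1, 0.8, 0.4),
  nrow = 3, dimnames = list(idol_cpgs, c("CD4T", "Neu"))
)

# EPICv2-style names before collapsing: the suffixes prevent any match.
betas_raw <- matrix(
  c(0.5, 0.4, 0.6),
  nrow = 3, dimnames = list(c("cg01_TC21", "cg02_BC11", "cg03_TC11"), "S1")
)
raw_error <- tryCatch(betas_raw[idol_cpgs, , drop = FALSE], error = function(e) e)

# After collapsing (simulated here by renaming), keep only shared probes.
betas <- betas_raw
rownames(betas) <- sub("_.*$", "", rownames(betas))
betas <- betas[c("cg03", "cg01"), , drop = FALSE]  # cg02 absent, order shuffled

shared <- idol_cpgs[idol_cpgs %in% rownames(betas)]
Y <- betas[shared, , drop = FALSE]       # sample betas
X <- comp_table[shared, , drop = FALSE]  # reference betas
same_rows <- identical(rownames(Y), rownames(X))

print(conditionMessage(raw_error))
print(same_rows)
```

Indexing both matrices with the same shared vector is what guarantees matching rows, even when some IDOL CpGs are absent from the array.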

hguigui123 commented 8 months ago

Hello,

Indeed it works perfectly now.

Thanks again for your very precious help !

annekristin commented 7 months ago

Hi, I have seen in the documentation how I can download the reference dataset using the function libraryDataGet(title) to do the analysis offline. This works for "FlowSorted.Blood.EPIC", but what is the name of the dataset for 450K? I have tried "FlowSorted.Blood450k", but I get an error message: Error in .local(x, i, j = j, ...) : 'i' must be length 1

Thank you for any help

lucassalas commented 7 months ago

Hi @annekristin, could you please elaborate on the purpose of using the 450k data? If you are trying to use FlowSorted.Blood.EPIC for deconvolving with the 450K legacy library, you do not need that library (and we DO NOT recommend that for our libraries). If you want to use the library for other purposes, the data derived from the Reinius et al. publication can be downloaded using an independent library package from Bioconductor (not hosted on ExperimentHub). I hope that helps.

annekristin commented 7 months ago

Thank you for your reply, @lucassalas. I want to perform a meta-analysis between datasets from 450k, epicv1 and epicv2. As part of this I need to estimate the cell counts for each platform. I was hoping to use the IDOL probes for all platforms (trying to keep the pipelines for each platform as similar as possible). If I understand you correctly I should use the FlowSorted.Blood.EPIC dataset as the reference set for all 3 platforms.

According to the documentation for estimateCellCounts2, I can set the reference platform to 450k, but it will only accept EPIC. I see someone else has had a similar issue, and that you replied that you deal with this under the hood. Does that mean that even when I leave the reference platform at the default "IlluminaHumanMethylationEPIC", you extract the array type from the RGset and set the appropriate platform under the hood, and that if I select IDOL probes and you see that my RGset is from 450k, you will choose the 450k legacy probes? Just trying to understand how it works :-)

Is there a benefit to using the IDOL legacy probes with 450k, or am I just as well off using estimateCellCounts from minfi for this platform?

Thank you for your help

lucassalas commented 7 months ago

Hi @annekristin,

Your statement is correct. The reference platform refers to the reference that you will use for the process. In this case, you keep the EPIC. In our 2018 paper, we generated an EPIC and a 450k legacy library that the algorithm will select automatically (using the EPIC reference) depending on which platform corresponds to your samples (this is the under-the-hood statement). The benefit, as mentioned in the 2018 paper, is in terms of precision for several cell types. I would recommend that you use the IDOL libraries only. You do not need to modify these parameters; just use the defaults. The EPICv2 is slightly more complicated, as we are not allowed to use GitHub libraries in Bioconductor packages, and there are no Bioconductor options for the libraries that I mentioned on Jan 9 that I can incorporate into the official package. Please use that approach for now.
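
The selection logic described above can be pictured roughly as follows. This is a hypothetical sketch, not the package's actual implementation: pick_idol_library is an invented helper, and only the two object names (IDOLOptimizedCpGs and IDOLOptimizedCpGs450klegacy) come from this thread.

```r
# Hypothetical helper; the real selection happens inside estimateCellCounts2.
# It returns the NAME of the IDOL library assumed for a given sample platform.
pick_idol_library <- function(array_type) {
  switch(array_type,
    IlluminaHumanMethylationEPIC = "IDOLOptimizedCpGs",
    IlluminaHumanMethylation450k = "IDOLOptimizedCpGs450klegacy",
    stop("no IDOL library assumed for array type: ", array_type)
  )
}

epic_choice <- pick_idol_library("IlluminaHumanMethylationEPIC")
legacy_choice <- pick_idol_library("IlluminaHumanMethylation450k")

# EPICv2 is not covered, which mirrors why the manual approach is needed for now.
v2_result <- tryCatch(
  pick_idol_library("IlluminaHumanMethylationEPICv2"),
  error = function(e) e
)

print(c(epic_choice, legacy_choice))
```

In both supported cases the reference itself stays the EPIC one; only the CpG list changes with the platform of the samples.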

annekristin commented 7 months ago

Thank you so much @lucassalas !

pcibin commented 4 months ago

Dear Lucas,

I am analyzing some PBMC data from the EPICv2. I used the approach you mentioned on Jan 9 and it worked perfectly! However, it also includes neutrophils, which of course we don't have in our PBMC dataset... My colleague who had data from the EPICv1 used estimateCellCounts2 with the argument cellTypes = c("CD8T","CD4T", "NK","Bcell","Mono")... Is there a way to do something similar with projectCellType_CP?

Thank you in advance

lucassalas commented 4 months ago

Hi @pcibin, yes, it is possible: you can select the columns of the reference.

library(FlowSorted.Blood.EPIC)
IDOLOptimizedCpGsBloodv2 <- IDOLOptimizedCpGs[which(IDOLOptimizedCpGs %in% rownames(Betas))]
identical(rownames(IDOLOptimizedCpGs.compTable[IDOLOptimizedCpGsBloodv2, ]), IDOLOptimizedCpGsBloodv2)
propEPIC <- projectCellType_CP(
    Betas[IDOLOptimizedCpGsBloodv2, ],
    IDOLOptimizedCpGs.compTable[IDOLOptimizedCpGsBloodv2, c("CD8T", "CD4T", "NK", "Bcell", "Mono")],
    contrastWBC = NULL, nonnegative = TRUE,
    lessThanOne = FALSE
)

However, from the biological standpoint, I would leave the Neutrophils as is; even with optimal Ficoll, sometimes PMNs can cross or degranulate. Please double check that the columns are correct in your reference matrix.
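
The effect of dropping a reference column can be sketched on toy data. Note the assumptions: the reference values are made up, and ordinary least squares (base R's qr.solve) is used as a stand-in for the constrained projection that projectCellType_CP performs, so this illustrates the idea rather than the package's method.

```r
# Toy reference: 6 CpGs x 3 cell types (made-up values, not real methylation data).
reference <- matrix(
  c(0.9, 0.1, 0.1,
    0.1, 0.9, 0.1,
    0.1, 0.1, 0.9,
    0.8, 0.2, 0.5,
    0.2, 0.7, 0.3,
    0.5, 0.5, 0.1),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("cg", 1:6), c("CD4T", "Mono", "Neu"))
)

# Simulated PBMC-like sample: 60% CD4T, 40% Mono, no neutrophils.
sample_betas <- as.vector(reference %*% c(0.6, 0.4, 0))

# Confirm the requested columns exist before subsetting the reference.
wanted <- c("CD4T", "Mono")
missing_cols <- setdiff(wanted, colnames(reference))
stopifnot(length(missing_cols) == 0)

# Unconstrained least squares as a stand-in for the constrained projection.
fit_all <- qr.solve(reference, sample_betas)
fit_pbmc <- qr.solve(reference[, wanted, drop = FALSE], sample_betas)

print(round(fit_all, 3))
print(round(fit_pbmc, 3))
```

In this noise-free toy case both fits agree on CD4T and Mono, and the full fit assigns essentially nothing to Neu. With real data, keeping the Neu column lets any residual granulocyte signal show up instead of being absorbed by the other cell types, which is the biological point made above.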