Open blaverty opened 1 year ago
Hi Brianne,
These are options that we now usually leave under the hood. The error occurs because the referencePlatform argument asks for the platform of the reference library (i.e., FlowSorted.Blood.EPIC (our reference) vs. FlowSorted.Blood.450k (the original Reinius et al.)), not the platform of your data.
If you are using the latest version of FlowSorted.Blood.EPIC (>=v.2.0) you do not need to specify any of those options.
See the following example:
library(FlowSorted.Blood.EPIC) # provides estimateCellCounts2
library(FlowSorted.Blood.450k)
# Keep only the whole-blood (WBC) samples of the 450k dataset as targets
RGsetTargets2 <- FlowSorted.Blood.450k[, FlowSorted.Blood.450k$CellType == "WBC"]
sampleNames(RGsetTargets2) <- paste(RGsetTargets2$CellType,
    seq_len(dim(RGsetTargets2)[2]), sep = "")
RGsetTargets2
propEPIC2 <- estimateCellCounts2(RGsetTargets2, compositeCellType = "Blood",
    processMethod = "preprocessNoob", probeSelect = "IDOL",
    cellTypes = c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Neu"))
head(propEPIC2$prop)
percEPIC2 <- round(propEPIC2$prop * 100, 1)
This example processes a 450k dataset using IDOL; the program deals with the platform details under the hood.
I hope that helps.
Best,
Lucas
From: Brianne Laverty
Sent: Monday, March 6, 2023 4:29 PM
To: immunomethylomics/FlowSorted.Blood.EPIC
Cc: Subscribed
Subject: [immunomethylomics/FlowSorted.Blood.EPIC] estimateCellCounts2 error in match.arg(referencePlatform) (Issue #10)
I am trying to use estimateCellCounts2 on an RGset of 450k probes, however, I am getting the error: Error in match.arg(referencePlatform): 'arg' should be "IlluminaHumanMethylationEPIC". However, my RGset is 450k probes. Do you know why this is occurring?
RGsetTargets <- combineArrays(RGset_breast, RGset_colon, outType = "IlluminaHumanMethylation450k", verbose = TRUE)
RGsetTargets
class: RGChannelSet
dim: 622399 2081
metadata(0):
assays(2): Green Red
rownames(622399): 10600313 10600322 ... 74810490 74810492
rowData names(0):
colnames(2081): GSM1235534_6969568099_R02C02 GSM1235535_6969568052_R02C01 ... GSM1052212_5730053048_R05C02 GSM1052213_5730053048_R06C02
colData names(1): ArrayTypes
Annotation
  array: IlluminaHumanMethylation450k
  annotation: ilmn12.hg19
estimateCellCounts2(RGsetTargets, compositeCellType = "Blood",
    processMethod = "preprocessNoob", probeSelect = "IDOL",
    cellTypes = c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Neu"),
    referencePlatform = "IlluminaHumanMethylation450k",
    referenceset = NULL, CustomCpGs = IDOLOptimizedCpGs450klegacy)
sessionInfo()
Dear Lucas,
I am facing a similar problem.
I am not able to estimate the cell types with an RGset from the Illumina EPICv2 array.
rgset@annotation
                           array                       annotation
"IlluminaHumanMethylationEPICv2"                      "20a1.hg38"
estimatecellsEPIC <- FlowSorted.Blood.EPIC::estimateCellCounts2(rgset)
snapshotDate(): 2023-04-24
snapshotDate(): 2023-04-24
see ?FlowSorted.Blood.EPIC and browseVignettes('FlowSorted.Blood.EPIC') for documentation
loading from cache
[convertArray] Casting as IlluminaHumanMethylationEPIC
Error in .convertArray_450k_epic(rgSet = object, outType = outType, verbose = verbose) :
  .is450k(rgSet) || .isEPIC(rgSet) is not TRUE
I have tried to force-convert the annotation version like this (not sure this is correct):
rgset@annotation <- c(array = "IlluminaHumanMethylationEPIC", annotation = 'ilm10b4.hg19')
However, the results are nearly the same for all samples, although they come from highly dissimilar cell types, so I do not really trust them:
estimatecells450k[["prop"]]
                      CD8T   CD4T NK  Bcell   Mono    Neu
207107860100_R01C01 0.0283 0.2932  0 0.1244 0.0644 0.2112
207107860100_R02C01 0.0317 0.2899  0 0.1242 0.0608 0.2138
207107860100_R03C01 0.0308 0.2907  0 0.1243 0.0610 0.2140
207107860100_R04C01 0.0304 0.2908  0 0.1244 0.0611 0.2142
Is there any way to circumvent this issue?
Thanks a lot in advance!
Best Carlos
Hi @cdelacalle,
I will provide some code I have used to circumvent these problems. These are not official; I will ask some students to review the code and see whether I can provide a longer-term solution.
library(devtools)
install_github("mwsill/IlluminaHumanMethylationEPICv2manifest")
install_github("mwsill/minfi")#This is not the official minfi BE CAREFUL!!!#
library(minfi)
wddir<-"//yourdatapath/ "
sheet<-read.metharray.sheet(wddir, pattern = "yourmanifest.csv")
RGset <- read.metharray.exp(targets = sheet, extended = TRUE)
Mset <- preprocessIllumina(RGset)#Noob is not working with this annotation, or my own annotation.
Mset
library(FlowSorted.Blood.EPIC)
IDOLOptimizedCpGsBloodv2<- IDOLOptimizedCpGs[which(IDOLOptimizedCpGs%in%rownames(getBeta(RGset)))]
identical(rownames(IDOLOptimizedCpGs.compTable[IDOLOptimizedCpGsBloodv2,]), IDOLOptimizedCpGsBloodv2)
propEPIC <- projectCellType_CP(
getBeta(Mset)[IDOLOptimizedCpGsBloodv2, ],
IDOLOptimizedCpGs.compTable[IDOLOptimizedCpGsBloodv2,],
contrastWBC = NULL, nonnegative = TRUE,
lessThanOne = FALSE
)
I've also heard of this solution, but I have not tried it myself. Ideally you should use Noob.
Let me know if that works for you.
Thanks a lot @lucassalas for your extremely fast reply.
The code you provided solved the problem right away. Results now make much more sense!
The only detail was that IDOLOptimizedCpGsBlood object was not found, and instead, something was found by the name of IDOLOptimizedCpGs. The description was the same so I assumed it was that one instead.
Thank you again. Best Carlos
On a side note:
I see that the .compTable is not available for the cord blood dataset. Is there any way to derive it from the data.frame (IDOLOptimizedCpGsCordBlood)?
Also, is there any plan to include the brain DNAm reference that was included in previous versions of estimateCellCounts?
Thanks a lot Best Carlos
Quick answers: If you are using FlowSorted.CordBloodCombined.450k
library(FlowSorted.CordBloodCombined.450k)
FlowSorted.CordBloodCombined.450k.compTable
If you want to use the Guintivano et al. reference, you can use library(FlowSorted.DLPFC.450k).
However, we recently published HiBED, which is much more comprehensive; you should explore that alternative instead. The paper is here.
Good luck.
We will add this solution to the package soon. In the meantime please use the following
devtools::install_github("jokergoo/IlluminaHumanMethylationEPICv2manifest")
devtools::install_github("jokergoo/IlluminaHumanMethylationEPICv2anno.20a1.hg38")
library(minfi)
library(sesame)
library(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
library(IlluminaHumanMethylationEPICv2anno.20a1.hg38)
library(IlluminaHumanMethylationEPICv2manifest)
##############################
# Load RGset
RGset = read.metharray.exp(workdir,recursive = TRUE)
annotation(RGset)["array"] = "IlluminaHumanMethylationEPICv2"
annotation(RGset)["annotation"] = "20a1.hg38"
MSet <-preprocessNoob(RGset)
Betas<-getBeta(MSet)
Betas<- sesame::betasCollapseToPfx(Betas) #you can also use ENmix::rm.cgsuffix(Betas) or other function to remove replicates
library(FlowSorted.Blood.EPIC)
IDOLOptimizedCpGsBloodv2<- IDOLOptimizedCpGs[which(IDOLOptimizedCpGs%in%rownames(Betas))]
identical(rownames(IDOLOptimizedCpGs.compTable[IDOLOptimizedCpGsBloodv2,]), IDOLOptimizedCpGsBloodv2)
propEPIC <- projectCellType_CP(
Betas[IDOLOptimizedCpGsBloodv2, ],
IDOLOptimizedCpGs.compTable[IDOLOptimizedCpGsBloodv2,],
contrastWBC = NULL, nonnegative = TRUE,
lessThanOne = FALSE
)
Hello,
I am also trying to use estimateCellCounts2 with methylation data from an EPICv2 array. I tried the code you provided, but I still have issues: the exact error message is "Error in getBeta(MSet)[IDOLOptimizedCpGsBloodv2, ] : subscript out of bounds". I am not sure I understand what the exact problem is now.
Thanks in advance if you have any advice for this.
Hi @hguigui123,
As the code above shows, you should not use getBeta(MSet) directly. The EPICv2 has a different set of probe names due to the presence of technical replicates (cgXXXXXX_TCXX or cgXXXXXX_BCXX). You need to collapse the matrix to a single value per CpG using the sesame (sesame::betasCollapseToPfx) or ENmix (ENmix::rm.cgsuffix) functions, and then you can use those values to project the cell types.
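To make the collapsing step concrete, here is a minimal base-R sketch on a made-up matrix. It assumes replicates are averaged by their cg prefix, which is, as far as I understand, what sesame::betasCollapseToPfx does; the probe names and values below are hypothetical, and on real data the sesame or ENmix functions should be preferred.

```r
# Toy beta matrix with EPICv2-style row names (hypothetical probes and values).
betas <- matrix(
  c(0.10, 0.20,
    0.30, 0.40,
    0.50, 0.60),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("cg00000029_TC21", "cg00000029_TC22", "cg00000109_BC21"),
    c("sample1", "sample2")
  )
)

# Drop everything from the first underscore onward to recover the cg prefix.
prefix <- sub("_.*$", "", rownames(betas))

# rowsum() adds the rows that share a prefix; dividing by the number of
# replicates per prefix turns the sums into means (assumed collapsing rule).
sums <- rowsum(betas, group = prefix, reorder = TRUE)
counts <- as.vector(table(prefix)[rownames(sums)])
collapsed <- sums / counts

collapsed
```

After this step the row names are plain cg identifiers, so they can be matched against the IDOL CpG list.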
I hope that answers your question.
Good luck.
Please check that this is TRUE in the code above:
IDOLOptimizedCpGsBloodv2<- IDOLOptimizedCpGs[which(IDOLOptimizedCpGs%in%rownames(Betas))]
identical(rownames(IDOLOptimizedCpGs.compTable[IDOLOptimizedCpGsBloodv2,]), IDOLOptimizedCpGsBloodv2)
You should have two matrices with matching rows (the same CpGs in the same order). This is at the beta value level, as you need the product to be positive definite to progress in the calculation. The "Betas" object is the matrix after collapsing the cg names.
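The check can be tried on toy data first. The sketch below uses made-up stand-ins for Betas and IDOLOptimizedCpGs.compTable (all names and values are hypothetical) to show why indexing with a CpG that is absent from the data fails, and how restricting both matrices to the shared CpGs avoids it.

```r
# Hypothetical sample betas: one reference CpG (cg04) is absent from the data.
betas <- matrix(
  seq(0.05, 0.40, length.out = 8), nrow = 4,
  dimnames = list(c("cg01", "cg02", "cg03", "cg05"), c("s1", "s2"))
)

# Hypothetical reference table (CpGs x cell types).
ref <- matrix(
  seq(0.1, 0.9, length.out = 9), nrow = 3,
  dimnames = list(c("cg02", "cg03", "cg04"), c("CD4T", "Bcell", "Mono"))
)
ref_cpgs <- rownames(ref)

# Indexing by a row name that does not exist raises an error, which is
# captured here as a message instead of stopping the script.
attempt <- tryCatch(betas[ref_cpgs, ], error = function(e) conditionMessage(e))

# Keep only the reference CpGs present in the data, then subset both matrices.
shared <- ref_cpgs[ref_cpgs %in% rownames(betas)]
missing_cpgs <- setdiff(ref_cpgs, rownames(betas))
Y <- betas[shared, , drop = FALSE]
X <- ref[shared, , drop = FALSE]

# Both matrices must list the same CpGs in the same order before projecting.
stopifnot(identical(rownames(Y), rownames(X)))
```

If missing_cpgs contains most of the reference CpGs, the row names of the data were probably not collapsed to plain cg identifiers.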
Hello,
Indeed it works perfectly now.
Thanks again for your very precious help !
Hi, I have seen in the documentation how I can download the reference dataset using the function libraryDataGet(title) to do the analysis offline. This works for "FlowSorted.Blood.EPIC", but what is the name of the dataset for 450k? I have tried "FlowSorted.Blood450k", but I get an error message: Error in .local(x, i, j = j, ...) : 'i' must be length 1
Thank you for any help
Hi @annekristin, Could you please elaborate on the purpose of using the 450k data? If you are trying to use FlowSorted.Blood.EPIC for deconvolving with the 450k legacy library, you do not need that library (and we DO NOT recommend that for our libraries). If you want to use the library for other purposes, the data derived from the Reinius et al. publication can be downloaded using an independent library package from Bioconductor (not hosted on ExperimentHub). I hope that helps.
Thank you for your reply, @lucassalas. I want to perform a meta-analysis between datasets from 450k, epicv1 and epicv2. As part of this I need to estimate the cell counts for each platform. I was hoping to use the IDOL probes for all platforms (trying to keep the pipelines for each platform as similar as possible). If I understand you correctly I should use the FlowSorted.Blood.EPIC dataset as the reference set for all 3 platforms.
According to the documentation for estimateCellCounts2 I can set the reference platform to 450k, but it will only accept EPIC. I see someone else has had a similar issue, and that you replied that you deal with this under the hood. Does that mean that even when I leave the reference platform at the default "IlluminaHumanMethylationEPIC", you extract the array type from the RGset and set the appropriate platform under the hood? And if I select IDOL probes and my RGset is from 450k, will you choose the 450k legacy probes? Just trying to understand how it works :-)
Is there a benefit to using the IDOL legacy probes with 450k, or am I just as well off using estimateCellCounts from minfi for this platform?
Thank you for your help
Hi @annekristin,
Your statement is correct. The reference platform refers to the reference that you will use for the process. In this case, you keep the EPIC. In our 2018 paper, we generated an EPIC and a 450k legacy library that the algorithm will select automatically (using the EPIC reference) depending on which platform corresponds to your samples (this is the under-the-hood statement). The benefit, as mentioned in the 2018 paper, is in terms of precision for several cell types. I would recommend that you use the IDOL libraries only. You do not need to modify these parameters; just use the defaults. The EPICv2 is slightly more complicated, as we are not allowed to use GitHub libraries in Bioconductor packages, and there are no Bioconductor options for the libraries that I mentioned on Jan 9 that I can incorporate into the official package. Please use that approach for now.
Thank you so much @lucassalas !
Dear Lucas,
I am analyzing some PBMC data from the EPICv2. I used the approach you mentioned on Jan 9 and it worked perfectly! However, it also includes Neutrophils, which of course we don't have in our PBMC dataset... My colleague who had data from the EPICv1 used estimateCellCounts2 with the argument cellTypes = c("CD8T","CD4T", "NK","Bcell","Mono")... Is there a way to do something similar with projectCellType_CP?
Thank you in advance
Hi @pcibin,
Yes, it is possible: you can select the columns of the reference.
library(FlowSorted.Blood.EPIC)
IDOLOptimizedCpGsBloodv2<- IDOLOptimizedCpGs[which(IDOLOptimizedCpGs%in%rownames(Betas))]
identical(rownames(IDOLOptimizedCpGs.compTable[IDOLOptimizedCpGsBloodv2,]), IDOLOptimizedCpGsBloodv2)
propEPIC <- projectCellType_CP(
Betas[IDOLOptimizedCpGsBloodv2, ],
IDOLOptimizedCpGs.compTable[IDOLOptimizedCpGsBloodv2,c("CD8T","CD4T", "NK","Bcell","Mono")],
contrastWBC = NULL, nonnegative = TRUE,
lessThanOne = FALSE
)
However, from the biological standpoint, I would leave the Neutrophils as is; even with optimal Ficoll, sometimes PMNs can cross or degranulate. Please double check that the columns are correct in your reference matrix.
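One way to double check the columns is a small guard before subsetting. This is only a sketch on a made-up reference matrix (the values and the ref object are hypothetical stand-ins for IDOLOptimizedCpGs.compTable); it stops early if a requested cell type is not a column of the reference.

```r
# Hypothetical reference matrix (CpGs x cell types) standing in for the compTable.
ref <- matrix(
  seq(0.05, 0.60, length.out = 12), nrow = 2,
  dimnames = list(
    c("cg01", "cg02"),
    c("CD8T", "CD4T", "NK", "Bcell", "Mono", "Neu")
  )
)

# Cell types expected in a PBMC preparation (no neutrophils).
wanted <- c("CD8T", "CD4T", "NK", "Bcell", "Mono")

# Fail early with a readable message if a requested cell type is absent,
# instead of hitting "subscript out of bounds" inside the subsetting call.
absent <- setdiff(wanted, colnames(ref))
if (length(absent) > 0) {
  stop("Cell types not in the reference: ", paste(absent, collapse = ", "))
}

# drop = FALSE keeps the result a matrix even if one column is selected.
ref_pbmc <- ref[, wanted, drop = FALSE]
colnames(ref_pbmc)
```

The resulting ref_pbmc plays the role of the second argument to projectCellType_CP in the code above.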