trebbiano closed this issue 4 years ago
Hi @trebbiano
I'm not sure how much we can help with this; tradeSeq relies on BiocParallel to do the parallelization, and the error seems to originate there rather than in tradeSeq.
It does seem that fitGAM may error for some of your jobs, given the "2 parallel jobs did not deliver results" message.
It might be worth trying to parallelize the fitting of only a small number of genes (say 50) with only a couple of knots (say 6) to see if that runs successfully. You may also want to install the latest version from GitHub, since we've implemented a new way to catch the fitGAM errors; maybe that helps.
Let us know if that works for you.
Thanks for the reply. I have updated to the latest tradeSeq, version 1.3.0. This time I see a different error:
> sce5k <- fitGAM(counts = countsOrdered, sds = crv1, nknots=15, verbose=TRUE, genes=1:5000, parallel=TRUE, BPPARAM=BPPARAM)
Adding 2265 jobs ...
Submitting 2265 jobs in 12 chunks using cluster functions 'Multicore' ...
Clearing registry ...
Error in names(res) <- nms :
'names' attribute [5000] must be the same length as the vector [2265]
Here 2265 is the number of cells in this dataset and corresponds to the 23989 above; otherwise the datasets are equivalent.
I also tried the sub-sampling with a lower number of knots, as you suggested, which produces yet another error:
> sce100knots6 <- fitGAM(counts = countsOrdered, sds = crv1, nknots=6, verbose=TRUE, genes=1:100, BPPARAM=BPPARAM, parallel=TRUE)
Adding 2265 jobs ...
Submitting 2265 jobs in 12 chunks using cluster functions 'Multicore' ...
Clearing registry ...
Error: BiocParallel errors
element index: 1, 2, 3, 4, 5, 6, ...
first error: unused arguments (B2M = 120, TMSB4X = 48, MT.CO1 = 59, RPL41 = 48, EEF1A1 = 57, RPL13 = 55, RPS12 = 46, RPL10 = 41, MT.ATP6 = 52, MT.CO2 = 49, RPLP1 = 63, TPT1 = 35, RPS27 = 51, RPS18 = 55, MT.CO3 = 31, RPL28 = 39, RPS19 = 17, RPS15A = 32, MT.CYB = 31, RPS27A = 33, RPS2 = 24, RPL34 = 25, RPL30 = 46, RPS14 = 34, TMSB10 = 25, RPL32 = 31, RPS3 = 26, RPS23 = 26, RPS24 = 29, RPS8 = 28, RPL19 = 24, GNLY = 0, RPS3A = 22, RPL11 = 26, RPL3 = 20, ACTB = 22, RPL39 = 22, CCL5 = 8, RPL37 = 19, PTMA = 30, RPL26 = 24, RPL21 = 28, RPS29 = 19, MT.ND3 = 27, RPL13A = 38, RPS6 = 23, RPL18A = 26, RPS28 = 25, RPLP2 = 18, MT.ND4 = 28, RPL12 = 15, RPS15 = 21, RPL7A = 11, AC004086.1 = 26, RPL23A = 23, RPS7 = 16, RPS25 = 16, RPS4X = 15, NKG7 = 1, RPL35A = 17, RPL14 = 10, RPL18 = 22, RPL27A = 27, HLA.A = 13, FAU = 11, RPL9 = 21, RPL15 = 17, HLA.C = 12, RPL29 = 10, RPS16 = 25, HLA.B = 12, RPS21 = 20, RPL37A = 16, RPL5 = 16, RPS9 = 1
I never encountered these issues with the parallel mode switched off.
Hi @trebbiano
Thank you for checking that. The error may be a bug in subsetting the genes; I'll look into that and keep you updated.
Regarding parallelization, could you please check whether the following runs, just to see whether the error there is also coming from the subsetting via the genes argument:
sce5k <- fitGAM(counts = countsOrdered[1:200,], sds = crv1, nknots=6, verbose=TRUE, parallel=TRUE, BPPARAM=BPPARAM)
For 200 genes, this direct subsetting gives the same error message as using genes=1:200:
> sce200Sub <- fitGAM(counts = countsOrdered[1:200,], sds = crv1, nknots=6, verbose=TRUE, parallel = TRUE, BPPARAM=BPPARAM)
Adding 2265 jobs ...
Submitting 2265 jobs in 12 chunks using cluster functions 'Multicore' ...
Clearing registry ...
Error: BiocParallel errors
element index: 1, 2, 3, 4, 5, 6, ...
first error: unused arguments (B2M = 120, TMSB4X = 48, MT.CO1 = 59, RPL41 = 48, EEF1A1 = 57, RPL13 = 55, RPS12 = 46, RPL10 = 41, MT.ATP6 = 52, MT.CO2 = 49, RPLP1 = 63, TPT1 = 35, RPS27 = 51, RPS18 = 55, MT.CO3 = 31, RPL28 = 39, RPS19 = 17, RPS15A = 32, MT.CYB = 31, RPS27A = 33, RPS2 = 24, RPL34 = 25, RPL30 = 46, RPS14 = 34, TMSB10 = 25, RPL32 = 31, RPS3 = 26, RPS23 = 26, RPS24 = 29, RPS8 = 28, RPL19 = 24, GNLY = 0, RPS3A = 22, RPL11 = 26, RPL3 = 20, ACTB = 22, RPL39 = 22, CCL5 = 8, RPL37 = 19, PTMA = 30, RPL26 = 24, RPL21 = 28, RPS29 = 19, MT.ND3 = 27, RPL13A = 38, RPS6 = 23, RPL18A = 26, RPS28 = 25, RPLP2 = 18, MT.ND4 = 28, RPL12 = 15, RPS15 = 21, RPL7A = 11, AC004086.1 = 26, RPL23A = 23, RPS7 = 16, RPS25 = 16, RPS4X = 15, NKG7 = 1, RPL35A = 17, RPL14 = 10, RPL18 = 22, RPL27A = 27, HLA.A = 13, FAU = 11, RPL9 = 21, RPL15 = 17, HLA.C = 12, RPL29 = 10, RPS16 = 25, HLA.B = 12, RPS21 = 20, RPL37A = 16, RPL5 = 16, RPS9 = 1
For completeness, I also tried the direct subsetting for the larger gene set and higher number of knots, which produced the same error as genes=1:5000:
> sce5kSub <- fitGAM(counts = countsOrdered[1:5000,], sds = crv1, nknots=15, verbose=TRUE, parallel = TRUE, BPPARAM=BPPARAM)
Adding 2265 jobs ...
Submitting 2265 jobs in 12 chunks using cluster functions 'Multicore' ...
Clearing registry ...
Error in names(res) <- nms :
'names' attribute [5000] must be the same length as the vector [2265]
Thanks for looking into this!
Hi @trebbiano
I'm not getting any errors when parallelizing the fitting using your settings (but reduced to 2 workers), nor when subsetting genes. I tried this by borrowing code from #37, which I paste below. Could you check whether this errors for you? It may be worth updating tradeSeq to the latest version.
library(splatter)
library(tradeSeq)
library(SingleCellExperiment)
library(slingshot)
library(BiocParallel)  # needed for register() and BatchtoolsParam() below
nGenes <- 60179
numCells <- 100
current_seed <- 328585
params <- newSplatParams()
#Simulate no genes to be DE such that we would expect a small number of rejections
SplatterSimObject <- splatSimulate(params,
                                   method="paths",
                                   nGenes=nGenes,
                                   batchCells=numCells,
                                   seed=current_seed,
                                   lib.loc=11.49293, de.prob = 0, de.facLoc = log(2)*3,
                                   out.prob = 0)
countsT <- counts(SplatterSimObject)
filt_func <- function(x){
  ncells_high_exp <- sum(x >= 10)
  return(ncells_high_exp)
}
rows_to_keep <- apply(countsT, 1, filt_func)
counts <- countsT[rows_to_keep > 10,]
FQnorm <- function(counts){
  rk <- apply(counts,2,rank,ties.method='min')
  counts.sort <- apply(counts,2,sort)
  refdist <- apply(counts.sort,1,median)
  norm <- apply(rk,2,function(r){ refdist[r] })
  rownames(norm) <- rownames(counts)
  return(norm)
}
norm_counts <- FQnorm(counts)
SCEObj <- SingleCellExperiment(assays = List(counts = counts, norm_counts = norm_counts))
pca <- prcomp(t(log1p(assays(SCEObj)$norm_counts)), scale. = FALSE)
rd <- pca$x[,1:2]
reducedDims(SCEObj) <- SimpleList(PCA = rd)
cl <- kmeans(rd, centers = 4)$cluster
colData(SCEObj)$kMeans <- cl
slingshot_results <- slingshot(SCEObj, clusterLabels = 'kMeans', reducedDim = 'PCA')
lin <- getLineages(SCEObj, clusterLabels = colData(slingshot_results)$kMeans, reducedDim = 'PCA')
crv <- SlingshotDataSet(getCurves(lin))
register(BatchtoolsParam(workers=2))
BPPARAM <- BiocParallel::bpparam()
BPPARAM
sce <- fitGAM(counts = counts, sds = crv, nknots = 6, verbose = TRUE, BPPARAM = BPPARAM, parallel = TRUE)
sce <- fitGAM(counts = counts, sds = crv, nknots = 6, genes=1:200, verbose = TRUE, BPPARAM = BPPARAM, parallel = TRUE)
Thanks for working on this issue. I was able to run this code using up-to-date release versions of the packages (plus tradeSeq 1.3.0), but I encountered a similar error as before. If you are not seeing the error, it could be something specific to the BPPARAM variable in my environment. Here is the error:
> sce <- fitGAM(counts = counts, sds = crv, nknots = 6, verbose = TRUE, BPPARAM = BPPARAM, parallel = TRUE)
Adding 100 jobs ...
Submitting 100 jobs in 2 chunks using cluster functions 'Multicore' ...
Clearing registry ...
Error in names(res) <- nms :
'names' attribute [4921] must be the same length as the vector [100]
> sce <- fitGAM(counts = counts, sds = crv, nknots = 6, genes=1:200, verbose = TRUE, BPPARAM = BPPARAM, parallel = TRUE)
Adding 100 jobs ...
Submitting 100 jobs in 2 chunks using cluster functions 'Multicore' ...
Clearing registry ...
Error in names(res) <- nms :
'names' attribute [200] must be the same length as the vector [100]
Traceback:
> traceback()
8: BiocParallel::bplapply(as.data.frame(t(as.matrix(counts)[id,
])), counts_to_Gam, BPPARAM = BPPARAM)
7: BiocParallel::bplapply(as.data.frame(t(as.matrix(counts)[id,
])), counts_to_Gam, BPPARAM = BPPARAM)
6: .fitGAM(counts = counts, U = U, pseudotime = pseudotime, cellWeights = cellWeights,
genes = genes, weights = weights, offset = offset, nknots = nknots,
verbose = verbose, parallel = parallel, BPPARAM = BPPARAM,
control = control, sce = sce, family = family)
5: .local(counts, ...)
4: fitGAM(counts = counts, sds = crv, nknots = 6, genes = 1:200,
verbose = TRUE, BPPARAM = BPPARAM, parallel = TRUE)
3: fitGAM(counts = counts, sds = crv, nknots = 6, genes = 1:200,
verbose = TRUE, BPPARAM = BPPARAM, parallel = TRUE)
2: .traceback(x)
1: traceback(sce <- fitGAM(counts = counts, sds = crv, nknots = 6,
genes = 1:200, verbose = TRUE, BPPARAM = BPPARAM, parallel = TRUE))
Here is my BPPARAM:
> BPPARAM
class: BatchtoolsParam
bpisup: FALSE; bpnworkers: 2; bptasks: 40; bpjobname: BPJOB
bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
bpRNGseed: NA; bptimeout: 2592000; bpprogressbar: TRUE
bpexportglobals: TRUE
bplogdir: NA
bpresultdir: NA
cluster type: multicore
template: NA
registryargs:
file.dir: /gpfs/home/trebbiano/file772e5b48e335
work.dir: getwd()
packages: character(0)
namespaces: character(0)
source: character(0)
load: character(0)
make.default: FALSE
saveregistry: FALSE
resources:
Does this differ from yours? I would guess a hardware configuration difference, but then why would that produce a vector of a different size? Could BiocParallel implement different methods for different hardware configurations even if the BPPARAM settings are the same?
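A possible reading of the two error messages, offered as a sketch rather than a confirmed diagnosis: the traceback shows fitGAM handing bplapply a data frame with cells as rows and genes as columns. If a back end iterates over that data frame by row instead of by column (an assumption about the Batchtools back end that is not verified in this thread), it would create one job per cell and call the fitting function with the gene names as argument names. The base R example below uses a made-up toy matrix and a hypothetical stand-in function, fit_one_gene, to show both behaviours.

```r
# Toy count matrix: 3 genes (rows) x 5 cells (columns); the values are made up.
toy_counts <- matrix(1:15, nrow = 3,
                     dimnames = list(paste0("gene", 1:3), paste0("cell", 1:5)))

# Same reshaping as in the traceback: cells become rows, genes become columns.
df <- as.data.frame(t(toy_counts))

# Hypothetical stand-in for the per-gene fitting function (takes one vector).
fit_one_gene <- function(y) sum(y)

# Column-wise iteration, as lapply() does: one result per gene.
by_gene <- lapply(df, fit_one_gene)
length(by_gene)

# Row-wise iteration, as Map() over the columns does: one call per cell.
per_cell <- do.call(Map, c(list(f = function(...) sum(...)), df))
length(per_cell)

# Row-wise iteration with the real function signature: every call receives
# gene1 = ..., gene2 = ..., gene3 = ... and fails with "unused arguments".
row_wise_error <- tryCatch(
  do.call(Map, c(list(f = fit_one_gene), df)),
  error = function(e) conditionMessage(e)
)
row_wise_error
```

Under that assumption, the number of results would equal the number of cells rather than the number of genes, which would be consistent with both the names(res) length mismatch and the unused arguments error reported above.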
Hi @trebbiano, since you are experiencing the error with the same code but I cannot reproduce it, I assume it must be something about either the system or the BPPARAM settings.
I am currently unsure which BPPARAM parameters might cause the issue, as I do not know enough about how each parameter influences the parallelization. That said, I did recently run fitGAM using parallelization on a Linux compute cluster...
Here is my output for BPPARAM:
> BPPARAM
class: MulticoreParam
bpisup: FALSE; bpnworkers: 6; bptasks: 0; bpjobname: BPJOB
bplog: FALSE; bpthreshold: INFO; bpstopOnError: TRUE
bpRNGseed: ; bptimeout: 2592000; bpprogressbar: FALSE
bpexportglobals: TRUE
bplogdir: NA
bpresultdir: NA
cluster type: FORK
That looks quite different! Let me look into what the parameter settings mean and if I can adjust them to look similar to yours. If they are not enforced by the hardware configuration it may be possible.
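One way to make the back end resemble the MulticoreParam output above is to construct it explicitly instead of registering a BatchtoolsParam. This is a minimal sketch, not a confirmed fix: it assumes BiocParallel and tradeSeq are installed, that counts and crv were created as in the reproduction script earlier in the thread, and that a forking back end is available (Linux or macOS). The worker count of 2 is an arbitrary choice.

```r
library(BiocParallel)
library(tradeSeq)

# Build a fork-based back end explicitly rather than using the registered default.
BPPARAM <- MulticoreParam(workers = 2)
BPPARAM  # print the settings to compare them with the outputs above

# Assumes `counts` and `crv` exist as in the reproduction script.
sce <- fitGAM(counts = counts, sds = crv, nknots = 6, genes = 1:200,
              verbose = TRUE, parallel = TRUE, BPPARAM = BPPARAM)
```

If this runs while the BatchtoolsParam version fails, that would point to the back end rather than the hardware as the source of the difference.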
Hi @trebbiano
Yesterday I experienced the same error you did; however, I knew that I had run that code before on that system (a Linux cluster). So I just tried again, and then it worked... So it seems that something goes wrong with the parallelization in BiocParallel. It may be better to try fewer cores to lower the probability of errors.
Hi, I'm running v1.1.16 in R 3.6.1 in a CentOS environment (more info below). I get the following error running fitGAM in parallel mode:
The same command runs fine without parallel = TRUE, but the CPU time is prohibitively long. The error seems to come from batchtools but could be related to how tradeSeq prepares the jobs. Thanks in advance for looking into this!
Relevant upstream code: