Open asifzubair opened 5 years ago
Could still be a memory issue... How much RAM do you have?
Also, what operating system?
On Fri, Jul 26, 2019, 23:57 Asif Zubair notifications@github.com wrote:
I followed the instructions but the pipeline is failing with this message:
```
label: hierarchy (with options)
List of 4
 $ echo      : logi FALSE
 $ fig.height: num 8
 $ fig.width : num 8
 $ fig.cap   : chr "Hierarchy of immune cell types used for mapping cell types between methods and datasets."

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
/bin/bash: line 2: 76154 Aborted (core dumped) Rscript -e "bookdown::render_book('index.Rmd')"
[Fri Jul 26 16:52:30 2019]
Error in rule book:
    jobid: 0
    output: results/book/index.html, results/cache/.dir, results/figures/schelker_single_cell_tsne.pdf, results/figures/spillover_migration_chart.jpg, results/figures/spillover_migration_all.pdf, results/tables/mixing_study_correlations.tsv, results/tables/spillover_signal_noise.tsv
    conda-env: /home/azubair/projects/immune_deconvolution_benchmark/.snakemake/conda/151e8d15
    shell:
        touch results/cache/.dir
        rm -f results/book/figures && ln -s ../figures results/book/figures
        cd notebooks && Rscript -e "bookdown::render_book('index.Rmd')"
        (exited with non-zero exit code)
```
I checked that memory usage doesn't go up at all, so I'm not inclined to think that it is because I am out of memory.
Any ideas why this could be happening? Thanks.
I'm using Ubuntu 16.04 with 62 GB of RAM. I didn't change the config.R file, so I should be using 2 cores.
Thanks!
Hi Asif,
Gregor is travelling right now, so I will try to answer. It looks from your log like the script tried to allocate (reserve) ca. 75 GB of RAM, which failed on your computer. My guess is that you did not see memory usage go up because it could not be allocated in the first place. Once Gregor is back, he can give you a more qualified comment or maybe some hints to reduce memory usage.
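One way to confirm whether this is what happens would be to sample the render process's resident memory while the pipeline runs. This is only a sketch, assuming a Linux/Unix node with `pgrep` and `ps` available; the process pattern is taken from the log above:

```shell
#!/bin/sh
# Sample the resident memory of the running render process (if any).
# The pattern matches the Rscript call shown in the error log above.
pid=$(pgrep -f "bookdown::render_book" | head -n 1)
if [ -n "$pid" ]; then
  # ps reports RSS in kilobytes; convert to gigabytes for readability.
  ps -o rss= -p "$pid" | awk '{printf "resident set: %.1f GB\n", $1 / 1024 / 1024}'
else
  echo "render process not running"
fi
```

Running this in a loop alongside the pipeline would show whether the resident set actually grows before the crash, or whether the allocation fails immediately.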
Best, Markus
Thank you, Markus.
Yes, it is strange that it tried to allocate 75 GB of RAM. I remember Gregor mentioning somewhere that we need 12 GB per core, and I thought I would be fine.
Of course, it would be great if Gregor could comment on this.
Thanks!
Hi Asif,
can you double-check that the cores are really set to 2 in your config.R? In an earlier version (a few commits earlier) the default value was higher.
I successfully ran the pipeline on Google Colab with 12 GB of RAM, so in principle you should have more than enough.
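For reference, the setting in question is the `registerDoMC()` call in `notebooks/config.R`; with 2 cores it would read something like this (a fragment, not the full file):

```r
# notebooks/config.R (fragment)
# Number of parallel workers used by foreach in the notebooks.
# Each worker can need on the order of 12 GB of RAM, so keep this
# small on machines with modest memory.
registerDoMC(2)
```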
Best, Gregor
Sure, will do. Thank you for your reply.
Just wondering, @grst: any thoughts on releasing a Docker image of this pipeline?
Personally, I'm not a big fan of Docker... freezing the conda packages is (or should be) another way of ensuring reproducibility.
In practice, it seems to me that neither of the two systems does a perfect job. I personally had trouble getting various Docker containers to run, and in particular memory issues could still occur.
Hi @grst - I tried running this pipeline on the cluster and had a bit more luck there.
However, the pipeline still fails, and the error message is the same as the one here: https://github.com/icbi-lab/immune_deconvolution_benchmark/issues/24#issuecomment-502604687
```
>>> Running timer
## Enter batch mode
## Loading immune gene expression
## Removing the batch effect of /lsf_tmp/88991367.tmpdir/RtmpNmCJKH/filec3ec84ae9e0
Found 2 batches
Adjusting for 0 covariate(s) or covariate level(s)
Fitting L/S model and finding priors
Finding parametric adjustments
Adjusting the Data

Quitting from lines 570-591 (_main.Rmd)
Error in { : task 1 failed - "Recv failure: Connection reset by peer"
Calls: <Anonymous> ... withCallingHandlers -> withVisible -> eval -> eval -> %do% -> <Anonymous>
In addition: Warning messages:
1: Transformation introduced infinite values in continuous x-axis
2: Removed 3407 rows containing non-finite values (stat_bin).
3: In forceAndCall(1, FUN, newX[, i], ...) :
  closing unused connection 5 (http://raw.githubusercontent.com/ebecht/MCPcounter/master/Signatures/genes.txt)
4: In EPIC::EPIC(bulk = gene_expression_matrix, reference = ref, mRNA_cell = mRNA_cell, :
  mRNA_cell value unknown for some cell types: CAFs, Endothelial - using the default value of 0.4 for these but this might bias the true cell proportions from all cell types.
Execution halted
[Wed Oct 30 12:42:32 2019]
```
One thing I did notice is this message at the start:
```
(base) [azubair@nodecn062 immune_deconvolution_benchmark]$ snakemake --use-conda
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       book
        1
```
Don't understand why `snakemake` is using 1 core when I've specified 8 in the config?! Do you think I need to pass the number of cores to `snakemake` as well? This is my specification in the config:
```
(base) [azubair@nodecn062 immune_deconvolution_benchmark]$ head notebooks/config.R
config = new.env()
registerDoMC(8)
```
Yes, you will have to specify `--cores`. It defines the maximum number of cores snakemake has available (see https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#threads).
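Concretely, assuming 8 cores should be available to match the `registerDoMC(8)` setting shown above, the invocation would look something like this (a command-line sketch, not a tested recipe):

```shell
# Give snakemake up to 8 cores; rules claiming more threads
# will be scaled down to this limit instead of defaulting to 1.
snakemake --use-conda --cores 8
```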
`Recv failure: Connection reset by peer` sounds like something went wrong while downloading a file from the internet (EPIC fetches the signature matrices directly from GitHub). Do the cluster nodes have internet access? Otherwise it could just be bad luck, and re-running it might work.
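One way to rule out the network explanation would be to check from a compute node whether the raw GitHub URL from the warning above is reachable. A minimal sketch, assuming `curl` is installed on the node:

```shell
#!/bin/sh
# Try to fetch the MCP-counter signature file from a compute node.
# URL taken from the warning in the error log above.
url="https://raw.githubusercontent.com/ebecht/MCPcounter/master/Signatures/genes.txt"
if curl -sSf --max-time 10 -o /dev/null "$url" 2>/dev/null; then
  status="reachable"
else
  status="unreachable"
fi
echo "GitHub raw content is $status from this node"
```

If this prints "unreachable" on the compute node but "reachable" on the login node, the deconvolution methods that download their signatures at runtime cannot work in cluster jobs.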
Hi @grst - I tried this again. I'm pretty sure the compute node has internet access, as I cloned the repository on the node itself. However, I am still getting the above error. Do you think the URL might have changed?
Hm, this is weird. The CI still runs through, so MCP-counter seems to be able to fetch the files on the test server.
Maybe try running MCP-counter outside the pipeline to figure out what the actual problem is.
Sure, I'll try this again.
Also, @grst - I noticed the `cell_type_map` from the particular commit in `immune_deconvolution_benchmark` is different from the one included in the `immunedeconv` package (on the master branch). Is this intentional? The package's `cell_type_map` breaks some code in the benchmark pipeline.
Yes, that's intentional (related to the discussion in https://github.com/icbi-lab/immunedeconv/issues/14). `immunedeconv >= 2.0` is incompatible with the pipeline.
Hi @grst - Looking closely it seems the problem is with TIMER and not EPIC. Would you know why this might be happening? Thanks!
The error seems to happen in lines 570-591 of the intermediate markdown file (`_main.Rmd`):

```
Quitting from lines 570-591 (_main.Rmd)
```
Can you find the corresponding chunk and post it here?
Hi @grst. Apologies for getting back to this after so long.
This is the corresponding chunk (along with line numbers):
569 ```{r, cache=TRUE, message=FALSE, echo=FALSE, warning=FALSE, results='hide'}
570 timer_indications = rep("OV", ncol(schelker_ovarian$expr_mat))
571 all_results_bulk = foreach(method = config$deconvolution_methods,
572 .final = function(x) {setNames(x, config$deconvolution_methods)}) %do% {
573 deconvolute(schelker_ovarian$expr_mat, method, indications=timer_indications) %>%
574 mutate(method=method) %>%
575 mutate(source="bulk")
576 }
577
578 all_results_simulated = foreach(method=config$deconvolution_methods,
579 .final = function(x) {setNames(x, config$deconvolution_methods)}) %do% {
580 deconvolute(bulk_mean, method, indications=timer_indications) %>%
581 mutate(method=method) %>%
582 mutate(source="mean")
583 }
584
585 all_results = bind_rows(all_results_bulk, all_results_simulated) %>%
586 # select(cell_type, `7873M`, `7882M`, `7892M`, source, method) %>%
587 gather(donor, fraction, -cell_type, -source, -method) %>%
588 spread(source, fraction)
589
590 res_methods_validity$all_results = all_results
591 ```
EDIT: Sorry, I just realised that I didn't post the complete chunk. Will do that ASAP.
EDIT2: Updated to include the whole offending chunk.