icbi-lab / immune_deconvolution_benchmark

Reproducible pipeline for "Comprehensive evaluation of cell-type quantification methods for immuno-oncology", Sturm et al. 2019, https://doi.org/10.1093/bioinformatics/btz363
https://icbi-lab.github.io/immune_deconvolution_benchmark
BSD 3-Clause "New" or "Revised" License

pipeline fails: std::bad_alloc ERROR #28

Open asifzubair opened 5 years ago

asifzubair commented 5 years ago

I followed the instructions but the pipeline is failing with this message:

label: hierarchy (with options) 
List of 4
 $ echo      : logi FALSE
 $ fig.height: num 8
 $ fig.width : num 8
 $ fig.cap   : chr "Hierarchy of immune cell types used for mapping cell types between methods and datasets."

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
/bin/bash: line 2: 76154 Aborted                 (core dumped) Rscript -e "bookdown::render_book('index.Rmd')"
[Fri Jul 26 16:52:30 2019]
Error in rule book:
    jobid: 0
    output: results/book/index.html, results/cache/.dir, results/figures/schelker_single_cell_tsne.pdf, results/figures/spillover_migration_chart.jpg, results/figures/spillover_migration_all.pdf, results/tables/mixing_study_correlations.tsv, results/tables/spillover_signal_noise.tsv
    conda-env: /home/azubair/projects/immune_deconvolution_benchmark/.snakemake/conda/151e8d15
    shell:

    touch results/cache/.dir
    rm -f results/book/figures && ln -s ../figures results/book/figures
    cd notebooks && Rscript -e "bookdown::render_book('index.Rmd')"

        (exited with non-zero exit code)

I checked that memory usage doesn't go up at all, so I'm not inclined to think that it is because I am out of memory.

Any ideas why this could be happening? Thanks.

grst commented 5 years ago

Could still be a memory issue... How much RAM do you have?

Also, what operating system?


asifzubair commented 5 years ago

I'm using Ubuntu 16.04 with 62 GB of RAM. I didn't change the config.R file, so I should be using 2 cores.

Thanks!

mlist commented 5 years ago

Hi Asif,

Gregor is travelling right now, so I will try to answer. From your log, it looks like the script tried to allocate (reserve) ca. 75 GB of RAM, which failed on your computer. My guess is that you did not see memory usage go up because the memory could not be allocated in the first place. Once Gregor is back he can give you a more qualified comment or maybe some hints to reduce memory usage.

Best, Markus



asifzubair commented 5 years ago

Thank you, Markus.

Yes, it is strange that it tried to allocate 75 GB of RAM. I remember Gregor mentioning somewhere that we need 12 gigs per core and I thought I would be fine.

Of course, it would be great if Gregor could comment on this.

Thanks!

grst commented 5 years ago

Hi Asif,

can you double-check that the cores are really set to 2 in your config.R? In an earlier version (a few commits back) the default value was higher.

I successfully ran the pipeline on Google Colab with 12 GB of RAM, so in principle you should have more than enough.

Best, Gregor
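A quick way to verify the setting is to grep the config file directly (assuming the default repo layout, where the config lives at notebooks/config.R, as shown later in this thread):

```shell
# Show the configured worker count; registerDoMC(n) sets the
# number of parallel R workers used by the notebooks.
grep -n "registerDoMC" notebooks/config.R
```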


asifzubair commented 5 years ago

Sure, will do. Thank you for your reply.

asifzubair commented 5 years ago

just wondering, @grst any thoughts on releasing a docker image of this pipeline?

grst commented 5 years ago

Personally, I'm not a big fan of Docker... freezing the conda packages is (or should be) another way of ensuring reproducibility.

In practice, it seems to me that neither of the two systems does a perfect job. I have personally had trouble getting various Docker containers to run, and memory issues in particular could still occur.

asifzubair commented 5 years ago

Hi @grst - I tried running this pipeline on the cluster and had a bit more luck there.

However, the pipeline still fails and the error message is the same as the one here - https://github.com/icbi-lab/immune_deconvolution_benchmark/issues/24#issuecomment-502604687

>>> Running timer
## Enter batch mode

## Loading immune gene expression

## Removing the batch effect of /lsf_tmp/88991367.tmpdir/RtmpNmCJKH/filec3ec84ae9e0

Found2batches
Adjusting for0covariate(s) or covariate level(s)
Fitting L/S model and finding priors
Finding parametric adjustments
Adjusting the Data

Quitting from lines 570-591 (_main.Rmd) 
Error in { : task 1 failed - "Recv failure: Connection reset by peer"
Calls: <Anonymous> ... withCallingHandlers -> withVisible -> eval -> eval -> %do% -> <Anonymous>
In addition: Warning messages:
1: Transformation introduced infinite values in continuous x-axis 
2: Removed 3407 rows containing non-finite values (stat_bin). 
3: In forceAndCall(1, FUN, newX[, i], ...) :
  closing unused connection 5 (http://raw.githubusercontent.com/ebecht/MCPcounter/master/Signatures/genes.txt)
4: In EPIC::EPIC(bulk = gene_expression_matrix, reference = ref, mRNA_cell = mRNA_cell,  :
  mRNA_cell value unknown for some cell types: CAFs, Endothelial - using the default value of 0.4 for these but this might bias the true cell proportions from all cell types.

Execution halted
[Wed Oct 30 12:42:32 2019]

One thing I did notice is this message at the start:

(base) [azubair@nodecn062 immune_deconvolution_benchmark]$ snakemake --use-conda
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   book
    1

I don't understand why snakemake is using 1 core when I've specified 8 in the config?! Do you think I need to pass the number of cores to snakemake as well? This is my specification in the config:

(base) [azubair@nodecn062 immune_deconvolution_benchmark]$ head notebooks/config.R 
config = new.env()

registerDoMC(8)
grst commented 5 years ago

Yes, you will have to specify --cores. It defines the maximum number of cores snakemake has available (see https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#threads).

Recv failure: Connection reset by peer sounds like something went wrong when downloading a file from the internet (EPIC fetches the signature matrices directly from GitHub). Do the cluster nodes have internet access? Otherwise it could just be bad luck and work if you re-run it.
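For reference, a sketch of the invocation with the core limit passed explicitly (assuming the same `snakemake --use-conda` call shown above):

```shell
# Give snakemake up to 8 cores; rules requesting more
# threads are scaled down to this limit.
snakemake --use-conda --cores 8
```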

asifzubair commented 5 years ago

Hi @grst - I tried this again. I'm pretty sure the compute node has internet access, since I cloned the repository on the node itself. However, I am still getting the above error. Do you think the URL might have changed?

grst commented 5 years ago

Hm, this is weird. The CI still runs through, so MCP-counter seems to be able to fetch the files on the test server.

Maybe try running MCP-counter outside the pipeline to figure out what the actual problem is.
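One way to narrow it down without touching R at all (a hypothetical diagnostic, not part of the pipeline) is to check from the compute node whether the signature file MCP-counter downloads — the URL from the warning earlier in this thread — is reachable:

```shell
# Print the HTTP status for MCP-counter's signature gene list;
# a "Connection reset by peer" here reproduces the failure outside of R.
curl -fsSL -o /dev/null -w "%{http_code}\n" \
  http://raw.githubusercontent.com/ebecht/MCPcounter/master/Signatures/genes.txt
```

If this succeeds on the node but the pipeline still fails, the problem is more likely transient or specific to the R download path.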

asifzubair commented 5 years ago

Sure, I'll try this again.

Also, @grst - I noticed the cell_type_map from the particular commit in immune_deconvolution_benchmark is different from the one included in the immunedeconv package (on the master branch). Is this intentional? The package's cell_type_map breaks some code in the benchmark pipeline.

grst commented 5 years ago

Yes, that's intentional (related to the discussion in https://github.com/icbi-lab/immunedeconv/issues/14). immunedeconv >= 2.0 is incompatible with the pipeline.

asifzubair commented 5 years ago

Hi @grst - Looking closely it seems the problem is with TIMER and not EPIC. Would you know why this might be happening? Thanks!

grst commented 5 years ago

The error seems to happen in lines 570-591 of the intermediate markdown file (_main.Rmd):

Quitting from lines 570-591 (_main.Rmd) 

Can you find the corresponding chunk and post it here?

asifzubair commented 4 years ago

Hi @grst. Apologies for coming back to this after so long.

This is the corresponding chunk (with line numbers):

 569 ```{r, cache=TRUE, message=FALSE, echo=FALSE, warning=FALSE, results='hide'}
  570 timer_indications = rep("OV", ncol(schelker_ovarian$expr_mat))
  571 all_results_bulk = foreach(method = config$deconvolution_methods,
  572                            .final = function(x) {setNames(x, config$deconvolution_methods)}) %do% {
  573   deconvolute(schelker_ovarian$expr_mat, method, indications=timer_indications) %>%
  574     mutate(method=method) %>%
  575     mutate(source="bulk")
  576 }
  577 
  578 all_results_simulated = foreach(method=config$deconvolution_methods,
  579                                 .final = function(x) {setNames(x, config$deconvolution_methods)}) %do% {
  580   deconvolute(bulk_mean, method, indications=timer_indications) %>%
  581     mutate(method=method) %>%
  582     mutate(source="mean")
  583 }
  584 
  585 all_results = bind_rows(all_results_bulk, all_results_simulated) %>%
  586   # select(cell_type, `7873M`, `7882M`, `7892M`, source, method) %>%
  587   gather(donor, fraction, -cell_type, -source, -method) %>%
  588   spread(source, fraction)
  589 
  590 res_methods_validity$all_results = all_results
  591 ```

EDIT: Sorry, I just realised that I didn't post the complete chunk. Will do that ASAP.
EDIT2: Updated to include the whole offending chunk.