maxplanck-ie / snakepipes

Customizable workflows based on snakemake and python for the analysis of NGS data
http://snakepipes.readthedocs.io

memory for plotCorr_bed_spearman #943

Closed roddypr closed 7 months ago

roddypr commented 11 months ago

Hi

I am using snakePipes for the first time and I am a bit lost. The mRNAseq pipeline kept on failing at the plotCorr_bed_spearman rule. I've pasted some of the error messages below, but to summarise, this step was running out of memory. I managed to fix it by changing the default memory in cluster.yaml from:

__default__:
    memory: 1G

to

__default__:
    memory: 6G

Is this the correct thing to do?

I could not find a per-rule memory setting for plotCorr_bed_spearman anywhere. The following lines in cluster.yaml seem not to affect the memory of this specific step:

plotCorrelation_pearson:
    memory: 3G
plotCorrelation_pearson_allelic:
    memory: 5G
plotCorrelation_spearman:
    memory: 3G
plotCorrelation_spearman_allelic:
    memory: 2G

Sorry if this is not really an issue but just me misunderstanding how to configure snakePipes. What should I be doing to configure this step correctly?

Thank you so much for your help.

Best wishes,

Roddy

Error in log file:

Error in rule plotCorr_bed_pearson:
    jobid: 1409
    input: deepTools_qc/multiBigwigSummary/coverage.bed.npz
    output: deepTools_qc/plotCorrelation/correlation.pearson.bed_coverage.tsv
    log: deepTools_qc/logs/plotCorrelation_pearson.out, deepTools_qc/logs/plotCorrelation_pearson.err (check log file(s) for error message)
    conda-env: /PATH/TO/snakePipes/envs/91405dda58cec033030feb06c0589699
    shell:

    plotCorrelation                 -in deepTools_qc/multiBigwigSummary/coverage.bed.npz                 --plotFile deepTools_qc/plotCorrelation/correlation.pearson.bed_coverage.heatmap.png                 --corMethod pearson                 --whatToPlot heatmap                 --skipZeros                 --plotTitle 'Pearson correlation of genes coverage'                 --outFileCorMatrix deepTools_qc/plotCorrelation/correlation.pearson.bed_coverage.tsv                 --colorMap PuBuGn                 --plotNumbers > deepTools_qc/logs/plotCorrelation_pearson.out 2> deepTools_qc/logs/plotCorrelation_pearson.err

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Submitted batch job 4384459

Error executing rule plotCorr_bed_pearson on cluster (jobid: 1409, external: Submitted batch job 4384459, jobscript: /PATH/TO/snakepipes_try/results/.snakemake/tmp.nens7ebc/snakejob.plotCorr_bed_pearson.1409.sh). For error details see the cluster log and the log files of the involved rule(s).

The rule was running out of memory (probably a very fast spike in memory use, since the reported Memory Efficiency is quite low):

Job ID: 4384459
Cluster: slurm
User/Group: [...]
State: OUT_OF_MEMORY (exit code 0)
Cores: 1
CPU Utilized: 00:00:08
CPU Efficiency: 9.41% of 00:01:25 core-walltime
Job Wall-clock time: 00:01:25
Memory Utilized: 132.05 MB
Memory Efficiency: 12.90% of 1.00 GB
katsikora commented 11 months ago

Hi Roddy,

it looks like you've managed to fix the issue on your own.

Indeed, there is one global memory resource setting defined as __default__, and then rule-specific memory declarations. The rule-specific declarations override the default setting for the listed rules; all other rules consume the default memory amount.

If a rule fails due to hitting a memory limit, you can add a declaration for that rule to your cluster_config.yaml and assign a higher value. I'd recommend copying the cluster_config.yaml from your output directory, modifying the copy, and then submitting it to the snakePipes workflow with --clusterConfigFile. Increasing the default memory as you did will also work, but you will end up reserving more resources for all the rules that use the default memory value in your cluster environment.
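For instance, the copied file could gain an entry like the following (the rule name comes from the failing step; the memory value is just whatever headroom that step turns out to need):

```yaml
# Rule-specific override: takes precedence over __default__ for this rule only.
plotCorr_bed_spearman:
    memory: 6G
```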

To be honest, this job state report is a bit confusing: there is an OUT_OF_MEMORY state listed, but the exit code is 0? Also, the reported memory usage looks to be well under the 1 GB assigned to it. But if assigning 6 GB as the default fixed it, then memory might indeed have been the issue.

Hope this helps,

Best wishes,

Katarzyna

roddypr commented 11 months ago

Dear Katarzyna,

Thank you for your very clear explanation. It helps a lot because I would like to configure snakePipes to be used across different people in my department.

Just to make sure I understand: in my case I should add the following to a local cluster config file (passed with --clusterConfigFile):

plotCorr_bed_spearman:
    memory: 6G

The job state report is indeed a bit confusing. My colleagues mentioned that there are sometimes peaks of memory use that seff does not measure accurately, which could explain the problem (in fact, I will try running the pipeline with 2G for this step, which is probably enough).
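For reference, one way to cross-check seff's numbers is to query Slurm's accounting records directly (sacct is standard Slurm tooling, though which fields are populated depends on the cluster's accounting setup):

```shell
# MaxRSS is the peak resident memory Slurm sampled for each job step;
# very short-lived spikes between sampling intervals can still be missed.
sacct -j 4384459 --format=JobID,State,Elapsed,MaxRSS,ReqMem
```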

Best wishes,

Roddy

katsikora commented 11 months ago

Dear Roddy,

I'd have thought you might be facing a one-off issue with a particularly large dataset, e.g. a very high number of samples. In that case, rerunning the workflow with a custom cluster config file would be good enough.

If you think this is rather going to be a recurrent issue, you might consider configuring your snakePipes installation with snakePipes config --clusterConfig and passing a custom "shared" cluster config file. You can see the default "shared" cluster config here. For the rule plotCorr_bed_spearman, it should be alright to put it in this "shared" cluster config; a couple of other deepTools-based rules are also defined there.
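A sketch of that "shared" route (the path is a placeholder, and any snakePipes config flags beyond --clusterConfig are not covered in this thread):

```shell
# Point the installation at a customized shared cluster config;
# all workflows will then pick up the per-rule memory values from it.
snakePipes config --clusterConfig /PATH/TO/shared_cluster_config.yaml
```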

The cluster config that you see in your workflow output folder is a merge between the "shared" and the workflow-specific cluster configs. The merge is done at runtime.

Let me know if you have any other questions,

Best wishes,

Katarzyna