Closed · bjpop closed this 2 years ago
Thanks for the input.
First, on the memory requirement: the `CreateSequenceDictionary` task doesn't limit the memory given to the JVM, so I've added parameters to set a maximum. I can't imagine this task needs more than the default 1G, but without a limit the garbage collector might never kick in, and instead the process will claim more and more memory until the cluster system kills it. The change makes sure the JVM won't use more than the allocated memory (it actually assigns 128MB less to the JVM than is given to the task, to allow for overhead).

I've added some text to Running.md describing how to define memory requirements on a per-project basis in the project's nextflow config. So if `CreateSequenceDictionary` still blows up, that text explains how to increase the size given to it.
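As a sketch of what such a per-project override could look like (the process name and the exact memory value here are assumptions for illustration, not taken from the pipeline):

```groovy
// Hypothetical project nextflow.config: raise the memory allocated to
// the CreateSequenceDictionary task. Per the change described above,
// the JVM's maximum heap would be set 128MB below this figure to
// allow for overhead.
process {
    withName: 'CreateSequenceDictionary' {
        memory = '4 GB'
    }
}
```

This uses Nextflow's standard `withName` process selector, so it only affects the one task rather than every process in the pipeline.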
Second, your change to `samtools_faidx` is specific to Ensembl chromosome naming (ours was specific to UCSC). To support either UCSC or Ensembl naming without having to change the workflow, I've added a parameter `CHROMOSOME_ID_PREFIX` to nextflow.config. It is just a string, but it should really be either `"chr"` for UCSC references or the empty string (`''`) for Ensembl ones. The default is `"chr"`, but again it can be set differently in a project's nextflow.config.
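For example, a project built on an Ensembl-named reference (chromosomes `1`, `2`, ... rather than `chr1`, `chr2`, ...) might add something like the following to its nextflow.config; the `params` scope is an assumption about where the pipeline reads this setting:

```groovy
// Hypothetical project nextflow.config: switch to Ensembl chromosome
// naming by clearing the prefix. The pipeline default is "chr" (UCSC).
params {
    CHROMOSOME_ID_PREFIX = ''
}
```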
Rich.
Add a memory requirement to avoid failing in a SLURM environment when memory use exceeds some default value.
Without this setting the pipeline fails on our SLURM cluster with an error:
```
/opt/conda/envs/invar2/bin/picard: line 66: 134483 Killed /opt/conda/envs/invar2/bin/java -Xms512m -Xmx2g -jar /opt/conda/envs/invar2/share/picard-2.26.10-0/picard.jar CreateSequenceDictionary "--REFERENCE" "human_g1k_v37_decoy.fasta" "--OUTPUT" "human_g1k_v37_decoy.dict"
```
As you can see, the process was killed because it exceeded its memory request.