uclahs-cds / pipeline-call-sSNV

A Nextflow pipeline to identify the somatic single nucleotide variants (sSNVs) by comparing a pair of tumor/normal samples.
https://uclahs-cds.github.io/pipeline-call-sSNV/
GNU General Public License v2.0

Add missing processes in SomaticSniper to resource configs #305

Open tyamaguchi-ucla opened 4 months ago

tyamaguchi-ucla commented 4 months ago
It looks like our configs are missing a few SomaticSniper processes (e.g. `generate_ReadCount_bam_readcount`) @sorelfitzgibbon

Processes that are missing from the resource configs may run into resource allocation problems when the processing node is busy.

Originally posted by @tyamaguchi-ucla in https://github.com/uclahs-cds/pipeline-call-sSNV/issues/288#issuecomment-2167133386

sorelfitzgibbon commented 3 days ago

@tyamaguchi-ucla I started a Discussion on this topic, which we covered in today's Nextflow working group meeting. It looks like the process you mentioned, `generate_ReadCount_bam_readcount`, was overlooked and clearly should have been included in the resource allocations. It typically uses around 500 MB of memory and runs for close to 30 minutes. I will fix this.
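As a rough illustration of what such an entry could look like, here is a minimal sketch using the standard Nextflow `withName` process selector. The values are assumptions chosen to leave headroom over the roughly 500 MB observed above, and the pipeline's actual resource config files may use a different layout:

```groovy
// Hypothetical resource entry; values are illustrative, not the pipeline's settings.
process {
    withName: 'generate_ReadCount_bam_readcount' {
        cpus = 1        // assumed: the process is treated as single-threaded here
        memory = 1.GB   // assumed headroom over the ~500 MB typical usage
    }
}
```

High-coverage samples would likely need a larger memory value than this.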

There are many other processes within SomaticSniper and Intersect that use < 20 MB and run within milliseconds. Ideally we'd like to leave these out of the resource configurations, but I wanted to check whether you had any specific issues in mind here. Currently a default of 1 CPU is applied (in `base.config`), but no default memory is set. A small default memory with retry could be added, but one response to this suggestion was essentially "if it ain't broke, don't fix it", because the change could introduce new errors.
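For reference, a small default with retry could be sketched with standard Nextflow directives as follows. The 256 MB starting value, the retry count, and the exit-status range are assumptions for illustration only; nothing here reflects what `base.config` currently contains beyond the 1 CPU default:

```groovy
// Hypothetical process-wide defaults; per-process withName blocks would still override them.
process {
    cpus = 1

    // Assumed small starting allocation, scaled by the attempt number on each retry.
    memory = { 256.MB * task.attempt }

    // Retry only on exit statuses that commonly indicate a killed or out-of-memory task.
    errorStrategy = { task.exitStatus in 137..140 ? 'retry' : 'terminate' }
    maxRetries = 2
}
```

The trade-off raised in the meeting still applies: a default that is too low would turn tasks that currently pass into retries or failures.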

tyamaguchi-ucla commented 3 days ago

@sorelfitzgibbon Can you check `M64.config`? I had to add `generate_ReadCount_bam_readcount` (and possibly other processes) in order to process some high-coverage samples (~140X).

https://github.com/uclahs-cds/pipeline-call-sSNV/blob/main/config/M64.config

https://github.com/uclahs-cds/pipeline-call-sSNV/pull/300

https://github.com/uclahs-cds/pipeline-call-sSNV/pull/300/commits/32834ff0b360f96e01fd8214baccfe69253778b5