marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

CANU & Nextflow & SLURM #2285

Closed hbadrane closed 5 months ago

hbadrane commented 5 months ago

I'm trying to integrate Canu into a Nextflow pipeline that runs on SLURM. This is causing some trouble, because Canu is kind of a "wild horse" and Nextflow/SLURM aren't used to that...

Is there a way to solve this and make CANU_cmd hang on until the run completes?

skoren commented 5 months ago

It should work if you add gridOptions="--wait", which makes the commands Canu submits hold until they complete. This does put extra load on the SLURM scheduler, since there is essentially a busy-wait continuously checking job status, and there will be a chain of Canu shell scripts waiting on each other (e.g. job 1 waiting on job 2, waiting on job 3, waiting on job 4, etc.). This is why Canu processes run the way they do: there is no busy-waiting as there is in Snakemake/Nextflow.
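For anyone wiring this into a pipeline step, here is a minimal Python sketch of how the command line might be assembled. Only gridOptions="--wait" comes from this thread. The prefix, output directory, genome size, read file, and the -nanopore read type are placeholder assumptions, and the helper names are hypothetical.

```python
import shlex
import subprocess


def build_canu_command(prefix, out_dir, genome_size, reads, grid_options="--wait"):
    """Build the argv list for a Canu run.

    gridOptions is passed through by Canu to the scheduler; with "--wait"
    each submitted job blocks until it finishes, so the initial canu
    command only returns once the whole chain is done.
    """
    return [
        "canu",
        "-p", prefix,                     # assembly prefix (placeholder)
        "-d", out_dir,                    # assembly directory (placeholder)
        f"genomeSize={genome_size}",      # e.g. "5m" (placeholder)
        f"gridOptions={grid_options}",    # pass-through options for the scheduler
        "-nanopore", reads,               # read type is an assumption for this sketch
    ]


def run_canu(cmd):
    """Run Canu and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(cmd, check=True)


# Print the command instead of running it, so the sketch is safe to try.
cmd = build_canu_command("asm", "asm_out", "5m", "reads.fastq.gz")
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems around gridOptions. In a Nextflow process the same arguments would simply go on the canu line of the script block.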

The other option is to use the onSuccess/onFailure options built into Canu; see #2225 and #1984.
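A rough Python sketch of how the onSuccess/onFailure route could be used from a pipeline step follows. It assumes each option takes the path to a user-written script that Canu runs when the assembly finishes or fails; check the Canu parameter reference and the linked issues for the exact behaviour. The marker file names and helper functions are hypothetical: the idea is that the user scripts create canu.success or canu.failure in the assembly directory, and the pipeline step watches for them.

```python
import time
from pathlib import Path


def completion_hook_options(success_script, failure_script):
    """Return the Canu options that register the two user-written hook scripts."""
    return [f"onSuccess={success_script}", f"onFailure={failure_script}"]


def wait_for_marker(out_dir, success_name="canu.success", failure_name="canu.failure",
                    poll_seconds=60.0, timeout_seconds=None):
    """Block until one of the (hypothetical) marker files appears in out_dir.

    Returns True if the success marker is found and False if the failure
    marker is found. Raises TimeoutError if timeout_seconds elapses first.
    """
    out = Path(out_dir)
    deadline = None if timeout_seconds is None else time.monotonic() + timeout_seconds
    while True:
        if (out / success_name).exists():
            return True
        if (out / failure_name).exists():
            return False
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"no completion marker found in {out}")
        time.sleep(poll_seconds)
```

This polls the filesystem rather than the scheduler, so it does not add the scheduler load described above, but it is still a wait loop in the pipeline step.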

hbadrane commented 5 months ago

Thanks Sergey. Will gridOptions="--wait" also apply to the initial canu command task, "canu .... genomesize=xxm ...."?

skoren commented 5 months ago

Yes, it's a general Canu option for pass-through parameters to the scheduler.

hbadrane commented 5 months ago

Ok I will try it, and also look into the onSuccess/onFailure options.

hbadrane commented 5 months ago

Thank you again. Yes, gridOptions="--wait" solved the problem, and Canu can now be integrated into a pipeline. I haven't tried the onSuccess/onFailure options yet, but they should also work. Very well done and well thought out. Closing.