iqbal-lab-org / minos

Variant call adjudication
MIT License

nextflow script for multisample does not respect cached precluster_small_vars_merge #75

Closed iqbal-lab closed 4 years ago

iqbal-lab commented 5 years ago

Having successfully got all the way to adjudication at the end, I reran with -resume and saw this:

[7a/87355b] Cached process > process_input_vcf_file (52667)
[07/7479e1] Cached process > process_input_vcf_file (52670)
[e7/e96cb4] Cached process > process_input_vcf_file (52671)
[fa/0df118] Cached process > process_input_vcf_file (52669)
[e0/8a8c50] Cached process > process_input_vcf_file (52668)
[8a/011b05] Cached process > process_input_vcf_file (52673)
[5c/925d24] Cached process > process_input_vcf_file (52672)
[7c/6d669a] Cached process > process_input_vcf_file (52674)
[f5/ee5638] Submitted process > pre_cluster_small_vars_merge

iqbal-lab commented 5 years ago

Does not make a lot of sense to me; if we look here https://github.com/iqbal-lab-org/minos/blob/43114e71bb222c093943e65e9615f7e99fc93bf5/minos/multi_sample_pipeline.py#L299

it's clear it produces an output file, which it ought to notice and not re-run

martinghunt commented 5 years ago

This is probably a "feature/bug" of Nextflow, as opposed to a bug in our Nextflow script. I have often seen Nextflow fail to recognise processes that it has already run.

iqbal-lab commented 5 years ago

Curse it, @martinghunt. I've got all the way to adjudicate now for the 50,000 - I don't want to rerun any of that!

iqbal-lab commented 5 years ago

out of interest @martinghunt - would you expect this bit of code

https://github.com/iqbal-lab-org/minos/blob/43114e71bb222c093943e65e9615f7e99fc93bf5/minos/multi_sample_pipeline.py#L308

to output a pre_cluster_small_vars_merge.vcf file into the working dir? It's not in there, which is why Nextflow is rerunning.

martinghunt commented 5 years ago

I'd expect it to output that vcf file into the task's hashed dir in the nextflow work dir. Doesn't make sense for there not to be one if the pipeline got past that stage.
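One way to check this is to search the work dir for the file. The sketch below is a minimal illustration, not part of minos: it assumes the default Nextflow layout, where each task runs in `<work dir>/<2-character prefix>/<rest of hash>/` (the bracketed `f5/ee5638` in the log above is the start of such a path), and the function name `find_task_outputs` is hypothetical.

```python
from pathlib import Path


def find_task_outputs(work_dir, filename):
    """Return every copy of `filename` sitting directly inside a task directory.

    Assumes the usual Nextflow layout <work_dir>/<xx>/<hash>/, so the glob
    pattern looks exactly two directory levels below work_dir.
    """
    return sorted(Path(work_dir).glob(f"*/*/{filename}"))


if __name__ == "__main__":
    # "work" is Nextflow's default work directory name; change it if the
    # pipeline was launched with a different work directory.
    for path in find_task_outputs("work", "pre_cluster_small_vars_merge.vcf"):
        print(path)
```

If this prints nothing, no task directory holds the VCF. Any path it does print can be compared with the hash prefix shown in the `-resume` log to see which run produced it.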

iqbal-lab commented 5 years ago

omg this is so annoying. It is redoing all of the gramtools builds, for god's sake. The entire build directories were there, what's its problem?

martinghunt commented 5 years ago

How about this? Get a copy of pre_cluster_small_vars_merge.vcf, and hack the nextflow pipeline so that process does this:

cp /wherever/you/put/your/copy/of/pre_cluster_small_vars_merge.vcf .

instead of making it from scratch again.

iqbal-lab commented 5 years ago

I'll reproduce on small data on Monday. That workaround makes sense, but for the gramtools build directories it is a bit mad. I wonder if it is looking in the wrong place for the output.