mahesh-panchal / nf-cascade

A proof of concept for daisy-chaining Nextflow workflows

Pipeline chaining - TaskPollingMonitor #1

Open DLBPointon opened 4 weeks ago

DLBPointon commented 4 weeks ago

Hi Mahesh,

I've been nesting pipelines recently, namely sanger-tol/blobtoolkit and sanger-tol/curationpretext, and have been attempting to move what I have been using over to nf-cascade (BTW, thanks for making such a clean method; mine has been rather rough around the edges). However, I have come across a couple of issues so far.

This is all in the sanger-tol/ear pipeline: https://github.com/sanger-tol/ear

  1. WARN: Process 'SANGERTOL_EAR:EAR:CURATIONPRETEXT' cannot be executed by 'lsf' executor -- Using 'local' executor instead

     Found here: https://github.com/sanger-tol/ear/blob/56760f8f38f412727309ec22d195feb43ea7678e/workflows/ear.nf#L52. As you can see, we are using the singularity,sanger profiles and the process is still being forced onto the local executor. Because of this, I think, it is losing the jobs from that pipeline and returning the endless loop error of:

    Aug-16 12:04:48.257 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor lsf > No more task to compute -- Execution may be stalled
    Aug-16 12:09:43.504 [Task monitor] DEBUG n.processor.TaskPollingMonitor - !! executor local > tasks to be completed: 1 -- submitted tasks are shown below
    ~> TaskHandler[id: 1; name: SANGERTOL_EAR:EAR:CURATIONPRETEXT (sanger-tol/curationpretext); status: RUNNING; exit: -; error: -; workDir: /nfs/treeoflife-01/teams/tola/users/dp24/ear/work/69/a24a26e3a657f96f8f58661fceb26d]

    In this case, I can even kill the LSF job and the pipeline carries on waiting internally for the process it can't find. Interestingly, the times I have watched this happen, it has died at a bash script that scans through a CRAM file to count containers and grab the RG line. Nothing complicated, but perhaps because it is such a small job it slips through the cracks and gets lost. Obviously, this isn't inherently an nf-cascade issue; nf-cascade maybe just makes it more obvious. (A config-level override I have been trying for this is sketched after this list.)

  2. Secondly, there is the sanger-tol/blobtoolkit pipeline, which, as much as I try, cannot be ported over to your method. Found here: https://github.com/sanger-tol/ear/blob/56760f8f38f412727309ec22d195feb43ea7678e/workflows/ear.nf#L197 This isn't really an issue; it's more that it's interesting it stops being a straightforward process.

    My method of nesting is used here: https://github.com/sanger-tol/ear/blob/main/modules/local/sanger_tol_btk.nf, which could be a somewhat reasonable alternative for other pipelines and could be cleaned up and made more generic using some of the functions you used in nf-cascade.
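
For reference, this is the kind of config-level override I have been experimenting with for point 1. It is a sketch only: the process name is copied from the warning above, and if the fallback to 'local' happens because the task genuinely has to run on the submitting node, this won't change anything.

    // Sketch of a custom config I have been trying: pin the executor for the
    // nested-run process via a process selector. The process name comes from
    // the WARN above; whether this can override the fallback depends on why
    // Nextflow is switching to 'local' in the first place.
    process {
        withName: 'SANGERTOL_EAR:EAR:CURATIONPRETEXT' {
            executor = 'lsf'
        }
    }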

mahesh-panchal commented 3 weeks ago

Hiya. So the idea behind nf-cascade is that the child workflows are run in the same environment as the parent workflow, just automatically rather than manually. So if the manual settings you normally use work, then they should just work in the module too.
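
Roughly, the run module is just an ordinary process that wraps a nextflow run call, so the child picks up whatever the parent run was launched with. Something like this simplified sketch (not the actual module code; the process, input and channel names here are made up):

    // Simplified sketch of the idea (not the real nf-cascade module):
    // a normal process that launches the child pipeline with `nextflow run`,
    // so it runs in the same environment as the parent workflow.
    process NESTED_RUN {
        input:
        val pipeline   // e.g. 'sanger-tol/curationpretext'
        val args       // extra CLI arguments, e.g. '-profile singularity,sanger'

        output:
        path "output", emit: results

        script:
        """
        nextflow run $pipeline $args --outdir output
        """
    }

    workflow {
        NESTED_RUN(
            Channel.value('sanger-tol/curationpretext'),
            Channel.value('-profile singularity,sanger')
        )
    }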

Since you're using more input files, perhaps the version on the generalise branch is more appropriate for you. One thing I can't quite tell just by reading is what is in these variables:

    reference = YAML_INPUT.out.reference_path.get()
    hic_dir = YAML_INPUT.out.cpretext_hic_dir_raw.get()
    longread_dir = YAML_INPUT.out.longread_dir.get()

Files are intended to be supplied using the other input channels, but the important thing is that they should be Path objects. So maybe these are what is breaking the underlying command. Perhaps add a println nxf_cmd.join(" ") in the Nextflow run process to see what's being passed to the builder.
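
For example, something like this (a sketch only, reusing the variable names from your snippet; file() is one way to coerce the values to Path objects, and nxf_cmd is assumed to be the list the run process builds):

    // Sketch: make sure the values handed to the run module are Path objects
    reference    = file( YAML_INPUT.out.reference_path.get() )
    hic_dir      = file( YAML_INPUT.out.cpretext_hic_dir_raw.get() )
    longread_dir = file( YAML_INPUT.out.longread_dir.get() )

    // And inside the Nextflow run process, before the command is executed:
    println nxf_cmd.join(" ")   // shows exactly what is handed to the builder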

As for blobtoolkit, same thing. Use the version under the generalise branch to make a Map of paths to pass as input.
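
Something along these lines, for example (a sketch only; the keys and params here are made up, so use whatever inputs blobtoolkit actually needs):

    // Sketch: a Map of Path objects passed as input to the run module.
    // Key names and params are illustrative only.
    btk_input = [
        fasta    : file( params.fasta ),
        yaml     : file( params.btk_yaml ),
        busco_db : file( params.busco_lineages_path ),
    ]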