nadeemlab / SPT

Spatial profiling toolbox for spatial characterization of tumor immune microenvironment in multiplex images
https://oncopathtk.org

Migrate to Nextflow #13

Closed jimmymathews closed 2 years ago

jimmymathews commented 3 years ago

The JobGenerator approach works for LSF, but we are already starting to encounter maintainability issues: when a large number of jobs are scheduled, especially if they need to access the same large files, performance suffers considerably.

My latest run of the diffusion + colon has 482 jobs (one per field of view), and only about 40% completed in 4 hours. Using the previous method (LSF-specific array job functionality), all completed, but this was relatively complex and I don't think we should return to it. My intuition is that these are concurrency issues, i.e. the sort of thing that should be delegated to a proper workflow engine like Nextflow.

This feature is a big chunk of work. Potentially the steps are:

  1. Learn enough of the Nextflow DSL.
  2. Implement a new JobGenerator that actually just writes Nextflow scripts, or else substantially copies a static template file written in the Nextflow language.
  3. Switch over the JobGenerators in the workflow definitions.
  4. Rename the old JobGenerators to reflect that they are LSF-specific (and eventually deprecate these).

If the modularity of the current design is as robust as I hope it is, then the above steps will be enough and the rest of the codebase does not need to change.
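As a rough illustration of steps 2 to 4, here is a minimal Python sketch of how the swap could look if the workflow definitions depend only on a shared interface. Everything in it is an assumption made for illustration: the class and method names, `run_job.sh`, `main.nf`, `jobs.csv` and the `--manifest` pipeline parameter are hypothetical and are not taken from the SPT codebase.

```python
from abc import ABC, abstractmethod
from typing import List


class JobGenerator(ABC):
    """Common interface that the workflow definitions depend on (hypothetical)."""

    @abstractmethod
    def commands(self, input_files: List[str]) -> List[str]:
        """Return the shell commands needed to run all jobs."""


class LSFJobGenerator(JobGenerator):
    """Old approach, renamed to make its LSF dependence explicit."""

    def commands(self, input_files: List[str]) -> List[str]:
        # One bsub submission per input file (e.g. per field of view).
        # 'run_job.sh' is a hypothetical per-job script.
        return [
            f"bsub -J job_{i} ./run_job.sh {path}"
            for i, path in enumerate(input_files)
        ]


class NextflowJobGenerator(JobGenerator):
    """New approach: a single launch, with scheduling left to Nextflow."""

    def commands(self, input_files: List[str]) -> List[str]:
        # The per-job details would live in a manifest read by the script.
        # 'main.nf' and 'jobs.csv' are hypothetical file names.
        return ["nextflow run main.nf --manifest jobs.csv"]


def schedule(generator: JobGenerator, input_files: List[str]) -> List[str]:
    """Workflow-side code: unaware of which scheduler backs the generator."""
    return generator.commands(input_files)
```

With this shape, switching a workflow from LSF to Nextflow only changes which generator is passed to `schedule`, which is the kind of modularity the steps above rely on.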

jimmymathews commented 3 years ago

This task was largely blocked by the singularity build workflow and lack of a proper container repository.

As of commit 92c300bebe99c160b040db3b638928b2d9f55cab, the needed partial migration to Docker is complete. Singularity images can now be pulled from a public Docker Hub repository.

jimmymathews commented 2 years ago

Migration to the Nextflow workflow engine is implemented as of 5399b5b55a63b3d1db5e518f6452314a420ddced and merged into main.

All 4 steps described in the issue description have been taken, as well as the additional implied step of deprecating the JobGenerators. The classes are still there, but they hold only a tiny amount of metadata that will shortly be drained into the computational design objects. "Job generation" as a task has been reduced to creating Nextflow-consumable manifests of jobs, which has been added as its own CLI script application.

In the end, a single static Nextflow script is sufficient; a template with a template-filling step is not needed.
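For readers unfamiliar with the manifest idea, the following is a minimal sketch of what producing such a job manifest might look like, assuming a CSV layout with one row per job. The column names (`job_index`, `input_file`) and the function name are assumptions for illustration only; the actual manifest format used by SPT may differ.

```python
import csv
import io
from typing import List


def manifest_text(input_files: List[str]) -> str:
    """Build a CSV job manifest, one row per input file (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row naming the columns a workflow script would read.
    writer.writerow(["job_index", "input_file"])
    for index, path in enumerate(input_files):
        writer.writerow([index, path])
    return buffer.getvalue()
```

A static Nextflow script could then read a file like this (for example with the `splitCsv` operator) and launch one process per row, so nothing in the script itself needs to be generated per run.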