Closed by jimmymathews 2 years ago
This task was largely blocked by the Singularity build workflow and the lack of a proper container repository.
As of commit 92c300bebe99c160b040db3b638928b2d9f55cab, the needed partial migration to Docker is complete. Singularity images can now be pulled from a public Docker Hub repository.
Migration to the Nextflow workflow engine is implemented as of 5399b5b55a63b3d1db5e518f6452314a420ddced and merged into main.
All 4 steps described in the issue description have been taken, as well as the additional implied step of deprecating the JobGenerators. The classes are still there, but they hold only a small amount of metadata that will shortly be moved into the computational design objects. "Job generation" as a task has been reduced to creating Nextflow-consumable manifests of jobs, which has been added as its own CLI script application.
In the end, a single static Nextflow script is sufficient; a template with a template-filling step is not needed.
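For illustration only, here is a minimal sketch of what "creating a manifest of jobs" could look like, assuming the manifest is a CSV file with one row per job. The function name `write_job_manifest`, the column names `job_index` and `input_file`, and the example file names are all hypothetical and are not taken from the codebase.

```python
import csv
import io
from typing import Iterable, TextIO


def write_job_manifest(input_files: Iterable[str], out: TextIO) -> int:
    """Write one manifest row per job and return the number of jobs written.

    The column names are hypothetical; the real manifest format may differ.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["job_index", "input_file"])
    count = 0
    # One job per input file (e.g. one per field of view), numbered from 1.
    for count, input_file in enumerate(input_files, start=1):
        writer.writerow([count, input_file])
    return count


if __name__ == "__main__":
    # Hypothetical input file names, written to an in-memory buffer for display.
    buffer = io.StringIO()
    write_job_manifest(["fov_001.csv", "fov_002.csv"], buffer)
    print(buffer.getvalue(), end="")
```

With a manifest in this shape, a static Nextflow script could plausibly read the rows with something like `Channel.fromPath(...).splitCsv(header: true)` and launch one process per row, which is why no template-filling step would be needed. This is a guess at the mechanism, not a description of the merged script.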
The JobGenerator approach works for LSF, but we are already starting to encounter maintainability issues -- it appears that when a large number of jobs are scheduled, especially if they need to access the same large files, performance suffers considerably.
My latest run of the diffusion + colon has 482 jobs (one per field of view), and only about 40% completed in 4 hours. With the previous method (using LSF-specific array job functionality), all of them completed, but that method was relatively complex and I don't think we should return to it. My intuition is that these are concurrency issues, i.e. the sort of thing that should be delegated to a proper workflow engine like Nextflow.
This feature is a big chunk of work. Potentially the steps are:
If the modularity of the current design is as robust as I hope it is, then the above steps will be enough and the rest of the codebase does not need to change.