naobservatory / mgs-workflow

Fix numbering of workflow steps (or remove numbering) #15

Closed mikemc closed 1 week ago

mikemc commented 1 month ago

Currently we have two sections labeled step 8 in main.nf:

/**************************************************
| 8. TAXONOMIC ASSIGNMENT WITH KRAKEN - PRE-DEDUP |
**************************************************/
...
/**********************************
| 8. COLLATE AND PROCESS RESULTS |
**********************************/

Any reason to not just call the second one step 9?


This could also be a good point to reconsider how steps are labeled in main.nf. I think having numbers for the big sections is quite useful for structuring how we document the pipeline in the README, and more generally for having a way to refer to where in the overall workflow something happens. But numbers are a bit brittle and burdensome whenever we want to add an NF workflow or NF process, since we might then have to update every number that follows (I'm guessing that's what happened in this case).

I wonder if it might ultimately be better to just drop the numbers and use the NF workflow and process names instead (rather than {Workflow#} for the section and {Workflow#}.{Process#} for the processes), since (I think) these names are already unique identifiers. So,

/**************************************************
| workflow CLASSIFY_READS_SUBSET - TAXONOMIC ASSIGNMENT WITH KRAKEN - PRE-DEDUP |
**************************************************/
...
/**********************************
| workflow PROCESS_OUTPUT - COLLATE AND PROCESS RESULTS |
**********************************/

(In this case I'd recommend not capitalizing the description after the "-", to make it more distinguishable from the workflow name; the *'s already clearly mark this as a section header.)

willbradshaw commented 1 month ago

I think the numbers (especially process numbers) are quite helpful for finding the right section of the document, but they sure are annoying to keep updated.

I expect we'll probably split out the big workflow file into multiple separate subworkflow files sometime soon, which will obviate the need for numbers entirely. In the meantime, I'll go fix the numbers.

mikemc commented 1 month ago

> I expect we'll probably split out the big workflow file into multiple separate subworkflow files sometime soon, which will obviate the need for numbers entirely. In the meantime, I'll go fix the numbers.

Yeah, I was thinking this could be the way to go.

Am I right that processes are in principle independent of a workflow --- e.g. the same process could be used in multiple workflows?

willbradshaw commented 2 weeks ago

> Am I right that processes are in principle independent of a workflow --- e.g. the same process could be used in multiple workflows?

This seems to be true sometimes but not other times. I want to get a better handle on this.

willbradshaw commented 1 week ago

The original request is now moot, given the module/subworkflow structure implemented in the dev branch.

@mikemc:

> Am I right that processes are in principle independent of a workflow --- e.g. the same process could be used in multiple workflows?

This is basically correct given a module structure like the one implemented in the dev branch. The main caveat is that, rather than calling the same module multiple times in a workflow, you need to import it multiple times with different aliases and then call each alias exactly once. But that isn't a huge amount of additional overhead.
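
For anyone reading later, here is a rough sketch of what that aliasing pattern can look like in Nextflow DSL2. It is not taken from the dev branch: the module path, the process name KRAKEN, the two alias names, the params, and the placeholder command are all made up for illustration, and the two parts would live in separate files.

```nextflow
// --- modules/kraken.nf (hypothetical module file) ---
process KRAKEN {
    input:
    path reads

    output:
    path "kraken_out.txt"

    script:
    // Placeholder command; the real Kraken invocation is not shown in this thread.
    """
    echo "classified ${reads}" > kraken_out.txt
    """
}

// --- main.nf ---
// Import the same process twice under two different (hypothetical) aliases.
include { KRAKEN as KRAKEN_PRE_DEDUP  } from './modules/kraken'
include { KRAKEN as KRAKEN_POST_DEDUP } from './modules/kraken'

workflow {
    // params.raw_reads and params.deduped_reads are assumed input globs.
    raw_reads     = Channel.fromPath(params.raw_reads)
    deduped_reads = Channel.fromPath(params.deduped_reads)

    // Each alias is called exactly once within this workflow.
    KRAKEN_PRE_DEDUP(raw_reads)
    KRAKEN_POST_DEDUP(deduped_reads)
}
```

Calling one process name twice in the same workflow scope is what triggers the complaint from Nextflow, so the aliases are just a way of giving each call site its own name.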