Closed MinhyukPark closed 1 year ago
This is a good note, marking this as higher priority
Added the following snippet to workflow.py. Tested as well. Should fix the issue:
# Set network files post cleanup to the cleaned file
post_cleaned = False
for stage in self.stages:
    if post_cleaned:
        stage.set_network(cleaned_file)
    if stage.name == 'cleanup':
        post_cleaned = True
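For anyone who wants to check the behaviour of that loop in isolation, here is a minimal, self-contained sketch. Only set_network, stage.name, and the loop itself come from the snippet above; the Stage class, its network attribute, the propagate_cleaned_network helper, the 'clustering' stage name, and the file names are hypothetical stand-ins, not the real code in workflow.py.

```python
class Stage:
    """Hypothetical stand-in for a pipeline stage (not the real class)."""

    def __init__(self, name, network):
        self.name = name
        self.network = network

    def set_network(self, network):
        self.network = network


def propagate_cleaned_network(stages, cleaned_file):
    """Point every stage that comes after 'cleanup' at the cleaned file."""
    post_cleaned = False
    for stage in stages:
        if post_cleaned:
            stage.set_network(cleaned_file)
        if stage.name == 'cleanup':
            post_cleaned = True


# Assumed stage order and file names, for illustration only.
stage_names = ("cleanup", "clustering", "connectivity_modifier")
stages = [Stage(name, "network.tsv") for name in stage_names]
propagate_cleaned_network(stages, "network_cleanup.tsv")
networks = {stage.name: stage.network for stage in stages}
```

In this sketch the cleanup stage keeps the raw input, and every later stage, including connectivity_modifier, ends up pointing at the cleaned file.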
Summary
When commands are generated from the pipeline.json files, the cleaned-up network is not used consistently across all stages.
For example, the _cleanup.tsv network suffix designates the output of the cleanup stage, which is mandatory because of the assert statement. This network is correctly used in the first clustering stage, but it is not used in the connectivity_modifier command.
This is probably because the stages are initialized first and the cleaned_file is fetched only afterwards. The cleaned_file is then not used to re-initialize the stages; in fact, it seems to be overridden right away by self.input_file, which potentially makes the relevant for-loop a no-op.
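The suspected pattern can be sketched as follows. This is a hypothetical reconstruction based only on the description above, not the actual code: the Stage class, the build_stages_suspected function, the 'clustering' stage name, and the way the cleaned file name is derived are all assumptions made for illustration.

```python
class Stage:
    """Hypothetical stand-in for a pipeline stage (not the real class)."""

    def __init__(self, name, network):
        self.name = name
        self.network = network

    def set_network(self, network):
        self.network = network


def build_stages_suspected(input_file, stage_names):
    """Reproduce the suspected ordering problem described in the summary."""
    # Stages are initialized first, all pointing at the raw input file.
    stages = [Stage(name, input_file) for name in stage_names]

    # The cleaned file is fetched afterwards (derivation assumed here) ...
    cleaned_file = input_file.replace(".tsv", "_cleanup.tsv")
    # ... but is then overridden by the input file (the suspected bug).
    cleaned_file = input_file

    # This loop now reassigns the same value, so it changes nothing.
    for stage in stages:
        stage.set_network(cleaned_file)
    return stages


suspected = build_stages_suspected(
    "network.tsv", ("cleanup", "clustering", "connectivity_modifier")
)
suspected_networks = {stage.name: stage.network for stage in suspected}
```

Under these assumptions every stage, including connectivity_modifier, still points at the raw input file, which matches the symptom reported here.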
How to replicate the issue
Run the pipeline with this json file. You'll notice that the CM stage uses the input file directly instead of the cleaned-up file.