In the pipeline, the same `categories.json` is copied to every next step, followed by copying all datasets from the previous step to the current step. However, there are edge cases where:
- The downloaded dataset is empty (affects the `clean` (next) step)
- The dataset becomes empty after cleaning (affects the `decontaminate` (next) step)
- The dataset becomes empty after decontamination (affects the `gather` (next) step) -- very rare, but still a possibility
I think it would be prudent to add a sanity check so that we don't copy empty datasets from the previous step to the next step. Once that is in place, we can remove the dataset-is-empty check in `_cmd_exit_str`.
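For illustration, here is a minimal sketch of what that sanity check could look like. The `copy_to_next_step` helper, the directory layout, and the name `dataset_is_empty` are all assumptions for this sketch, not the pipeline's actual API:

```python
# Sketch only: helper names and directory layout are hypothetical,
# not taken from the actual pipeline code.
import shutil
from pathlib import Path


def dataset_is_empty(dataset_dir: Path) -> bool:
    """Treat a dataset as empty if its directory is missing or contains no entries."""
    return not dataset_dir.is_dir() or not any(dataset_dir.iterdir())


def copy_to_next_step(prev_step_dir: Path, next_step_dir: Path) -> None:
    """Copy only non-empty datasets from the previous step into the next step."""
    next_step_dir.mkdir(parents=True, exist_ok=True)
    for dataset_dir in prev_step_dir.iterdir():
        if not dataset_dir.is_dir():
            continue
        if dataset_is_empty(dataset_dir):
            # Skip empty datasets so downstream steps never receive them.
            print(f"skipping empty dataset: {dataset_dir.name}")
            continue
        shutil.copytree(dataset_dir, next_step_dir / dataset_dir.name)
```

With a guard like this at the copy boundary, every step can assume its inputs are non-empty, which is what would let us drop the corresponding check in `_cmd_exit_str`.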