hplt-project / OpusPocus

Marian machine translation training pipeline for thousands of models

Skip processing empty dataset at any stage #12

Open bhavitvyamalik opened 9 months ago

bhavitvyamalik commented 9 months ago

In the pipeline, the same categories.json is copied to every subsequent step, and then all datasets from the previous step are copied to the current step. However, there might be edge cases where:

I think it would be prudent to add a sanity check so that we don't copy empty datasets from the previous step to the next step. Once that is in place, we can remove the empty-dataset check in _cmd_exit_str.
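
A rough sketch of what such a check could look like, purely as an illustration. The function names (is_empty_dataset, copy_nonempty_datasets), the flat directory layout, and the assumption that datasets are plain or gzip-compressed files are all hypothetical and not taken from the OpusPocus code:

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def is_empty_dataset(path: Path) -> bool:
    """Return True if the dataset file holds no data.

    Assumption: files ending in .gz are gzip-compressed. A gzip file with
    an empty payload still has a non-zero size on disk, so we try to read
    one decompressed byte instead of relying on the file size.
    """
    if path.suffix == ".gz":
        with gzip.open(path, "rb") as fh:
            return fh.read(1) == b""
    return path.stat().st_size == 0


def copy_nonempty_datasets(prev_dir: Path, curr_dir: Path) -> List[str]:
    """Copy dataset files from prev_dir to curr_dir, skipping empty ones.

    Returns the names of the skipped files so the caller can log them.
    Hypothetical helper: the real pipeline step may organize files differently.
    """
    curr_dir.mkdir(parents=True, exist_ok=True)
    skipped: List[str] = []
    for src in sorted(prev_dir.iterdir()):
        if not src.is_file():
            continue
        if is_empty_dataset(src):
            skipped.append(src.name)
            continue
        shutil.copy2(src, curr_dir / src.name)
    return skipped
```

If files are skipped this way, the copied categories.json would presumably also need to drop the skipped datasets so that later steps don't look for files that were never copied.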