varisd closed 3 months ago
I simplified the sharding support in CorpusStep and removed the implicit sharding of the inputs. Only the output is now sharded, and the shards are then merged into the final file.
I also added unit tests for corpus_step and the sharding-related utils.
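
For reviewers unfamiliar with the pattern, here is a minimal sketch of "write sharded output, then merge into the final file". It is not the code from this PR: the helper names (`write_sharded_output`, `merge_shards`), the shard file naming, and the `shard_size` parameter are all hypothetical and only illustrate the idea.

```python
from pathlib import Path
from typing import Iterable, List


def _flush(buffer: List[str], out_dir: Path, index: int) -> Path:
    # Hypothetical naming scheme; zero-padding keeps the shards in order.
    path = out_dir / f"shard.{index:05d}.txt"
    path.write_text("".join(f"{line}\n" for line in buffer), encoding="utf-8")
    return path


def write_sharded_output(
    lines: Iterable[str], out_dir: Path, shard_size: int
) -> List[Path]:
    """Write lines into shard files holding at most shard_size lines each."""
    if shard_size < 1:
        raise ValueError("shard_size must be positive")
    shards: List[Path] = []
    buffer: List[str] = []
    for line in lines:
        buffer.append(line)
        if len(buffer) == shard_size:
            shards.append(_flush(buffer, out_dir, len(shards)))
            buffer = []
    if buffer:  # last, possibly shorter shard
        shards.append(_flush(buffer, out_dir, len(shards)))
    return shards


def merge_shards(shards: List[Path], final_path: Path) -> Path:
    """Concatenate the shards, in the given order, into one final file."""
    with final_path.open("w", encoding="utf-8") as out:
        for shard in shards:
            out.write(shard.read_text(encoding="utf-8"))
    return final_path
```

A unit test for such helpers could write a handful of lines with a small `shard_size`, check the number of shards, and assert that the merged file matches the unsharded input.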