This patch refactors combine_training_corpus.py into a library file that downstream projects and other utilities can import directly. It also makes unit testing slightly easier, since no special accommodations are needed for CLI flags. The patch also adds two unit tests for combining training corpora.
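For reviewers unfamiliar with the pattern, here is a minimal sketch of the library/CLI split described above. It is illustrative only: `combine_module_lists`, the `--input` flag, and the JSON layout are hypothetical names chosen for this example, not the actual API or flags of combine_training_corpus.py.

```python
import argparse
import json


def combine_module_lists(corpora):
    """Hypothetical library function: merge per-corpus module lists.

    `corpora` maps a corpus name to its list of module names. Each module
    is prefixed with its corpus name so entries stay unique after merging.
    Corpora are visited in sorted name order for deterministic output.
    """
    combined = []
    for corpus_name in sorted(corpora):
        for module in corpora[corpus_name]:
            combined.append(f"{corpus_name}/{module}")
    return combined


def main(argv=None):
    """Thin CLI wrapper: parse flags, then delegate to the library function."""
    parser = argparse.ArgumentParser(description="Combine training corpora.")
    # Hypothetical flag: a JSON file mapping corpus name -> list of modules.
    parser.add_argument("--input", required=True)
    args = parser.parse_args(argv)
    with open(args.input) as f:
        corpora = json.load(f)
    print(json.dumps(combine_module_lists(corpora)))
```

Because the combining logic lives in a plain function, a test can call `combine_module_lists({"a": ["m1"], "b": ["m2"]})` directly and compare the returned list, without constructing or patching any command-line flags.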