lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

Re-structure the corpus preparation module #110

Open pzelasko opened 3 years ago

pzelasko commented 3 years ago

Currently, each corpus is documented only by the top-level docstring of its lhotse.recipes module. To make these descriptions more discoverable for users, we should also attach them to the docstrings of the prepare_X functions and to the help messages of the lhotse prepare X CLI commands. We could achieve that with an approach similar to the one used by the transformers library.
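A minimal sketch of how such docstring sharing could work, assuming a decorator-based approach. The names `with_corpus_description`, `CORPUS_DESCRIPTION`, and `prepare_example` are hypothetical and not part of lhotse; in a real recipe module the description would presumably be the module's own `__doc__`.

```python
import inspect
from typing import Callable, Optional


def with_corpus_description(description: Optional[str]) -> Callable:
    """Hypothetical decorator: prepend a corpus description to a function's docstring."""

    def decorator(fn: Callable) -> Callable:
        # Normalize indentation of both texts before joining them.
        corpus_doc = inspect.cleandoc(description or "")
        own_doc = inspect.cleandoc(fn.__doc__ or "")
        # Skip empty parts so we never emit stray blank lines.
        fn.__doc__ = "\n\n".join(part for part in (corpus_doc, own_doc) if part)
        return fn

    return decorator


# Placeholder text; a recipe module would pass its own __doc__ instead.
CORPUS_DESCRIPTION = """
Example corpus: a placeholder description that would normally live
in the top-level docstring of the recipe module.
"""


@with_corpus_description(CORPUS_DESCRIPTION)
def prepare_example(corpus_dir: str) -> dict:
    """Prepare manifests for the example corpus found in corpus_dir."""
    return {}
```

With this in place, `help(prepare_example)` shows the corpus description followed by the function's own documentation. The same string could also be passed as the help text of the corresponding CLI command, so both entry points stay in sync with a single source.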

pzelasko commented 3 years ago

Each corpus should also explicitly declare the list of manifests it creates (e.g. with a module-level constant or function). That would help us implement a consistent "caching" mechanism for the prepare_X functions (see e.g. what #133 does).
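A rough sketch of what such a declaration could look like, assuming a module-level constant and a simple "all files already exist" check. The names `MANIFEST_TYPES`, `expected_manifest_paths`, and `manifests_exist`, as well as the file naming pattern, are hypothetical illustrations and do not describe what #133 actually does.

```python
from pathlib import Path
from typing import Dict, Iterable, Tuple

# Hypothetical module-level declaration of the manifests this recipe creates.
MANIFEST_TYPES: Tuple[str, ...] = ("recordings", "supervisions")


def expected_manifest_paths(
    output_dir: Path, corpus_name: str, parts: Iterable[str]
) -> Dict[str, Dict[str, Path]]:
    """Map each dataset part to the manifest files the recipe would write.

    The file naming pattern used here is an assumption made for illustration.
    """
    return {
        part: {
            kind: output_dir / f"{corpus_name}_{kind}_{part}.json"
            for kind in MANIFEST_TYPES
        }
        for part in parts
    }


def manifests_exist(paths: Dict[str, Dict[str, Path]]) -> bool:
    """Return True only if every declared manifest file is already on disk."""
    return all(
        path.is_file() for per_part in paths.values() for path in per_part.values()
    )
```

A prepare_X function could call `manifests_exist(...)` first and skip the expensive preparation when it returns True. Because the list of manifests is declared once per corpus, the same check could be reused across recipes.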