openspeech-team / openspeech

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
https://openspeech-team.github.io/openspeech/
MIT License
678 stars 114 forks

Prepare tokenizer of LibriSpeech by using sentencepiece #227

Open tuandattt opened 2 months ago

tuandattt commented 2 months ago

❓ Questions & Help

Hi, I cannot save the model and vocab when using spm.SentencePieceTrainer.Train.

Details

This is the command I run:

python3 ./openspeech_cli/hydra_train.py \
    dataset=librispeech \
    dataset.dataset_download=False \
    dataset.dataset_path=/home/stud_dat/openspeech/openspeech/datasets/librispeech \
    dataset.manifest_file_path=$MANIFEST_FILE_PATH \
    tokenizer=libri_subword \
    model=conformer_lstm \
    audio=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=cross_entropy

tuandattt commented 2 months ago

And this is the error:

trainer_interface.cc(605) LOG(INFO) Saving model: sp.model
trainer_interface.cc(616) LOG(INFO) Saving vocabs: sp.vocab
Error executing job with overrides: ['dataset=librispeech', 'dataset.dataset_download=False', 'dataset.dataset_path=/home/stud_dat/openspeech/openspeech/datasets/librispeech', 'dataset.manifest_file_path=', 'tokenizer=libri_subword', 'model=conformer_lstm', 'audio=fbank', 'lr_scheduler=warmup_reduce_lr_on_plateau', 'trainer=gpu', 'criterion=cross_entropy']
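
One detail visible in the overrides list: 'dataset.manifest_file_path=' is empty, which suggests that $MANIFEST_FILE_PATH was unset or empty in the shell when the command ran. The traceback is not included here, so this may or may not be the cause of the failure. A minimal sketch of a pre-launch check, assuming the path is passed through that environment variable (the helper name manifest_override is hypothetical and not part of openspeech):

```python
import os


def manifest_override(env=None):
    """Build the Hydra override for the manifest path.

    Hypothetical helper, not part of openspeech. Raises ValueError when
    MANIFEST_FILE_PATH is unset or empty, so the job is not launched with
    an empty 'dataset.manifest_file_path=' override.
    """
    env = os.environ if env is None else env
    path = env.get("MANIFEST_FILE_PATH", "").strip()
    if not path:
        raise ValueError(
            "MANIFEST_FILE_PATH is unset or empty; export it before launching training"
        )
    return f"dataset.manifest_file_path={path}"
```

Calling manifest_override() before building the command line makes an unset variable fail early with a clear message instead of passing an empty value to Hydra.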