openspeech-team / openspeech

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
https://openspeech-team.github.io/openspeech/
MIT License
670 stars, 112 forks

Question about the sp.model path problem #211

Closed: hbeooooooom closed this issue 11 months ago

hbeooooooom commented 11 months ago

%run ./openspeech_cli/hydra_train.py \
    dataset=librispeech \
    dataset.dataset_download=True \
    dataset.dataset_path=$DATASET_PATH \
    dataset.manifest_file_path=$MANIFEST_FILE_PATH \
    tokenizer=libri_subword \
    model=conformer_lstm \
    audio=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=cross_entropy

When I run it like this, it downloads the dataset and then logs the following:

[2023-09-24 06:44:17,288][openspeech.datasets.librispeech.lit_data_module][INFO] - Merge all train packs into one
[2023-09-24 06:44:19,491][openspeech.datasets.librispeech.lit_data_module][INFO] - Manifest file is not exists !! Generate manifest files.

Error executing job with overrides: ['dataset=librispeech', 'dataset.dataset_download=True', 'dataset.dataset_path=$DATASET_PATH', 'dataset.manifest_file_path=$MANIFEST_FILE_PATH', 'tokenizer=libri_subword', 'model=conformer_lstm', 'audio=fbank', 'lr_scheduler=warmup_reduce_lr_on_plateau', 'trainer=gpu', 'criterion=cross_entropy']
Traceback (most recent call last):
  File "C:\openspeech-main\openspeech-main\openspeech_cli\hydra_train.py", line 45, in hydra_main
    data_module.prepare_data()
  File "C:\openspeech-main\openspeech-main\openspeech\datasets\librispeech\lit_data_module.py", line 149, in prepare_data
    generate_manifest_files(
  File "C:\openspeech-main\openspeech-main\openspeech\datasets\librispeech\preprocess\subword.py", line 69, in generate_manifest_files
    shutil.copy(f"{SENTENCEPIECE_MODEL_NAME}.model", os.path.join(vocab_path, f"{SENTENCEPIECE_MODEL_NAME}.model"))
  File "C:\Users\123\anaconda3\envs\tfreal\lib\shutil.py", line 417, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "C:\Users\123\anaconda3\envs\tfreal\lib\shutil.py", line 256, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '../../../LibriSpeech/sp.model'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace
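A note on reading the traceback: the exception is raised by `open(dst, 'wb')` inside `shutil.copyfile`, so it is the destination path that cannot be opened. That would be consistent with the destination directory not existing, although this is only an inference from the trace. The sketch below shows one way to guard a copy against that case. `copy_into_dir` is a hypothetical helper written for illustration and is not part of openspeech.

```python
import os
import shutil
import tempfile


def copy_into_dir(src_file, dst_dir):
    """Copy src_file into dst_dir, creating dst_dir first if it is missing.

    Hypothetical helper for illustration; not an openspeech function.
    """
    # shutil.copy does not create missing parent directories, so do it here.
    os.makedirs(dst_dir, exist_ok=True)
    dst_file = os.path.join(dst_dir, os.path.basename(src_file))
    return shutil.copy(src_file, dst_file)


# Small self-contained demo using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sp.model")
    with open(src, "wb") as f:
        f.write(b"dummy model bytes")
    # The target directory does not exist yet; the helper creates it.
    copied = copy_into_dir(src, os.path.join(tmp, "LibriSpeech"))
    demo_copy_exists = os.path.exists(copied)
```

Without the `os.makedirs` call, the same copy would fail with `FileNotFoundError` on the destination, which matches the shape of the error above.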

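This is the error I get. Looking through related issues, the answers were that it has now been fixed and that the directory should be moved up one level, but I'm not sure how to do that. sp.model is located at output/date/time/sp.model, so is it correct for the path to be ../../../LibriSpeech/sp.model? I downloaded the model again and ran it, but the result is the same, and I can't tell which part of the code I need to modify, so I'm asking here.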
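On the path question, a relative path such as `../../../LibriSpeech/sp.model` is resolved from the current working directory. If the job is running from a per-run directory like output/date/time (which is where sp.model was found), three `..` steps lead back to the directory that contains `output`. The following sketch illustrates this under that assumption. The directory names are made up, and `resolve_from` is a hypothetical helper.

```python
import posixpath


def resolve_from(run_dir, relative_path):
    """Resolve relative_path as if run_dir were the current working directory."""
    return posixpath.normpath(posixpath.join(run_dir, relative_path))


# Hypothetical run directory of the form <project>/output/<date>/<time>.
run_dir = "/project/output/2023-09-24/06-44-17"
resolved = resolve_from(run_dir, "../../../LibriSpeech/sp.model")
print(resolved)  # /project/LibriSpeech/sp.model
```

Under this assumption the copy target is a LibriSpeech directory next to `output`, so the copy can only succeed if that directory already exists at that location.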