openspeech-team / openspeech

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
https://openspeech-team.github.io/openspeech/
MIT License
670 stars 112 forks

Can't run the training examples #185

Closed ccyang123 closed 1 year ago

ccyang123 commented 1 year ago

❓ Questions & Help

I am trying to learn how to use openspeech but ran into the following errors. Thank you for your help.

(1) omegaconf.errors.ConfigAttributeError: Key 'model' is not in struct full_key: model object_type=dict

(2) It reports "No such file or directory" for the settings dataset.dataset_path and dataset.manifest_file_path, but the directory and file do exist at those paths.

    dataset.dataset_path="../../../../LibriSpeech/" \
    dataset.manifest_file_path="./openspeech/datasets/librispeech/libri_subword_manifest.txt" \

Details

Below are my training script and error message.

===================== training script

    python ./openspeech_cli/hydra_train.py \
        dataset="librispeech" \
        dataset.dataset_download=False \
        dataset.dataset_path="../../../../LibriSpeech/" \
        dataset.manifest_file_path="./openspeech/datasets/librispeech/libri_subword_manifest.txt" \
        tokenizer=libri_subword \
        model="conformer_lstm" \
        audio=fbank \
        lr_scheduler=warmup_reduce_lr_on_plateau \
        trainer=gpu \
        criterion=cross_entropy

=================== error

    /Desktop/CodeFolder/ASR/openspeech/openspeech/utils.py:88: FutureWarning: Pass y=[ 1.0289366e-05  1.9799588e-06  2.5269967e-06 ...  4.2585389e-06 -7.8615230e-06 -1.8652887e-05] as keyword args. From version 0.10 passing these as positional arguments will result in an error
      DUMMY_FEATURES = librosa.feature.melspectrogram(DUMMY_SIGNALS, n_mels=80)
    ./openspeech_cli/hydra_train.py:37: UserWarning: The version_base parameter is not specified. Please specify a compatability version level, or None. Will assume defaults for version 1.1
      @hydra.main(config_path=os.path.join("..", "openspeech", "configs"), config_name="train")
    /home/docker/.local/lib/python3.8/site-packages/hydra/core/default_element.py:124: UserWarning: In 'train': Usage of deprecated keyword in package header '# @package group'. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/changes_to_package_header for more information
      deprecation_warning(
    /home/docker/.local/lib/python3.8/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
      ret = run_job(
    augment:
      apply_spec_augment: false
      apply_noise_augment: false
      apply_joining_augment: false
      apply_time_stretch_augment: false
      freq_mask_para: 27
      freq_mask_num: 2
      time_mask_num: 4
      noise_dataset_dir: None
      noise_level: 0.7
      time_stretch_min_rate: 0.7
      time_stretch_max_rate: 1.4
    dataset:
      dataset: librispeech
      dataset_path: ???
      dataset_download: false
      manifest_file_path: ../../../LibriSpeech/libri_subword_manifest.txt
    trainer:
      seed: 1
      accelerator: dp
      accumulate_grad_batches: 1
      num_workers: 4
      batch_size: 32
      check_val_every_n_epoch: 1
      gradient_clip_val: 5.0
      logger: wandb
      max_epochs: 20
      save_checkpoint_n_steps: 10000
      auto_scale_batch_size: binsearch
      sampler: else
      name: gpu
      device: gpu
      use_cuda: true
      auto_select_gpus: true

    Global seed set to 1
    [2022-12-23 03:29:47,326][openspeech.utils][INFO] - augment:
      apply_spec_augment: false
      apply_noise_augment: false
      apply_joining_augment: false
      apply_time_stretch_augment: false
      freq_mask_para: 27
      freq_mask_num: 2
      time_mask_num: 4
      noise_dataset_dir: None
      noise_level: 0.7
      time_stretch_min_rate: 0.7
      time_stretch_max_rate: 1.4
    dataset:
      dataset: librispeech
      dataset_path: ???
      dataset_download: false
      manifest_file_path: ../../../LibriSpeech/libri_subword_manifest.txt
    trainer:
      seed: 1
      accelerator: dp
      accumulate_grad_batches: 1
      num_workers: 4
      batch_size: 32
      check_val_every_n_epoch: 1
      gradient_clip_val: 5.0
      logger: wandb
      max_epochs: 20
      save_checkpoint_n_steps: 10000
      auto_scale_batch_size: binsearch
      sampler: else
      name: gpu
      device: gpu
      use_cuda: true
      auto_select_gpus: true

    [2022-12-23 03:29:47,373][openspeech.utils][INFO] - Operating System : Linux 5.15.0-52-generic
    [2022-12-23 03:29:47,374][openspeech.utils][INFO] - Processor : x86_64
    [2022-12-23 03:29:47,375][openspeech.utils][INFO] - device : NVIDIA GeForce RTX 3090
    [2022-12-23 03:29:47,375][openspeech.utils][INFO] - device : NVIDIA GeForce RTX 3090
    [2022-12-23 03:29:47,375][openspeech.utils][INFO] - CUDA is available : True
    [2022-12-23 03:29:47,375][openspeech.utils][INFO] - CUDA version : 11.3
    [2022-12-23 03:29:47,375][openspeech.utils][INFO] - PyTorch version : 1.10.0+cu113
    Error executing job with overrides: ['dataset=librispeech', 'dataset.dataset_download=False']
    Traceback (most recent call last):
      File "./openspeech_cli/hydra_train.py", line 42, in hydra_main
        logger, num_devices = parse_configs(configs)
      File "/Desktop/CodeFolder/ASR/openspeech/openspeech/utils.py", line 217, in parse_configs
        project=f"{configs.model.model_name}-{configs.dataset.dataset}",
    omegaconf.errors.ConfigAttributeError: Key 'model' is not in struct
        full_key: model
        object_type=dict

    Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
    ./asr.sh: line 6: dataset.dataset_path=../../../../LibriSpeech/: No such file or directory
    ./asr.sh: line 8: dataset.manifest_file_path=./openspeech/datasets/librispeech/libri_subword_manifest.txt: No such file or directory
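A likely cause of the two `asr.sh: ... No such file or directory` lines (and of Hydra receiving only two overrides) is a broken backslash continuation in `asr.sh`: a blank line or trailing space after a `\` makes the shell treat the remaining override lines as standalone commands, and since those words contain a `/`, bash tries to execute them as paths and reports "No such file or directory". A sketch of a continuation-safe `asr.sh` (paths taken from the post; `bash -n` only syntax-checks the script, it does not launch training):

```shell
# Write asr.sh with every Hydra override joined by a trailing backslash
# and no blank lines in between; a broken continuation would make bash
# execute the later override lines as commands on their own.
cat > asr.sh <<'EOF'
python ./openspeech_cli/hydra_train.py \
    dataset="librispeech" \
    dataset.dataset_download=False \
    dataset.dataset_path="../../../../LibriSpeech/" \
    dataset.manifest_file_path="./openspeech/datasets/librispeech/libri_subword_manifest.txt" \
    tokenizer=libri_subword \
    model="conformer_lstm" \
    audio=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=cross_entropy
EOF

# Syntax check only; does not run the training job.
bash -n asr.sh && echo "asr.sh syntax OK"
```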

ccyang123 commented 1 year ago

I solved the problem by setting the correct directory and file name. However, I ran into another problem, as follows:

"You selected an invalid accelerator name: accelerator='dp'. Available names are: cpu, cuda, hpu, ipu, mps, tpu. "

Where can I set a different accelerator name? On the command line?

ccyang123 commented 1 year ago

I solved the problem by adding `++trainer.accelerator=cuda` to the configuration in my script.
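For reference, the full invocation with the extra override appended might look like the sketch below (in Hydra's override grammar, `++key=value` overrides the key if it exists and adds it if it does not):

```shell
python ./openspeech_cli/hydra_train.py \
    dataset="librispeech" \
    dataset.dataset_download=False \
    dataset.dataset_path="../../../../LibriSpeech/" \
    dataset.manifest_file_path="./openspeech/datasets/librispeech/libri_subword_manifest.txt" \
    tokenizer=libri_subword \
    model="conformer_lstm" \
    audio=fbank \
    lr_scheduler=warmup_reduce_lr_on_plateau \
    trainer=gpu \
    criterion=cross_entropy \
    ++trainer.accelerator=cuda
```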

ccyang123 commented 1 year ago

However, after the training job ran for 3 hours, it crashed! I am trying to figure out what went wrong. :( Can anyone give me advice?

ccyang123 commented 1 year ago

The crash message is as follows:

    /usr/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 4 leaked semaphore objects to clean up at shutdown

Can anyone give me advice?

ccyang123 commented 1 year ago

After switching to a more powerful GPU system, the problem was solved. However, I ran into another problem during training:

    RuntimeError: The `Callback.on_batch_end` hook was removed in v1.8. Please use `Callback.on_train_batch_end` instead.

It seems the Python code in openspeech/callbacks.py should be modified to use `on_train_batch_end`, since only pytorch-lightning 1.8.6 can be installed now?
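For anyone hitting the same error, the rename amounts to something like the sketch below. The base class here is a stand-in so the snippet runs without pytorch-lightning installed; in the real openspeech/callbacks.py it would be `pytorch_lightning.Callback`, and the hook body itself stays the same — only the method name and signature change. The `CheckpointEveryNSteps` name is made up for illustration:

```python
class Callback:
    """Stand-in for pytorch_lightning.Callback so this sketch is runnable."""

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        pass


class CheckpointEveryNSteps(Callback):  # hypothetical callback, for illustration
    def __init__(self, save_step_frequency: int):
        self.save_step_frequency = save_step_frequency
        self.saved_steps = []

    # Removed in pytorch-lightning v1.8:
    #   def on_batch_end(self, trainer, pl_module): ...
    # Replacement hook (fires after every training batch):
    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        if batch_idx % self.save_step_frequency == 0:
            self.saved_steps.append(batch_idx)


cb = CheckpointEveryNSteps(save_step_frequency=2)
for step in range(5):
    cb.on_train_batch_end(trainer=None, pl_module=None, outputs=None,
                          batch=None, batch_idx=step)
print(cb.saved_steps)  # -> [0, 2, 4]
```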

Can anyone give me advice?

akthddus22 commented 1 year ago

I've got the same problem, so I downgraded pytorch-lightning and that solved it. But now I am getting a segmentation fault during training. I'm not sure whether it is because of the pytorch-lightning version...

upskyy commented 1 year ago

@ccyang123 @akthddus22 Hello, I've experimented with the versions below. In particular, the pytorch-lightning and hydra releases we developed against are very different from the current ones, so it's best to match the versions closely.

pytorch-lightning       1.4.0
torch                   1.10.2
torchaudio              0.10.2
hydra-core              1.0.7
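Assuming a pip-based environment, pinning those versions would look like this (a sketch; adjust the torch builds for your CUDA version):

```shell
pip install pytorch-lightning==1.4.0 torch==1.10.2 torchaudio==0.10.2 hydra-core==1.0.7
```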

And the segmentation fault during training can be temporarily worked around as follows: just add `import sentencepiece` to hydra_train.py. But the fundamental solution still needs more investigation. 😭

  import os
  import hydra
+ import sentencepiece
  import pytorch_lightning as pl
  from omegaconf import DictConfig, OmegaConf
  from pytorch_lightning.utilities import rank_zero_info
akthddus22 commented 1 year ago

@upskyy Thanks it works!

ccyang123 commented 1 year ago

@akthddus22 @upskyy
After I downgraded pytorch-lightning to v1.5, it works now. Thanks for your help!