danpovey opened 2 years ago
It's because the CUDA versions used by k2 (10.1) and PyTorch (11.0) are different. You can either replace

pip install torch==1.7.1+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

with

pip install torch==1.7.1+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

or replace

pip install k2

with

pip install k2==1.19.dev20220831+cuda11.0.torch1.7.1 -f http://k2-fsa.org/nightly/ --trusted-host k2-fsa.org
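The underlying requirement is simply that the CUDA tag baked into the two wheels' version strings agrees. As a quick illustration, here is a small helper (the name `cuda_tag` and the parsing are ours, not part of torch or k2) that extracts that tag from version strings like the ones above:

```python
import re

def cuda_tag(version: str) -> str:
    """Extract the CUDA version from a wheel version string.

    Handles both the PyTorch style ("1.7.1+cu110" -> "11.0") and the
    k2 style ("1.19.dev20220831+cuda11.0.torch1.7.1" -> "11.0").
    """
    m = re.search(r"cuda(\d+\.\d+)", version)  # k2 style: ...+cuda11.0...
    if m:
        return m.group(1)
    m = re.search(r"\+cu(\d+)", version)       # torch style: +cu110
    if m:
        digits = m.group(1)                    # "110" -> "11.0", "101" -> "10.1"
        return f"{digits[:-1]}.{digits[-1]}"
    return ""

print(cuda_tag("1.7.1+cu110"))                           # 11.0
print(cuda_tag("1.19.dev20220831+cuda11.0.torch1.7.1"))  # 11.0 -- a matching pair
```

At runtime you can compare `torch.version.cuda` (the CUDA version PyTorch was built with) against what k2 reports; k2 can print its build configuration via `python3 -m k2.version`.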
thanks!
(was my wife's question. closing the issue).
The CTC decoding stage wants me to install lhotse: ModuleNotFoundError: No module named 'lhotse'
! pip install lhotse
and then got the following error.

Command:
! cd icefall/egs/librispeech/ASR && \
PYTHONPATH=/content/icefall python3 ./conformer_ctc/pretrained.py \
--method ctc-decoding \
--checkpoint ./tmp/icefall_asr_librispeech_conformer_ctc/exp/pretrained.pt \
--lang-dir ./tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe \
./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac \
./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac \
./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac
Output:
/usr/local/lib/python3.7/dist-packages/torchaudio/backend/utils.py:54: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
'"sox" backend is being deprecated. '
usage: pretrained.py [-h] --checkpoint CHECKPOINT [--words-file WORDS_FILE]
[--HLG HLG] [--bpe-model BPE_MODEL] [--method METHOD]
[--G G] [--num-paths NUM_PATHS]
[--ngram-lm-scale NGRAM_LM_SCALE]
[--attention-decoder-scale ATTENTION_DECODER_SCALE]
[--nbest-scale NBEST_SCALE] [--sos-id SOS_ID]
[--num-classes NUM_CLASSES] [--eos-id EOS_ID]
sound_files [sound_files ...]
pretrained.py: error: unrecognized arguments: --lang-dir
I changed --lang-dir to --bpe-model.
After the above 2 fixes I am getting this error now:
/usr/local/lib/python3.7/dist-packages/torchaudio/backend/utils.py:54: UserWarning: "sox" backend is being deprecated. The default backend will be changed to "sox_io" backend in 0.8.0 and "sox" backend will be removed in 0.9.0. Please migrate to "sox_io" backend. Please refer to https://github.com/pytorch/audio/issues/903 for the detail.
'"sox" backend is being deprecated. '
2022-09-09 22:55:53,500 INFO [pretrained.py:259] {'sample_rate': 16000, 'subsampling_factor': 4, 'vgg_frontend': False, 'use_feat_batchnorm': True, 'feature_dim': 80, 'nhead': 8, 'attention_dim': 512, 'num_decoder_layers': 0, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'checkpoint': './tmp/icefall_asr_librispeech_conformer_ctc/exp/pretrained.pt', 'words_file': None, 'HLG': None, 'bpe_model': './tmp/icefall_asr_librispeech_conformer_ctc/data/lang_bpe', 'method': 'ctc-decoding', 'G': None, 'num_paths': 100, 'ngram_lm_scale': 1.3, 'attention_decoder_scale': 1.2, 'nbest_scale': 0.5, 'sos_id': 1, 'num_classes': 500, 'eos_id': 1, 'sound_files': ['./tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1089-134686-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0001.flac', './tmp/icefall_asr_librispeech_conformer_ctc/test_wavs/1221-135766-0002.flac']}
2022-09-09 22:55:53,533 INFO [pretrained.py:265] device: cuda:0
2022-09-09 22:55:53,533 INFO [pretrained.py:267] Creating model
Traceback (most recent call last):
File "./conformer_ctc/pretrained.py", line 435, in <module>
main()
File "./conformer_ctc/pretrained.py", line 280, in main
model.load_state_dict(checkpoint["model"], strict=False)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Conformer:
size mismatch for encoder_output_layer.1.weight: copying a param with shape torch.Size([5000, 512]) from checkpoint, the shape in current model is torch.Size([500, 512]).
size mismatch for encoder_output_layer.1.bias: copying a param with shape torch.Size([5000]) from checkpoint, the shape in current model is torch.Size([500]).
All decoding scripts are throwing a "size mismatch" error.
For the size mismatch error, which pretrained model are you using?
size mismatch for encoder_output_layer.1.bias: copying a param with shape torch.Size([5000]) from checkpoint, the shape in current model is torch.Size([500]).
It shows that your pretrained checkpoint was trained with a vocab size of 5000, while the model being constructed uses 500.
Changed --lang-dir to --bpe-model
Please clarify which bpe.model you are using. If you use a pre-trained model of vocab size 500, please use the bpe.model from data/lang_bpe_500, or use the bpe.model that you downloaded from Hugging Face together with the pretrained model.
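As an aside, both vocab sizes are right there in the RuntimeError text. A quick sanity-check parser (purely illustrative, not part of icefall or torch) that pulls them out of a size-mismatch line:

```python
import re

def vocab_sizes(error_line: str) -> tuple:
    """Return (checkpoint_vocab, model_vocab) from a torch size-mismatch line.

    Works on lines like:
      size mismatch for ...: copying a param with shape torch.Size([5000])
      from checkpoint, the shape in current model is torch.Size([500]).
    """
    # The first dimension of each torch.Size(...) is the vocab size.
    sizes = re.findall(r"torch\.Size\(\[(\d+)(?:, \d+)?\]\)", error_line)
    checkpoint, current = (int(s) for s in sizes)
    return checkpoint, current

line = ("size mismatch for encoder_output_layer.1.bias: copying a param "
        "with shape torch.Size([5000]) from checkpoint, the shape in "
        "current model is torch.Size([500]).")
print(vocab_sizes(line))  # -> (5000, 500): checkpoint vocab vs. model vocab
```

So the checkpoint expects 5000 classes while the model was built for 500; the bpe.model and --num-classes passed to pretrained.py have to match the checkpoint. You can also read a bpe.model's vocab size directly with sentencepiece's `SentencePieceProcessor.vocab_size()`.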
I just updated the colab notebook at https://colab.research.google.com/drive/1huyupXAcHsUrKaWfI83iMEJ6J0Nh0213?usp=sharing
@npovey Could you try it again?
The colab notebook was quite outdated.
@csukuangfj It all works. Thanks!
A problem installing k2 in colab: https://colab.research.google.com/drive/15FSAIx7dND2xcZW9ZOZmffhbgmIH2zNS?usp=sharing