k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

Out of memory during decoding with pretrained model for LibriSpeech #224

Closed anderleich closed 2 years ago

anderleich commented 2 years ago

Hi,

I've recently started with Icefall and I was just trying to test the pretrained model for LibriSpeech. As stated in the docs, I ran the following command:

./tdnn_lstm_ctc/pretrained.py \
  --checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretraind.pt \
  --words-file ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt \
  --HLG ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt \
  --method whole-lattice-rescoring \
  --G ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lm/G_4_gram.pt \
  --ngram-lm-scale 0.8 \
  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac \
  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac \
  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac

However, I get an OOM error during decoding. I have a GPU with 11 GB of memory. It seems to happen when loading G:

2022-02-23 19:35:10,513 INFO [pretrained.py:168] device: cuda:0
2022-02-23 19:35:10,513 INFO [pretrained.py:170] Creating model
2022-02-23 19:35:14,270 INFO [pretrained.py:182] Loading HLG from ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt
2022-02-23 19:35:23,761 INFO [pretrained.py:190] Loading G from ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lm/G_4_gram.pt
Traceback (most recent call last):
  File "./tdnn_lstm_ctc/pretrained.py", line 277, in <module>
    main()
  File "./tdnn_lstm_ctc/pretrained.py", line 195, in main
    G = k2.add_epsilon_self_loops(G)
  File "venv/lib/python3.8/site-packages/k2-1.13.dev20220223+cuda11.3.torch1.10.2-py3.8-linux-x86_64.egg/k2/fsa_algo.py", line 499, in add_epsilon_self_loops
    ragged_arc, arc_map = _k2.add_epsilon_self_loops(fsa.arcs,
RuntimeError: CUDA out of memory. Tried to allocate 4.73 GiB (GPU 0; 10.91 GiB total capacity; 9.25 GiB already allocated; 468.31 MiB free; 9.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception raised from malloc at ../c10/cuda/CUDACachingAllocator.cpp:536 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f455a6b3d62 in venv/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x257de (0x7f459dd697de in venv/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x264b2 (0x7f459dd6a4b2 in venv/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x268e2 (0x7f459dd6a8e2 in venv/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #4: k2::PytorchCudaContext::Allocate(unsigned long, void**) + 0x46 (0x7f44f125f7f6 in venv/lib/python3.8/site-packages/k2-1.13.dev20220223+cuda11.3.torch1.10.2-py3.8-linux-x86_64.egg/libk2context.so)
frame #5: k2::NewRegion(std::shared_ptr<k2::Context>, unsigned long) + 0x116 (0x7f44f0f86c66 in venv/lib/python3.8/site-packages/k2-1.13.dev20220223+cuda11.3.torch1.10.2-py3.8-linux-x86_64.egg/libk2context.so)
frame #6: k2::Array1<k2::Arc>::Array1(std::shared_ptr<k2::Context>, int, k2::Dtype) + 0xad (0x7f44f0fce01d in venv/lib/python3.8/site-packages/k2-1.13.dev20220223+cuda11.3.torch1.10.2-py3.8-linux-x86_64.egg/libk2context.so)
frame #7: k2::AddEpsilonSelfLoops(k2::Ragged<k2::Arc>&, k2::Ragged<k2::Arc>*, k2::Array1<int>*) + 0x3e9 (0x7f44f0fb3789 in venv/lib/python3.8/site-packages/k2-1.13.dev20220223+cuda11.3.torch1.10.2-py3.8-linux-x86_64.egg/libk2context.so)
frame #8: <unknown function> + 0x87ff4 (0x7f44f22a1ff4 in venv/lib/python3.8/site-packages/k2-1.13.dev20220223+cuda11.3.torch1.10.2-py3.8-linux-x86_64.egg/_k2.cpython-38-x86_64-linux-gnu.so)
frame #9: <unknown function> + 0x38291 (0x7f44f2252291 in  venv/lib/python3.8/site-packages/k2-1.13.dev20220223+cuda11.3.torch1.10.2-py3.8-linux-x86_64.egg/_k2.cpython-38-x86_64-linux-gnu.so)
<omitting python frames>
frame #21: python3() [0x67eeb1]
frame #22: python3() [0x67ef2f]
frame #23: python3() [0x67efd1]
frame #27: __libc_start_main + 0xf3 (0x7f46239ca0b3 in /lib/x86_64-linux-gnu/libc.so.6)

Any clues about what's happening and how I could solve it?

csukuangfj commented 2 years ago

  --checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretraind.pt \

should be

  --checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretrained.pt \

pretraind.pt -> pretrained.pt


I just tried the command on our server and found that the peak GPU RAM usage is about 29 GB, according to the output of watch -n 0.5 nvidia-smi.

So I would recommend that you either switch to a machine with more GPU RAM or use a decoding method that does not use the 4-gram G for rescoring.
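
For reference, a minimal sketch of such a command, assuming pretrained.py accepts --method 1best here (it decodes with HLG only, so G_4_gram.pt is never loaded; please check ./tdnn_lstm_ctc/pretrained.py --help for the exact method choices):

./tdnn_lstm_ctc/pretrained.py \
  --checkpoint ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/exp/pretrained.pt \
  --words-file ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/words.txt \
  --HLG ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/data/lang_phone/HLG.pt \
  --method 1best \
  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1089-134686-0001.flac \
  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0001.flac \
  ./tmp/icefall_asr_librispeech_tdnn-lstm_ctc/test_wavs/1221-135766-0002.flac

Since no rescoring is done, the --G and --ngram-lm-scale arguments are dropped, and the only large object placed on the GPU is HLG.pt.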