k2-fsa / snowfall

Moved to https://github.com/k2-fsa/icefall
Apache License 2.0

CUDA out of memory #216

Closed shanguanma closed 3 years ago

shanguanma commented 3 years ago

I am trying to reproduce your LibriSpeech recipe results with the latest snowfall and k2 (0.3.5), using the script below:

$cuda_cmd log/stage6_train.log\
   CUDA_VISIBLE_DEVICES="4" python3 ./mmi_att_transformer_train.py \
                                       --world-size 1\
                                       --full-libri false\
                                       --use-ali-model false \
                                       --num-workers-train 1\
                                       --num-workers-valid 1
$decode_cmd log/stage7_decode.log\
  CUDA_VISIBLE_DEVICES="4" python3 ./mmi_att_transformer_decode.py

The results are as follows:

2021-06-15 11:40:13,293 INFO [common.py:398] [test-clean] %WER 5.78% [3037 / 52576, 571 ins, 181 del, 2285 sub ]
2021-06-15 11:49:09,503 INFO [common.py:398] [test-other] %WER 15.14% [7925 / 52343, 1258 ins, 542 del, 6125 sub ]

The environment is summarized as follows:

[md510@node02 simple_v1]$  python3 -m k2.version
Collecting environment information...

k2 version: 0.3.5
Build type: Release
Git SHA1: 81ad3a580361e20b828d5eb1120999ecd0d7c675
Git date: Sat Jun 5 11:36:50 2021
Cuda used to build k2: 10.2
cuDNN used to build k2: 8.0.2
Python version used to build k2: 3.8
OS used to build k2: Ubuntu 16.04.7 LTS
CMake version: 3.18.4
GCC version: 5.5.0
CMAKE_CUDA_FLAGS:  --expt-extended-lambda -gencode arch=compute_35,code=sm_35 --expt-extended-lambda -gencode arch=compute_50,code=sm_50 --expt-extended-lambda -gencode arch=compute_60,code=sm_60 --expt-extended-lambda -gencode arch=compute_61,code=sm_61 --expt-extended-lambda -gencode arch=compute_70,code=sm_70 --expt-extended-lambda -gencode arch=compute_75,code=sm_75 -D_GLIBCXX_USE_CXX11_ABI=0 --compiler-options -Wall --compiler-options -Wno-unknown-pragmas --compiler-options -Wno-strict-overflow
CMAKE_CXX_FLAGS:  -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-strict-overflow
PyTorch version used to build k2: 1.8.1
PyTorch is using Cuda: 10.2
NVTX enabled: True
With CUDA: True
Disable debug: True
Sync kernels : False
Disable checks: False

Now I am using another corpus (SEAME). When training the acoustic model, the program keeps reporting CUDA out of memory. Note: the GPUs are RTX 8000 (48 GB each). My command is as follows:

$cuda_cmd log/stage5_train.log\
CUDA_VISIBLE_DEVICES="2,3,4" python3 ./mmi_att_transformer_train_seame.py \
                                    --world-size 3\
                                    --use-ali-model false \
                                   --num-workers-train 1\
                                   --num-workers-valid 1

The error log is as follows:

# CUDA_VISIBLE_DEVICES=2,3,4 python3 ./mmi_att_transformer_train_seame.py --world-size 3 --use-ali-model false --num-workers-train 1 --num-workers-valid 1 
# Invoked at Mon Jun 21 11:13:10 SGT 2021 from node03
# Started at Mon Jun 21 11:14:08 +08 2021 on node02
Traceback (most recent call last):
  File "./mmi_att_transformer_train_seame.py", line 724, in <module>
  File "./mmi_att_transformer_train_seame.py", line 717, in main
    mp.spawn(run, args=(world_size, args), nprocs=world_size, join=True)
  File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home3/md510/w2020/k2_fsa_2021/snowfall/egs/seame/asr/simple_v1/mmi_att_transformer_train_seame.py", line 630, in run
    objf, valid_objf, global_batch_idx_train = train_one_epoch(
  File "/home3/md510/w2020/k2_fsa_2021/snowfall/egs/seame/asr/simple_v1/mmi_att_transformer_train_seame.py", line 257, in train_one_epoch
    curr_batch_objf, curr_batch_frames, curr_batch_all_frames = get_objf(
  File "/home3/md510/w2020/k2_fsa_2021/snowfall/egs/seame/asr/simple_v1/mmi_att_transformer_train_seame.py", line 113, in get_objf
    mmi_loss, tot_frames, all_frames = loss_fn(nnet_output, texts, supervision_segments)
  File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home3/md510/w2020/k2_fsa_2021/snowfall/snowfall/objectives/mmi.py", line 222, in forward
    return func(nnet_output=nnet_output,
  File "/home3/md510/w2020/k2_fsa_2021/snowfall/snowfall/objectives/mmi.py", line 97, in _compute_mmi_loss_exact_optimized
    num_den_tot_scores = num_den_lats.get_tot_scores(log_semiring=True,
  File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/fsa.py", line 644, in get_tot_scores
    tot_scores = k2.autograd._GetTotScoresFunction.apply(
  File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/autograd.py", line 49, in forward
    tot_scores = fsas._get_tot_scores(use_double_scores=use_double_scores,
  File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/fsa.py", line 623, in _get_tot_scores
    forward_scores = self._get_forward_scores(use_double_scores,
  File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/fsa.py", line 573, in _get_forward_scores
  File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/fsa.py", line 513, in _get_entering_arc_batches
  File "/home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/k2/fsa.py", line 499, in _get_incoming_arcs
    cache[name] = _k2.get_incoming_arcs(self.arcs,
RuntimeError: CUDA out of memory. Tried to allocate 17179869182.18 GiB (GPU 0; 44.49 GiB total capacity; 31.00 GiB already allocated; 7.62 GiB free; 35.77 GiB reserved in total by PyTorch)
Exception raised from malloc at /opt/conda/conda-bld/pytorch_1616554788289/work/c10/cuda/CUDACachingAllocator.cpp:288 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x2aab147e12f2 in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0x1bc21 (0x2aab1457dc21 in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x1c944 (0x2aab1457e944 in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x1cf63 (0x2aab1457ef63 in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #4: k2::PytorchCudaContext::Allocate(unsigned long, void**) + 0x5e (0x2aab2fe7aade in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/libk2context.so)
frame #5: k2::NewRegion(std::shared_ptr<k2::Context>, unsigned long) + 0x11e (0x2aab2fbd876e in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/libk2context.so)
frame #6: <unknown function> + 0x23a61d (0x2aab2fd4661d in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/libk2context.so)
frame #7: k2::GetTransposeReordering(k2::Ragged<int>&, int) + 0x2ff (0x2aab2fd641ff in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/libk2context.so)
frame #8: k2::GetIncomingArcs(k2::Ragged<k2::Arc>&, k2::Array1<int> const&) + 0x11a (0x2aab2fc4407a in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/libk2context.so)
frame #9: <unknown function> + 0x444ed (0x2aab2eb634ed in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #10: <unknown function> + 0x1bd5f (0x2aab2eb3ad5f in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #11: PyCFunction_Call + 0x54 (0x55555567fdf4 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #12: _PyObject_MakeTpCall + 0x31e (0x55555568ef2e in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #13: _PyEval_EvalFrameDefault + 0x534b (0x555555728f6b in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #14: <unknown function> + 0x1b1e86 (0x555555705e86 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #15: _PyEval_EvalFrameDefault + 0x4ca3 (0x5555557288c3 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #16: <unknown function> + 0x1b1e86 (0x555555705e86 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #17: _PyEval_EvalFrameDefault + 0x4ca3 (0x5555557288c3 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #18: <unknown function> + 0x1b1e86 (0x555555705e86 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #19: _PyEval_EvalFrameDefault + 0x4ca3 (0x5555557288c3 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #20: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #21: <unknown function> + 0x1b2007 (0x555555706007 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #22: _PyEval_EvalFrameDefault + 0x1782 (0x5555557253a2 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #23: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #24: PyObject_CallObject + 0x53 (0x55555570dd93 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #25: THPFunction_apply(_object*, _object*) + 0x8fd (0x2aaac76a83fd in /home3/md510/anaconda3/envs/foo_k2/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #26: PyCFunction_Call + 0xf9 (0x55555567fe99 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #27: _PyObject_MakeTpCall + 0x31e (0x55555568ef2e in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #28: _PyEval_EvalFrameDefault + 0x534b (0x555555728f6b in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #29: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #30: <unknown function> + 0x1b2007 (0x555555706007 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #31: _PyEval_EvalFrameDefault + 0x1782 (0x5555557253a2 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #32: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #33: _PyFunction_Vectorcall + 0x378 (0x5555557058d8 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #34: _PyEval_EvalFrameDefault + 0x1782 (0x5555557253a2 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #35: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #36: <unknown function> + 0x1b1f91 (0x555555705f91 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #37: PyObject_Call + 0x5e (0x5555556790be in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #38: _PyEval_EvalFrameDefault + 0x21c1 (0x555555725de1 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #39: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #40: _PyObject_FastCallDict + 0x2c1 (0x555555673df1 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #41: _PyObject_Call_Prepend + 0x63 (0x55555567e983 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #42: <unknown function> + 0x181b99 (0x5555556d5b99 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #43: _PyObject_MakeTpCall + 0x31e (0x55555568ef2e in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #44: _PyEval_EvalFrameDefault + 0x4f2e (0x555555728b4e in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #45: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #46: _PyFunction_Vectorcall + 0x378 (0x5555557058d8 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #47: _PyEval_EvalFrameDefault + 0x1782 (0x5555557253a2 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #48: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #49: _PyFunction_Vectorcall + 0x378 (0x5555557058d8 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #50: _PyEval_EvalFrameDefault + 0x1782 (0x5555557253a2 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #51: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #52: PyObject_Call + 0x5e (0x5555556790be in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #53: _PyEval_EvalFrameDefault + 0x21c1 (0x555555725de1 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #54: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #55: PyObject_Call + 0x5e (0x5555556790be in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #56: _PyEval_EvalFrameDefault + 0x21c1 (0x555555725de1 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #57: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #58: _PyEval_EvalFrameDefault + 0xa4b (0x55555572466b in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #59: _PyEval_EvalCodeWithName + 0x2c3 (0x555555704503 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #60: _PyFunction_Vectorcall + 0x378 (0x5555557058d8 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #61: _PyEval_EvalFrameDefault + 0xa4b (0x55555572466b in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #62: _PyFunction_Vectorcall + 0x1a6 (0x555555705706 in /home3/md510/anaconda3/envs/foo_k2/bin/python3)
frame #63: _PyEval_EvalFrameDefault + 0x92f (0x55555572454f in /home3/md510/anaconda3/envs/foo_k2/bin/python3)

# Ended (code 256) at Mon Jun 21 11:19:00 SGT 2021, elapsed time 350 seconds

I don't know what is wrong. Thanks a lot.
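The requested size in the error above is worth a second look, since it is far larger than any real allocation. The arithmetic below shows that it sits just under 2^64 bytes, which is what a small negative byte count looks like when read as an unsigned 64-bit integer. That reading is an assumption; nothing in this thread confirms where such a value would come from.

```python
# Sanity-check the size reported in the RuntimeError above.
GIB = 2 ** 30

reported_gib = 17179869182.18   # copied from the error message
wrap_gib = 2 ** 64 / GIB        # 2**64 bytes expressed in GiB

# How far below the 64-bit wrap-around point the request is.
shortfall_gib = wrap_gib - reported_gib

print(f"2**64 bytes = {wrap_gib:.2f} GiB")
print(f"reported request is {shortfall_gib:.2f} GiB below that")
```

If the assumption holds, the failure would point to a size computation going wrong (for example an index or count overflowing) rather than to a batch that is only slightly too large.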

danpovey commented 3 years ago

You can mess with the minibatch size, which might help. But finding the source is a good idea too. Are you using an alignment model? (If not, the posteriors at the start can be very flat, which can cause too many states to stay within the pruning beam). What is the size of the phone set?
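To illustrate the point about flat posteriors, here is a toy sketch that counts how many symbols stay within a beam of the best score on a single frame. It is not k2's pruning code; `states_within_beam`, the 100-symbol inventory, and the beam of 5.0 are all made up for the example.

```python
import math

def states_within_beam(log_probs, beam):
    """Count entries whose log-prob is within `beam` of the best one."""
    best = max(log_probs)
    return sum(1 for lp in log_probs if lp >= best - beam)

num_symbols = 100  # hypothetical inventory size
beam = 5.0         # hypothetical pruning beam

# Flat posterior: every symbol is equally likely, as at the start of training.
flat = [math.log(1.0 / num_symbols)] * num_symbols

# Peaked posterior: one symbol takes 0.9, the rest share the remaining 0.1.
peaked = [math.log(0.9)] + [math.log(0.1 / (num_symbols - 1))] * (num_symbols - 1)

flat_survivors = states_within_beam(flat, beam)
peaked_survivors = states_within_beam(peaked, beam)
print(flat_survivors, peaked_survivors)
```

With a flat posterior nothing is pruned on any frame, so the lattice can grow much larger than it would once the model has learned sharper outputs.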


csukuangfj commented 3 years ago

@shanguanma Are you using the dataset from https://github.com/lhotse-speech/lhotse/issues/320 ? I notice that a single wav in that dataset can be more than 19 minutes long, which is too long I think.


Can you run print(feature.shape) for LibriSpeech and your own dataset? If yours is way larger than that of LibriSpeech, then that is the reason for OOM.

shanguanma commented 3 years ago


Are you using an alignment model? (If not, the posteriors at the start can be very flat, which can cause too many states to stay within the pruning beam).

I am not using an alignment model.

What is the size of the phone set?

[md510@node02 simple_v1]$ ls data/lang_nosp/L.fst.txt  -larth
-rw-r--r-- 1 md510 users 5.9M Jun 15 18:10 data/lang_nosp/L.fst.txt
[md510@node02 simple_v1]$ wc -l  data/lang_nosp/phones.txt 
278 data/lang_nosp/phones.txt

You can mess with the minibatch size, which might help. But finding the source is a good idea too.

I will try to reduce the minibatch size.

shanguanma commented 3 years ago


Are you using the dataset from lhotse-speech/lhotse#320 ? I notice that a single wav in that dataset can be more than 19 minutes long, which is too long I think.

Yes. Without segmentation, each whole recording is very long. But I have a segments file, and I used wav.scp together with the segments file to cut the recordings into utterances (each about 2 s to 10 s). I don't see what is wrong.

Can you run print(feature.shape) for LibriSpeech and your own dataset? If yours is way larger than that of LibriSpeech, then that is the reason for OOM.

I ran it. The shapes are as follows:

In the seame data:

# Started at Mon Jun 21 13:53:50 +08 2021 on node02
feature shape is torch.Size([34, 1974, 80])
feature shape is torch.Size([37, 1819, 80])
feature shape is torch.Size([14, 4630, 80])
feature shape is torch.Size([38, 1710, 80])
feature shape is torch.Size([18, 3467, 80])
feature shape is torch.Size([32, 2076, 80])
feature shape is torch.Size([22, 2972, 80])
feature shape is torch.Size([33, 2036, 80])
feature shape is torch.Size([35, 1882, 80])
feature shape is torch.Size([43, 1571, 80])
feature shape is torch.Size([32, 2044, 80])
feature shape is torch.Size([28, 2346, 80])
feature shape is torch.Size([38, 1714, 80])
feature shape is torch.Size([25, 2617, 80])
feature shape is torch.Size([21, 3069, 80])
feature shape is torch.Size([17, 3971, 80])
feature shape is torch.Size([29, 2269, 80])
feature shape is torch.Size([31, 2123, 80])
feature shape is torch.Size([29, 2213, 80])
feature shape is torch.Size([36, 1844, 80])
feature shape is torch.Size([36, 1816, 80])
feature shape is torch.Size([32, 2086, 80])
feature shape is torch.Size([33, 1948, 80])
feature shape is torch.Size([33, 1975, 80])
feature shape is torch.Size([20, 3234, 80])
feature shape is torch.Size([41, 1628, 80])
feature shape is torch.Size([25, 2655, 80])
feature shape is torch.Size([40, 1636, 80])
feature shape is torch.Size([22, 2877, 80])
feature shape is torch.Size([26, 2491, 80])
feature shape is torch.Size([40, 1658, 80])
feature shape is torch.Size([26, 2504, 80])
feature shape is torch.Size([24, 2664, 80])
feature shape is torch.Size([43, 1552, 80])
feature shape is torch.Size([29, 2275, 80])
feature shape is torch.Size([24, 2755, 80])
feature shape is torch.Size([39, 1644, 80])
feature shape is torch.Size([21, 3057, 80])
feature shape is torch.Size([31, 2100, 80])
feature shape is torch.Size([40, 1645, 80])
feature shape is torch.Size([30, 2255, 80])
feature shape is torch.Size([37, 1780, 80])
feature shape is torch.Size([22, 2921, 80])
feature shape is torch.Size([39, 1701, 80])
feature shape is torch.Size([33, 1969, 80])
feature shape is torch.Size([33, 1988, 80])
feature shape is torch.Size([32, 2089, 80])
feature shape is torch.Size([53, 1260, 80])
feature shape is torch.Size([32, 2084, 80])
feature shape is torch.Size([38, 1712, 80])
feature shape is torch.Size([28, 2370, 80])
feature shape is torch.Size([23, 2870, 80])
feature shape is torch.Size([50, 1377, 80])
feature shape is torch.Size([31, 2108, 80])
feature shape is torch.Size([25, 2652, 80])
feature shape is torch.Size([50, 1294, 80])
feature shape is torch.Size([48, 1414, 80])
feature shape is torch.Size([28, 2331, 80])
feature shape is torch.Size([38, 1817, 80])
feature shape is torch.Size([23, 2784, 80])
feature shape is torch.Size([40, 1621, 80])
feature shape is torch.Size([40, 1695, 80])
feature shape is torch.Size([36, 1914, 80])
feature shape is torch.Size([39, 1649, 80])
feature shape is torch.Size([39, 1671, 80])
feature shape is torch.Size([39, 1741, 80])
feature shape is torch.Size([35, 1895, 80])
feature shape is torch.Size([40, 1591, 80])
feature shape is torch.Size([39, 1661, 80])
feature shape is torch.Size([34, 1859, 80])
feature shape is torch.Size([34, 1960, 80])
feature shape is torch.Size([41, 1599, 80])
feature shape is torch.Size([37, 1875, 80])
feature shape is torch.Size([40, 1659, 80])
feature shape is torch.Size([34, 1925, 80])
feature shape is torch.Size([43, 1533, 80])
feature shape is torch.Size([37, 1831, 80])
feature shape is torch.Size([27, 2454, 80])

In the LibriSpeech data:

in this librispeech , feature shape is torch.Size([34, 1771, 80])
in this librispeech , feature shape is torch.Size([35, 1779, 80])
in this librispeech , feature shape is torch.Size([35, 1698, 80])
in this librispeech , feature shape is torch.Size([34, 1736, 80])
in this librispeech , feature shape is torch.Size([33, 1807, 80])
in this librispeech , feature shape is torch.Size([35, 1754, 80])
in this librispeech , feature shape is torch.Size([34, 1798, 80])
in this librispeech , feature shape is torch.Size([34, 1791, 80])
in this librispeech , feature shape is torch.Size([36, 1748, 80])
in this librispeech , feature shape is torch.Size([33, 1771, 80])
in this librispeech , feature shape is torch.Size([35, 1757, 80])
in this librispeech , feature shape is torch.Size([35, 1771, 80])
in this librispeech , feature shape is torch.Size([32, 1828, 80])
in this librispeech , feature shape is torch.Size([34, 1723, 80])
in this librispeech , feature shape is torch.Size([35, 1744, 80])
in this librispeech , feature shape is torch.Size([33, 1847, 80])
in this librispeech , feature shape is torch.Size([35, 1847, 80])
in this librispeech , feature shape is torch.Size([34, 1722, 80])
in this librispeech , feature shape is torch.Size([33, 1866, 80])
in this librispeech , feature shape is torch.Size([35, 1672, 80])
in this librispeech , feature shape is torch.Size([33, 1808, 80])
in this librispeech , feature shape is torch.Size([33, 1805, 80])
in this librispeech , feature shape is torch.Size([33, 1817, 80])
in this librispeech , feature shape is torch.Size([36, 1662, 80])
in this librispeech , feature shape is torch.Size([35, 1724, 80])
in this librispeech , feature shape is torch.Size([33, 1727, 80])
in this librispeech , feature shape is torch.Size([35, 1797, 80])
in this librispeech , feature shape is torch.Size([32, 1876, 80])
in this librispeech , feature shape is torch.Size([34, 1731, 80])
in this librispeech , feature shape is torch.Size([34, 1839, 80])
in this librispeech , feature shape is torch.Size([31, 1812, 80])
in this librispeech , feature shape is torch.Size([21, 2857, 80])
in this librispeech , feature shape is torch.Size([19, 3265, 80])
in this librispeech , feature shape is torch.Size([28, 2162, 80])
in this librispeech , feature shape is torch.Size([22, 2712, 80])
in this librispeech , feature shape is torch.Size([22, 2950, 80])
in this librispeech , feature shape is torch.Size([30, 2082, 80])
in this librispeech , feature shape is torch.Size([31, 2022, 80])
in this librispeech , feature shape is torch.Size([24, 2462, 80])
in this librispeech , feature shape is torch.Size([28, 2088, 80])
in this librispeech , feature shape is torch.Size([37, 1770, 80])
in this librispeech , feature shape is torch.Size([27, 2299, 80])
in this librispeech , feature shape is torch.Size([29, 2118, 80])
in this librispeech , feature shape is torch.Size([18, 1450, 80])

I did not find a stark difference between the two sets of shapes above.

I have now reduced the minibatch size by lowering --max-duration from 500 to 100, and training runs without the CUDA out-of-memory error.
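
For anyone comparing shape dumps like the ones above, eyeballing the lists can hide the worst-case batch. Below is a minimal sketch, not part of snowfall, that parses the printed `torch.Size([...])` lines and reports the largest values per dataset. The function name `summarize` and the three-line sample logs are hypothetical; the samples are copied from the output above. The `max_attn_cells` figure (batch size times frames squared) is only a rough proxy, assuming self-attention dominates memory use; it ignores subsampling and the number of heads.

```python
import re

# Matches the shape printed by print(feature.shape), e.g. torch.Size([34, 1974, 80])
SHAPE_RE = re.compile(r"torch\.Size\(\[(\d+), (\d+), (\d+)\]\)")


def summarize(log_text):
    """Return worst-case batch statistics from 'feature shape' log lines.

    Hypothetical helper for illustration only.
    """
    shapes = [tuple(map(int, m.groups())) for m in SHAPE_RE.finditer(log_text)]
    if not shapes:
        raise ValueError("no torch.Size([...]) entries found")
    return {
        "num_batches": len(shapes),
        # Longest padded utterance length (frames) seen in any batch.
        "max_frames": max(t for _, t, _ in shapes),
        # Largest batch in padded frames (batch size * frames).
        "max_padded_frames": max(b * t for b, t, _ in shapes),
        # Rough proxy for self-attention score memory (batch size * frames^2).
        "max_attn_cells": max(b * t * t for b, t, _ in shapes),
    }


# A few lines copied from the logs above; in practice, read the full log files.
seame_log = """
feature shape is torch.Size([34, 1974, 80])
feature shape is torch.Size([14, 4630, 80])
feature shape is torch.Size([18, 3467, 80])
"""
libri_log = """
in this librispeech , feature shape is torch.Size([34, 1771, 80])
in this librispeech , feature shape is torch.Size([19, 3265, 80])
in this librispeech , feature shape is torch.Size([22, 2950, 80])
"""

for name, log in (("SEAME", seame_log), ("LibriSpeech", libri_log)):
    print(name, summarize(log))
```

With the full logs, the total padded frames per batch look similar for both datasets, but the longest batch differs (4630 frames versus 3265 frames), so a quadratic term such as `max_attn_cells` may differ more than the raw shapes suggest.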