kaldi-asr / kaldi


CUDA out of memory when running lattice rescoring with transformer pytorchnn? #4734

Open trangtv57 opened 2 years ago

trangtv57 commented 2 years ago

hi All, I have a problem when running lattice rescoring with a transformer, following the script kaldi/egs/wsj/s5/local/pytorchnn/run_nnlm.sh. The script lmrescore_lattice_pytorchnn.sh fails when computing the neural LM score for some utterances because the expanded lattice is very large: one utterance, for example, has shape [29, 1100] (max length 29 and 1100 arc paths), so my CUDA memory is not big enough to compute that batch. I am trying to reduce the number of arc paths, but I don't think it is easy because it is tied to other components. So please, can you give me an idea how to fix it? Thanks.

danpovey commented 2 years ago

Would need more details, e.g. error messages.

trangtv57 commented 2 years ago

I am adding the log from log/compute_sentence_scores.1.log. Note: I have changed the original compute_sentence_scores.py to a version that computes the scores on CUDA. Some additional info that I print after this line: https://github.com/kaldi-asr/kaldi/blob/12a2092c887c49fce04360dbf48e43067992e770/egs/wsj/s5/steps/pytorchnn/compute_sentence_scores.py#L192

```
data shape: torch.Size([38, 1153]) target shape: torch.Size([43814]) seq lens shape: torch.Size([1153])

Traceback (most recent call last):
  File "steps/pytorchnn/compute_sentence_scores_cuda.py", line 345, in <module>
    main()
  File "steps/pytorchnn/compute_sentence_scores_cuda.py", line 340, in main
    model_type=args.model)
  File "steps/pytorchnn/compute_sentence_scores_cuda.py", line 231, in compute_scores
    targets, model_type)
  File "steps/pytorchnn/compute_sentence_scores_cuda.py", line 160, in compute_sentence_score
    loss = criterion(output.view(-1, ntokens), target)
  File "/data4/trangtv/miniconda3/envs/trang_nlp/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data4/trangtv/miniconda3/envs/trang_nlp/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 962, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/data4/trangtv/miniconda3/envs/trang_nlp/lib/python3.7/site-packages/torch/nn/functional.py", line 2468, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/data4/trangtv/miniconda3/envs/trang_nlp/lib/python3.7/site-packages/torch/nn/functional.py", line 1605, in log_softmax
    ret = input.log_softmax(dim)
RuntimeError: CUDA out of memory. Tried to allocate 5.04 GiB (GPU 0; 10.76 GiB total capacity; 5.58 GiB already allocated; 4.26 GiB free; 5.62 GiB reserved in total by PyTorch)
```

As the error log shows, I think the problem is that this utterance has too many arc paths (1153), so my GPU does not have enough memory for this batch size. The solution I have in mind is to reduce the number of arc paths at lattice expansion to some limit. I know we can change the epsilon parameter, but when I tried changing it, it did not fully fix the problem. Thanks.

danpovey commented 2 years ago

It looks to me like the sentence length must be 1153. That is a very long sentence; that might be the issue. Maybe you should segment your data into smaller pieces?

trangtv57 commented 2 years ago

I don't think so. Let me give you the tensor data of one sample:

data shape:  torch.Size([3, 2])
data value:
tensor([[30893, 30893],
        [18171, 27414],
        [27414,     0]])

target shape:  torch.Size([6])
target value:
tensor([18171, 27414, 27414, 30893, 30893,     0])

seq lens shape:  torch.Size([2])
seq lens value:
tensor([3, 2])

So I think 1153 is the number of arc paths (the number of hypotheses of the full lattice), and 38 is the max length of those 1153 arc paths. I printed the data as raw text and confirmed this. Please correct me if I'm wrong. Thanks, Dan.
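A quick back-of-the-envelope check supports this reading: the 5.04 GiB allocation in the traceback is about what a float32 logits tensor of shape [38 × 1153, vocab] needs inside cross_entropy's log_softmax, if one assumes a vocabulary of roughly 31k words (consistent with token ids like 30893 above; the exact vocabulary size here is an assumption). In other words, memory grows with the number of arc paths, not with the sentence length.

```python
# Editorial sanity check: size of the [num_tokens, vocab] logits tensor that
# log_softmax has to allocate in the failing call above.
max_len, num_paths = 38, 1153
vocab = 30894                                   # assumed approximate vocab size
num_tokens = max_len * num_paths                # 43814, matching the target shape
bytes_needed = num_tokens * vocab * 4           # float32
print(bytes_needed / 2**30)                     # ~5.04 GiB
```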

danpovey commented 2 years ago

Oh sorry, you're right. You may have to either change the code to split into batches of a smaller size if num-paths is above some threshold, or put some limit on num-paths at the path-generating stage (there might be an option to whatever Kaldi program was used).
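A minimal sketch of the first option (not the actual Kaldi script code): split the path dimension of one utterance into chunks and score each chunk separately, so the [tokens, vocab] logits tensor never has to hold all 1153 paths at once. It assumes data is [max_len, num_paths] and target is its row-major flattening, as the shapes and the small example above suggest; the function name, the plain model(data) call, the criterion, and the max_paths threshold are all illustrative.

```python
import torch

def score_paths_in_chunks(model, data, target, ntokens, max_paths=256):
    """Score all paths of one utterance in chunks along the path dimension.

    data:   LongTensor [max_len, num_paths]
    target: LongTensor [max_len * num_paths] (row-major flattening)
    Returns token-level negative log-likelihoods of shape [max_len, num_paths].
    """
    max_len, num_paths = data.shape
    target = target.view(max_len, num_paths)
    criterion = torch.nn.CrossEntropyLoss(reduction="none")
    losses = []
    for start in range(0, num_paths, max_paths):
        end = min(start + max_paths, num_paths)
        chunk_data = data[:, start:end]           # [max_len, chunk]
        chunk_target = target[:, start:end]       # [max_len, chunk]
        output = model(chunk_data)                # [max_len, chunk, ntokens]
        loss = criterion(output.reshape(-1, ntokens),
                         chunk_target.reshape(-1))
        losses.append(loss.view(max_len, end - start))
    return torch.cat(losses, dim=1)
```

Per-path sentence scores would then presumably be obtained by summing each column up to its length from seq_lens, the same way the unchunked script aggregates the full batch.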

trangtv57 commented 2 years ago

Thanks @danpovey. I can do option 1, but the best solution in my mind is to limit num-paths at the path-generation stage. However, I am not familiar with the Kaldi code base yet. Can you suggest what I should change, in kaldi/src/latbin/lattice-expand.cc or something like that? I imagine it only needs a minor modification to add a variable limiting num-paths per utterance, but I don't know where the change should go. Thanks

danpovey commented 2 years ago

You'd have to change lattice-path-cover.cc, adding an option like max-paths. You'd have to make sure the paths were sorted from best to worst (they actually do seem to be; there is a std::sort in there) before truncating. There also seems to be another problem that needs to be addressed in that program, https://github.com/kaldi-asr/kaldi/issues/4719; you created that issue as well.

trangtv57 commented 2 years ago

In my issue #4719, I removed the assertion KALDI_ASSERT(clat.NumStates() > 1); as you suggested, and it works. I understand your idea. I will try some fixes and report back later. When I have fixed all the issues, I will make a PR. Thanks.

trangtv57 commented 2 years ago

hi @danpovey, I have added some code to limit the size of paths_and_costs in the function ComputePathCover (https://github.com/kaldi-asr/kaldi/blob/aefbd096ec0c7f1136f669c99be66ac393afe29c/src/latbin/lattice-path-cover.cc#L174), and the problem with the number of arc paths is resolved. However, I now get another error in the file nnlmrescore.1.log (error log attached as a screenshot). I know it is not caused by my change in lattice-path-cover.cc, because an experiment run before this fix already had the same error; I just had not looked closely enough to notice it until now. Do you have any suggestions? Thanks

danpovey commented 2 years ago

It's complaining that the key (utterance-id, or path-id, which is utterance-id-N or something like that) is not present in an input archive. That should not really be an assertion; it should be either an error or a warning, as it's a problem with the input. You could perhaps change the code to print a warning and just output the compact lattice unchanged if that happens.

danpovey commented 2 years ago

... but it likely indicates some kind of problem in a previous script, e.g. it did not produce all the output it should have.

trangtv57 commented 2 years ago

So I don't really know where to start debugging. When I run rescoring with n-best, everything is OK. I will try removing the assert to see what error I get, if any, and will give you the detailed log. Thanks.

trangtv57 commented 2 years ago

I attach the error message here, @danpovey (screenshot).

danpovey commented 2 years ago

I think I see the issue. It's not actually OK to limit the number of paths in lattice-path-cover, because the rescoring logic relies on every arc being covered by at least one path.

It might be necessary to include a 'lattice-limit-depth' pruning command at an earlier stage in the script, i.e. when dumping lattices, to limit the number of paths in the lattice.

trangtv57 commented 2 years ago

But, as I said, my experiment from before I added the code limiting the number of paths in lattice-path-cover also had this error during lattice rescoring. Anyway, I am not sure how to add the lattice-limit-depth step as you suggest. Can you give me a detailed fix? Thanks

danpovey commented 2 years ago

Sorry I don't have time for such detailed help.

trangtv57 commented 2 years ago

Yep, thank you. So can we discuss the lattice-limit-depth idea again? I understand that I would need to change some code around lattice-expand, but I don't think this is the root cause, because even when I don't change anything related to limiting paths, I still get this assertion error when running lattice rescoring. So I just want to ask you to think about it a bit more and give me another idea?

danpovey commented 2 years ago

Finding the root cause of your problem would require some debugging, looking at files, etc. You need to do that yourself as best you can. The lattice-limit-depth thing would be a script-level change, adding a new command in a pipe.

trangtv57 commented 2 years ago

Thank you. I will try it myself.

francisr commented 2 years ago

@danpovey I've been looking into adding some pruning in the iterative rescore; do you think lattice-limit-depth is a better fit than lattice-determinize-pruned?
It would make sense, as the number of paths to cover is tied more to the depth than to the posteriors of the arcs.

danpovey commented 2 years ago

If you are trying to avoid OOM, in a later stage lattice-limit-depth will tend to be a better tradeoff I think.

francisr commented 2 years ago

I'm rather trying to reach the same WERs as my previous rescoring method without making batches too big. I'm still a bit off in terms of WERs, despite getting batches of size 10,000 for some segments, which isn't going to be great for RTF.

trangtv57 commented 2 years ago

@rikrd can you share your solution for fixing the CUDA OOM?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.