Open umbertocappellazzo opened 1 year ago
Could you please post the logs after pressing ctrl + C?
By the way, our latest and best-performing recipe is https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/zipformer
I suggest you try zipformer instead. You can find the training commands and decoding results at https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/RESULTS.md#zipformer-zipformer--pruned-stateless-transducer--ctc
^CTraceback (most recent call last):
  File "/cappellazzo/icefall_forked/icefall/egs/librispeech/ASR/./conformer_ctc/train.py", line 819, in
This is the log after the keyboard interrupt.
Btw, right now I'm interested in conformer_ctc since my colleagues are working with this kind of pipeline outside icefall, so we want to be consistent. Happy to switch to better models in the future, though.
Are there any more logs? Sorry, I cannot figure out what happened from the above logs.
If there are no more logs, I suggest that you use py-spy
to get the call stack and find out where it gets stuck.
py-spy: https://github.com/benfred/py-spy
(Note: We are not using py-spy for profiling. We only need py-spy dump --pid <your_pid>.)
No additional logs, unfortunately. I'll try with py-spy and let you know. Btw, are there any requirements in icefall for running DDP? I can check whether the server complies with them. Since the server is quite new and nobody has used DDP before, maybe some installations are required.
Have you tried to run PyTorch DDP training before without icefall?
Nope, I'll try with a simple PyTorch DDP script then.
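A minimal script along those lines might look like the following (a hypothetical sketch, not icefall code). It uses the CPU "gloo" backend with world_size=1 so it runs anywhere; on a real multi-GPU test you would switch to the "nccl" backend and a multi-process launcher, and the all_reduce collective below is typically where a broken setup hangs:

```python
# Hypothetical smoke test for torch.distributed, independent of icefall.
# Assumptions: single process, CPU-only "gloo" backend, arbitrary free port.
import torch
import torch.distributed as dist


def ddp_sanity_check() -> float:
    # Initialize the default process group with a single rank.
    dist.init_process_group(
        backend="gloo",
        init_method="tcp://127.0.0.1:29501",
        world_size=1,
        rank=0,
    )
    x = torch.ones(1)
    # Collectives like this are where DDP hangs when inter-GPU
    # communication is broken (e.g. the A40 P2P issue in this thread).
    dist.all_reduce(x, op=dist.ReduceOp.SUM)
    dist.destroy_process_group()
    return x.item()
```

With world_size=1 the all_reduce is a no-op, so the function simply returns 1.0; the point is that init and a collective complete without hanging.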
Hi Fangjun, I managed to solve the problem with DDP: the A40 GPUs cause issues by default, and a certain environment variable must be prepended to the command to make it work. On multiple V100s, for example, everything works fine.
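For reference, the workaround amounts to setting an NCCL environment variable before launching training (it appears prepended to the training command quoted later in this thread). The commented-out knobs are related debugging options worth knowing; whether they help on a given A40 node is machine-specific:

```shell
# Restrict NCCL peer-to-peer transfers to NVLink-connected GPU pairs
# (the setting that fixed DDP on the A40s here):
export NCCL_P2P_LEVEL=NVL

# Related NCCL knobs for debugging (suggestions, not guaranteed fixes):
# export NCCL_P2P_DISABLE=1   # disable peer-to-peer transfers entirely
# export NCCL_DEBUG=INFO      # verbose NCCL logging
```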
Apart from this, I'm trying to make the conformer_mmi recipe for librispeech work but I get this error:
tal avg loss: 0.3903, batch size: 13
2023-07-07 08:51:41,608 INFO [train.py:584] (0/3) Epoch 1, batch 1000, batch avg mmi loss 0.4202, batch avg att loss 0.0000, batch avg loss 0.4202, total avg mmiloss: 0.3952, total avg att loss: 0.0000, total avg loss: 0.3952, batch size: 12
2023-07-07 08:51:41,608 INFO [train.py:584] (1/3) Epoch 1, batch 1000, batch avg mmi loss 0.3977, batch avg att loss 0.0000, batch avg loss 0.3977, total avg mmiloss: 0.3912, total avg att loss: 0.0000, total avg loss: 0.3912, batch size: 19
2023-07-07 08:51:41,611 INFO [train.py:584] (2/3) Epoch 1, batch 1000, batch avg mmi loss 0.5049, batch avg att loss 0.0000, batch avg loss 0.5049, total avg mmiloss: 0.3964, total avg att loss: 0.0000, total avg loss: 0.3964, batch size: 14
[I] /home/runner/work/k2/k2/k2/csrc/intersect_dense.cu:314:k2::FsaVec k2::MultiGraphDenseIntersect::FormatOutput(k2::Array1
-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/cappellazzo/icefall_forked/icefall/egs/librispeech/ASR/conformer_mmi/train.py", line 821, in run
    train_one_epoch(
  File "/cappellazzo/icefall_forked/icefall/egs/librispeech/ASR/conformer_mmi/train.py", line 635, in train_one_epoch
    compute_validation_loss(
  File "/cappellazzo/icefall_forked/icefall/egs/librispeech/ASR/conformer_mmi/train.py", line 456, in compute_validation_loss
    loss, mmi_loss, att_loss = compute_loss(
  File "/cappellazzo/icefall_forked/icefall/egs/librispeech/ASR/conformer_mmi/train.py", line 409, in compute_loss
    mmi_loss = loss_fn(dense_fsa_vec=dense_fsa_vec, texts=texts)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/cappellazzo/icefall_forked/icefall/icefall/mmi.py", line 215, in forward
    return func(
  File "/cappellazzo/icefall_forked/icefall/icefall/mmi.py", line 118, in _compute_mmi_loss_exact_non_optimized
    den_lats = k2.intersect_dense(
  File "/home/stek/.local/lib/python3.10/site-packages/k2/autograd.py", line 805, in intersect_dense
    _IntersectDenseFunction.apply(a_fsas, b_fsas, out_fsa, output_beam,
  File "/home/stek/.local/lib/python3.10/site-packages/k2/autograd.py", line 562, in forward
    ragged_arc, arc_map_a, arc_map_b = _k2.intersect_dense(
ValueError: cannot create std::vector larger than max_size()
I'm using this command:
NCCL_P2P_LEVEL=NVL python3 ./conformer_mmi/train.py --world-size 3 --num-epochs 40 --exp-dir ./conformer_mmi/exp --full-libri False
Any clue on how to solve this issue? I tried reducing max-duration and got the same error (by default it is 200).
I talked with Povey here at JSALT 2023, and he told me there should be a way to change some parameters about arcs, etc.
Thanks!
Also, Dan mentioned that it could be useful to turn off the MMI loss during the very first epochs and use only CTC, then switch to MMI. I remember this is done in other recipes. Could this be a possible solution?
Yes, Dan is right.
Please have a look at the command line argument of train.py
--use-pruned-intersect
If you set it to True, it can further reduce RAM usage.
Yes, that is also possible.
CC @yaozengwei. Zengwei has implemented MMI with zipformer. Please try that recipe.
> Please have a look at the command line argument of train.py: --use-pruned-intersect. If you set it to True, it can further reduce RAM usage.
I've already tried it, but the validation loss is very bad and diverges after a few iterations.
> CC @yaozengwei. Zengwei has implemented MMI with zipformer. Please try that recipe.
The problem is that for my experiments with early exit I need to compute the CTC or MMI loss after certain intermediate layers, and Desh told me that doing something like that is not possible with zipformer. Also, I saw that zipformer works with transducers; in my case, as with conformer_ctc, I just need the encoder plus a linear layer, and then to compute the CTC/MMI loss. Maybe I can try computing the CTC loss for the first iterations and then switching to MMI in the conformer_mmi recipe, i.e., adapt that recipe accordingly. I remember you suggested I switch to the zipformer recipes; that would be fine for me, provided I can carry out early exit and dispense with the transducer decoder.
Then I recommend trying Dan's second suggestion: only apply the MMI loss after having trained with the CTC or transducer loss for some batches/epochs.
I agree with @csukuangfj's comment. If you already have a model trained with CTC, you can initialize the encoder from there and continue training with MMI loss.
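The warm-up being suggested can be sketched as a tiny scheduling helper (a hypothetical illustration, not icefall code; the name training_loss and the 5-epoch cutoff are assumptions):

```python
def training_loss(ctc_loss: float, mmi_loss: float,
                  epoch: int, warmup_epochs: int = 5) -> float:
    """Return the loss to optimize at a given (0-based) epoch.

    CTC-only during the warm-up phase, MMI afterwards. The encoder
    weights carry over across the switch, which is also what
    "initialize the encoder from a CTC checkpoint" achieves when the
    two phases are separate training runs.
    """
    if epoch < warmup_epochs:
        return ctc_loss   # warm-up phase: CTC only
    return mmi_loss       # afterwards: MMI only
```

In a real recipe one might blend the two losses with a ramp instead of a hard switch; the hard switch is just the simplest version of the idea.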
Guys, I noticed dense_intersect has max_states and max_arcs options that may not be used in the MMI recipe, but it seems to me we could solve this problem in a more general way by using those options; perhaps they were not available at the time we were working on MMI.
The max_arcs could be set to, for example, 100 million, and max_states to 10 million.
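Passing those limits through could look roughly like this (a sketch assuming the installed k2 exposes max_states/max_arcs keyword arguments on intersect_dense, as the thread indicates; the wrapper name and values are illustrative only):

```python
def intersect_dense_with_limits(a_fsas, b_fsas, output_beam: float = 10.0):
    """Illustrative wrapper: cap the size of the intersection result
    so the denominator lattice cannot blow past std::vector limits."""
    import k2  # lazy import: requires a working k2 installation

    return k2.intersect_dense(
        a_fsas,
        b_fsas,
        output_beam=output_beam,
        max_states=10_000_000,   # Dan's suggested cap on states
        max_arcs=100_000_000,    # Dan's suggested cap on arcs
    )
```

In the MMI recipe this would replace the bare k2.intersect_dense call in _compute_mmi_loss_exact_non_optimized shown in the traceback above.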
Quick update: the conformer_mmi recipe, warm-started from a checkpoint trained with the CTC loss (5 epochs), seems to work fine now; no strange errors, and the curves look reasonable.
Yeah, I noticed those options, but I was worried that would blow up GPU memory and we would have to reduce the batch size too much (which might be bad for convergence, especially since Umberto is training on just one GPU, I believe).
Now I can use up to 4 (even 6 or 8) A40 GPUs, so I don't have the single-GPU constraint any longer.
Anyway, when I was using the MMI loss from the very beginning, I noticed that the learning curves were pretty irregular and fluctuating. Now, thanks to the CTC warm-up, the trend is smooth and regular. I have a hunch that the CTC warm-up leads to better results than using MMI from the beginning, even if you fix the issue with the arcs.
Hi,
I'm trying to run the conformer_ctc recipe for LibriSpeech. If I use a single GPU (i.e., world-size = 1), the recipe works without any issue.
If I use multiple GPUs, the recipe gets stuck while creating the model:
Basically, the code gets stuck at "about to create model" and doesn't proceed. What are the requirements for running DDP? I'm working with multiple A40 GPUs.
Thank you