Closed: SSwethaSel0609 closed this issue 1 year ago
Please show your complete command.
The error indicates that you are using a different set of model arguments for decode.py and train.py. Please check the command-line arguments carefully.
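One quick way to see what the checkpoint was actually trained with is to dump the parameter shapes it stores and compare them against the model that decode.py builds from your arguments. A minimal sketch, assuming a typical icefall checkpoint layout (the path and the "model" key are assumptions):

```python
import torch

# Hypothetical path; point it at the checkpoint that decode.py fails to load.
ckpt = torch.load("pruned_transducer_stateless7/exp/epoch-30.pt", map_location="cpu")

# icefall checkpoints usually nest the weights under a "model" key;
# fall back to the raw dict if this one does not.
state_dict = ckpt.get("model", ckpt)

# Print every parameter shape so it can be compared with the model
# constructed from decode.py's command-line arguments.
for name, value in state_dict.items():
    if hasattr(value, "shape"):
        print(f"{name}: {tuple(value.shape)}")
```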
Yeah, I resolved that error. Now I'm getting a very high word error rate. What should I do to reduce it?
Are you using your own model or our pre-trained model? If you are using your own model, has it converged?
No, I'm using my own model. I'm training the model again; it was running before, but after I changed the GPU it shows an error like this:
File "/usr/lib/python3.8/shutil.py", line 675, in _rmtree_safe_fd
onerror(os.unlink, fullname, sys.exc_info())
File "/usr/lib/python3.8/shutil.py", line 673, in _rmtree_safe_fd
os.unlink(entry.name, dir_fd=topfd)
OSError: [Errno 16] Device or resource busy: '.nfsf8aa902a3d86d53e00000bf6'
Traceback (most recent call last):
File "./pruned_transducer_stateless7/train.py", line 1275, in <module>
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/mnt/efs/swetha/icefall_env_swe/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
fn(i, *args)
File "/mnt/efs/swetha/marathi/ds-icefall-scripts/pruned_transducer_stateless7/train.py", line 1145, in run
train_one_epoch(
File "/mnt/efs/swetha/marathi/ds-icefall-scripts/pruned_transducer_stateless7/train.py", line 940, in train_one_epoch
valid_info = compute_validation_loss(
File "/mnt/efs/swetha/marathi/ds-icefall-scripts/pruned_transducer_stateless7/train.py", line 753, in compute_validation_loss
loss, loss_info = compute_loss(
File "/mnt/efs/swetha/marathi/ds-icefall-scripts/pruned_transducer_stateless7/train.py", line 704, in compute_loss
raise ValueError(
ValueError: There are too many utterances in this batch leading to inf or nan losses.
Duplicate of https://github.com/k2-fsa/icefall/issues/1289
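Inf or nan losses during validation usually come from a handful of problematic utterances, for example cuts with empty transcripts or transcripts longer than the number of acoustic frames left after subsampling. As a rough screening step you could scan the validation manifest with lhotse; this is only a sketch, and the manifest path, frame shift, and subsampling factor are assumptions:

```python
from lhotse import CutSet

# Hypothetical manifest; use whatever cut set feeds compute_validation_loss().
cuts = CutSet.from_file("data/fbank/cuts_dev.jsonl.gz")

frame_shift = 0.01        # seconds per fbank frame (assumed)
subsampling_factor = 4    # assumed frame-rate reduction of the encoder front end

for cut in cuts:
    text = (cut.supervisions[0].text or "") if cut.supervisions else ""
    output_frames = int(cut.duration / frame_shift) // subsampling_factor
    # Empty transcripts, or transcripts with more symbols than output frames,
    # cannot be aligned and are a common source of inf transducer losses.
    # Character count is used here as a crude proxy for BPE token count.
    if not text.strip() or len(text) > output_frames:
        print(cut.id, round(cut.duration, 2), repr(text))
```

If only a few cuts stand out, filtering them out of the manifest before training is usually enough to get past this check.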
2023-10-02 04:44:42,617 INFO [zipformer.py:178] At encoder stack 4, which has downsampling_factor=2, we will combine the outputs of layers 1 and 3, with downsampling_factors=2 and 8.
2023-10-02 04:44:42,626 INFO [decode.py:917] Calculating the averaged model over epoch range from 5 (excluded) to 30
Traceback (most recent call last):
File "./pruned_transducer_stateless7/decode.py", line 1015, in <module>
main()
File "/mnt/efs/swetha/icefall_env_swe/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "./pruned_transducer_stateless7/decode.py", line 922, in main
model.load_state_dict(
File "/mnt/efs/swetha/icefall_env_swe/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1667, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Transducer:
size mismatch for decoder.embedding.weight: copying a param with shape torch.Size([500, 512]) from checkpoint, the shape in current model is torch.Size([250, 512]).
size mismatch for joiner.output_linear.weight: copying a param with shape torch.Size([500, 512]) from checkpoint, the shape in current model is torch.Size([250, 512]).
size mismatch for joiner.output_linear.bias: copying a param with shape torch.Size([500]) from checkpoint, the shape in current model is torch.Size([250]).
size mismatch for simple_am_proj.weight: copying a param with shape torch.Size([500, 384]) from checkpoint, the shape in current model is torch.Size([250, 384]).
size mismatch for simple_am_proj.bias: copying a param with shape torch.Size([500]) from checkpoint, the shape in current model is torch.Size([250]).
size mismatch for simple_lm_proj.weight: copying a param with shape torch.Size([500, 512]) from checkpoint, the shape in current model is torch.Size([250, 512]).
size mismatch for simple_lm_proj.bias: copying a param with shape torch.Size([500]) from checkpoint, the shape in current model is torch.Size([250]).
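The 500 vs 250 mismatch in decoder.embedding.weight, the joiner output, and the simple_am/lm projections is the signature of a vocabulary-size mismatch: the checkpoint was trained with a 500-token BPE model, while decode.py is building the model from a 250-token one. A quick, hedged way to confirm which vocabulary size each bpe.model carries (the paths below are just examples):

```python
import sentencepiece as spm

# Example paths; compare the BPE model passed to train.py with the one passed to decode.py.
for path in ["data/lang_bpe_500/bpe.model", "data/lang_bpe_250/bpe.model"]:
    sp = spm.SentencePieceProcessor()
    sp.load(path)
    print(path, "->", sp.get_piece_size(), "tokens")
```

Pointing decode.py at the same lang directory / bpe.model that was used for training should make the shapes line up again.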