k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

Errors and Questions regarding exporting a model #998

Closed · magacc1 closed this 2 months ago

magacc1 commented 2 months ago

I'm looking to re-export this English streaming LSTM model with some changes, such as using only the first K encoder layers (I'm not interested in transcription, just the encoder output).

  1. export-onnx-lstm2.sh and README.md point to this repo as containing the original checkpoints prior to export. I downloaded and tried it, but it seems it only contains "epoch-18.pt" in the exp folder, whereas the export parameters and output correspond to the 99th-epoch model, not the 18th. Did you upload it while training was still ongoing, or did you find that the 18th-epoch checkpoint was better than the 99th? If the former, is there a place where I can find the 99th-epoch checkpoint?

     FileNotFoundError: [Errno 2] No such file or directory: '../icefall-asr-librispeech-lstm-transducer-stateless2-2022-09-03/exp/epoch-99.pt'

     (I tried the ONNX model I exported from the 18th-epoch checkpoint on a few examples, and it seems to do slightly better than the readily available 99th-epoch ONNX model.)

  2. Would changing the --num-encoder-layers argument to export-onnx.py export only the first K layers? I notice that it's used here to initialize the RNN with fewer layers, but then I'm surprised how PyTorch doesn't throw any error while loading the checkpoint, since the checkpoint file still contains all 12 layers. Interestingly, when I exported with 9 layers and paired the encoder with the decoder and joiner, the ASR results were actually reasonable, albeit with gross deletion errors; I was surprised it wasn't garbage. Thoughts/pointers?

K=9
./lstm_transducer_stateless2/export-onnx.py \
  --use-averaged-model 0 \
  --epoch 99 \
  --avg 1 \
  --exp-dir $repo/exp \
  --num-encoder-layers $K \
  --encoder-dim 512 \
  --rnn-hidden-size 1024 \
  --tokens $repo/data/lang_bpe_500/tokens.txt  # --bpe-model $repo/data/lang_bpe_500/bpe.model

csukuangfj commented 2 months ago

I downloaded and tried it, but it seems it only contains "epoch-18.pt" in the exp folder

I strongly recommend that you read the icefall documentation on exporting models to ONNX; then everything should be clear. Here is the link: https://k2-fsa.github.io/icefall/model-export/export-onnx.html#download-the-pre-trained-model

For your specific example, you can use

cd exp
ln -s pretrained-iter-468000-avg-16.pt epoch-99.pt

then I'm surprised how PyTorch doesn't throw any error

Please check the meaning of the strict argument of torch.nn.Module.load_state_dict.
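
To illustrate, here is a minimal PyTorch-only sketch (not the actual icefall loading code; the layer sizes are copied from the export command above, and it assumes the icefall loading path passes strict=False, which is what the reply hints at) showing why a checkpoint trained with 12 encoder layers can be loaded into a 9-layer model without any error:

import torch
import torch.nn as nn

# The checkpoint was trained with 12 encoder layers...
full = nn.LSTM(input_size=512, hidden_size=1024, num_layers=12)
# ...but the model being exported is built with only the first K=9 layers.
truncated = nn.LSTM(input_size=512, hidden_size=1024, num_layers=9)

state_dict = full.state_dict()  # keys weight_ih_l0 ... bias_hh_l11

# strict=False copies every matching key (layers 0-8) and merely reports
# the leftover keys (layers 9-11) instead of raising a RuntimeError.
result = truncated.load_state_dict(state_dict, strict=False)
print(result.missing_keys)     # []
print(result.unexpected_keys)  # ['weight_ih_l9', ..., 'bias_hh_l11']

# With strict=True (the default) the extra keys would raise:
# RuntimeError: Error(s) in loading state_dict for LSTM: Unexpected key(s) ...

Because layers 0-8 keep their trained weights and only the top layers are dropped, the truncated encoder still produces features in roughly the right space, which would explain why the 9-layer export was degraded but not garbage.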