martinshkreli opened this issue 8 months ago
Fangjun will get back to you about it, but: hi, martin shkreli! We might need more hardware info and details about what differed between those two runs.
@martinshkreli
Could you describe how you got the int8 models?
Hi guys, thanks again for the wonderful repo. I followed this link to download the model: https://k2-fsa.github.io/sherpa/onnx/tts/pretrained_models/vits.html#download-the-model
Then, I used that file (vits-ljs.int8.onnx) for inference in the python script (offline-tts.py). This was on an 8xA100 instance.
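For context, the per-sentence loop that produces output like `Elapsed: ... Saved sentence_0.wav.` can be sketched as below. This is a sketch, not the actual offline-tts.py: the sentence splitter is a naive stand-in, and the commented-out sherpa-onnx class and parameter names are written from memory and may differ between versions.

```python
import re
import time


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter; the real script may segment text differently."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def synthesize_all(text: str, tts=None) -> list[str]:
    """Time each sentence and return the output file names.

    `tts` would be a sherpa_onnx.OfflineTts instance; when it is None
    (as in this sketch), only the timing and naming bookkeeping runs.
    """
    names = []
    for i, sentence in enumerate(split_sentences(text)):
        start = time.perf_counter()
        if tts is not None:
            audio = tts.generate(sentence)  # assumed API; check your version
        print(f"Elapsed: {time.perf_counter() - start:.3f}")
        name = f"sentence_{i}.wav"
        print(f"Saved {name}.")
        names.append(name)
    return names


if __name__ == "__main__":
    # Building the real TTS object would look roughly like this (assumed
    # names and placeholder paths, not a verified sherpa-onnx API):
    # import sherpa_onnx
    # config = sherpa_onnx.OfflineTtsConfig(
    #     model=sherpa_onnx.OfflineTtsModelConfig(
    #         vits=sherpa_onnx.OfflineTtsVitsModelConfig(
    #             model="vits-ljs.int8.onnx",
    #             lexicon="lexicon.txt",
    #             tokens="tokens.txt")))
    # tts = sherpa_onnx.OfflineTts(config)
    synthesize_all("Hello there. How are you? Fine!")
```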
> @martinshkreli
> Could you describe how you got the int8 models?
Hi Fangjun, I just wanted to try to get your attention one more time; sorry if I am being annoying!
The int8 model is obtained via the following code https://github.com/k2-fsa/sherpa-onnx/blob/d7717628689b051b4c9bffd8d43f3e074388e2d7/scripts/vits/export-onnx-ljs.py#L204-L208
Note that it uses https://github.com/k2-fsa/sherpa-onnx/blob/d7717628689b051b4c9bffd8d43f3e074388e2d7/scripts/vits/export-onnx-ljs.py#L207
It is a known issue with onnxruntime that quint8 inference is slower.
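For reference, the dynamic quantization step in that export script boils down to a single `quantize_dynamic` call from onnxruntime. A minimal sketch follows; the `int8_name` helper and the file paths are mine, and the actual quantization only runs if the input model is present:

```python
import os


def int8_name(onnx_path: str) -> str:
    """Derive the int8 output name, e.g. vits-ljs.onnx -> vits-ljs.int8.onnx."""
    root, ext = os.path.splitext(onnx_path)
    return f"{root}.int8{ext}"


if __name__ == "__main__":
    model_in = "vits-ljs.onnx"  # placeholder path
    if os.path.exists(model_in):
        from onnxruntime.quantization import QuantType, quantize_dynamic

        quantize_dynamic(
            model_input=model_in,
            model_output=int8_name(model_in),
            # QUInt8 weights are what the export script uses; per the
            # linked onnxruntime issues, QUInt8 kernels can be slow on
            # some CPUs compared with QInt8.
            weight_type=QuantType.QUInt8,
        )
```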
For instance, if you search on Google, you can find similar issues:

- microsoft/onnxruntime#12854
- microsoft/onnxruntime#6732

fangjun, is the int8 model intended for different applications or devices, then?
The int8 model mentioned in this issue is about 4x smaller on disk than the float32 model.
If memory matters, the int8 model is preferred.
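The ~4x figure follows from storage: a float32 weight takes 4 bytes, while an int8 weight takes 1 byte (plus a small overhead for scales and zero points, ignored here). A back-of-the-envelope check, using a purely illustrative parameter count:

```python
# Hypothetical parameter count, purely illustrative.
n_weights = 36_000_000

fp32_bytes = n_weights * 4  # float32: 4 bytes per weight
int8_bytes = n_weights * 1  # int8: 1 byte per weight (scales ignored)

print(fp32_bytes / int8_bytes)  # → 4.0
```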
Hi @csukuangfj, do you know how to optimize the speed of an int8 model? I experimented with it several months ago, but I was not able to convert to qint8, and quint8 is really slow on CPU.
You don't need to optimize speed; you need to pick an MB-iSTFT VITS model. They are an order of magnitude faster than raw VITS with the same quality.
> You don't need to optimize speed; you need to pick an MB-iSTFT VITS model. They are an order of magnitude faster than raw VITS with the same quality.

Where can we find these models?
```
(myenv) ubuntu@152:~/sherpa-onnx/python_api_examples$ python3 test.py
Elapsed: 0.080
Saved sentence_0.wav.
Elapsed: 0.085
Saved sentence_1.wav.
Elapsed: 0.080
Saved sentence_2.wav.
Elapsed: 0.074
Saved sentence_3.wav.
Elapsed: 0.054
Saved sentence_4.wav.
Elapsed: 0.081
Saved sentence_5.wav.
Elapsed: 0.067
```

```
(myenv) ubuntu@152-69-195-75:~/sherpa-onnx/python_api_examples$ python3 test.py
Elapsed: 19.561
Saved sentence_0.wav.
Elapsed: 26.432
Saved sentence_1.wav.
Elapsed: 27.989
Saved sentence_2.wav.
Elapsed: 23.956
Saved sentence_3.wav.
Elapsed: 11.361
Saved sentence_4.wav.
Elapsed: 27.825
Saved sentence_5.wav.
Elapsed: 19.567
```
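To quantify the gap between the two runs, averaging the Elapsed values quoted above shows a slowdown of roughly 300x:

```python
# Elapsed times copied from the two runs above (seconds per sentence).
fast = [0.080, 0.085, 0.080, 0.074, 0.054, 0.081, 0.067]
slow = [19.561, 26.432, 27.989, 23.956, 11.361, 27.825, 19.567]

fast_avg = sum(fast) / len(fast)
slow_avg = sum(slow) / len(slow)
print(f"fast avg: {fast_avg:.3f}s, slow avg: {slow_avg:.3f}s, "
      f"slowdown: {slow_avg / fast_avg:.0f}x")
```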
Is there any special flag to set to use int8?