Open AntonThai2022 opened 3 weeks ago
@yuekaizhang Could you have a look? Thanks!
hello! I built int8 weights:

```
INFERENCE_PRECISION=float16
WEIGHT_ONLY_PRECISION=int8
MAX_BEAM_WIDTH=4
MAX_BATCH_SIZE=8
checkpoint_dir=whisper_large_v3_weights_${WEIGHT_ONLY_PRECISION}
output_dir=whisper_large_v3_${WEIGHT_ONLY_PRECISION}

# Convert the large-v3 model weights into TensorRT-LLM format.
python3 convert_checkpoint.py \
    --use_weight_only \
    --weight_only_precision $WEIGHT_ONLY_PRECISION \
    --output_dir $checkpoint_dir
```
This gave me `whisper_large_v3_weights_int8`, which I put in `sherpa/triton/whisper/model_repo_whisper_trtllm/whisper/1`, but it does not work. I also tried renaming it to `whisper_large_v3`, but that did not help. Is it actually possible to run int8 Whisper with your repo and Docker image?
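For context: `convert_checkpoint.py` emits a TensorRT-LLM *checkpoint* (intermediate weights), not runnable engines, and the Triton `whisper/1` directory expects built engines. A sketch of the intervening engine-build step, assuming the `trtllm-build` CLI from the TensorRT-LLM Whisper example (exact flag names vary across TensorRT-LLM releases; paths here are illustrative):

```shell
# Build encoder and decoder engines from the converted checkpoint.
# NOTE: assumed flags, based on the TensorRT-LLM whisper example --
# verify against `trtllm-build --help` for your installed version.
trtllm-build --checkpoint_dir ${checkpoint_dir}/encoder \
             --output_dir ${output_dir}/encoder \
             --max_batch_size ${MAX_BATCH_SIZE}

trtllm-build --checkpoint_dir ${checkpoint_dir}/decoder \
             --output_dir ${output_dir}/decoder \
             --max_batch_size ${MAX_BATCH_SIZE} \
             --max_beam_width ${MAX_BEAM_WIDTH}
```

It is the contents of `${output_dir}` (the built engines), not `${checkpoint_dir}`, that belong under `model_repo_whisper_trtllm/whisper/1`.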
@AntonThai2022 Would you mind pasting the error logs here?
I apologize for creating this topic. I had simply mixed up the folder with the intermediate weights and the folder with the engine. Once I took the correct folder and renamed it, everything worked. It's strange that the regular version takes 8 GB while the 8-bit one takes 7 GB, although the speedup was almost 2x.
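For anyone hitting the same mix-up: what goes into the Triton version folder is the engine directory, not the checkpoint directory. A plausible final layout, assuming the standard sherpa `model_repo_whisper_trtllm` structure (engine file names are illustrative, not exact):

```
model_repo_whisper_trtllm/
└── whisper/
    ├── config.pbtxt
    └── 1/                  # contents of output_dir (engines), NOT checkpoint_dir
        ├── encoder/        # built encoder engine
        └── decoder/        # built decoder engine
```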