modelscope / FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
https://www.funasr.com

GPU out of memory during GPU inference with the damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch model #858

Closed. MyWestCity closed this issue 1 year ago.

MyWestCity commented 1 year ago

OS: Windows 10
Python/C++ Version: Python 3.8.17
Package Version (pip list): pytorch==1.11.0, torchaudio==0.11.0, modelscope==1.8.1, funasr==0.7.3
Model: damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch
Command: pipeline(audio_in='<an audio file longer than one hour>')
Details: When running inference on GPU, the process runs out of GPU memory; the GPU has 6 GB available. Is there a way to cap how much GPU memory is used? Inference on short audio files works fine on the same GPU.
Error log:

E:\Anaconda3\envs\srv2\lib\site-packages\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
2023-08-15 18:14:31,983 - modelscope - INFO - PyTorch version 1.11.0+cu113 Found.
2023-08-15 18:14:31,986 - modelscope - INFO - TensorFlow version 2.13.0 Found.
2023-08-15 18:14:31,986 - modelscope - INFO - Loading ast index from C:\Users\演示机\.cache\modelscope\ast_indexer
2023-08-15 18:14:32,200 - modelscope - INFO - Loading done! Current index file version is 1.8.1, with md5 65607a88c407898863dccee442cbfc94 and a total number of 893 components indexed
Namespace(audio_file='D:\SRV2\庭审材料2.wav', frame_rate=16000, gpu=1)
2023-08-15 18:14:34,712 - modelscope - INFO - Model revision not specified, use the latest revision: v1.2.4
2023-08-15 18:14:34,996 - modelscope - INFO - initiate model from C:\Users\演示机\.cache\modelscope\hub\damo\speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch
2023-08-15 18:14:34,996 - modelscope - INFO - initiate model from location C:\Users\演示机\.cache\modelscope\hub\damo\speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch.
2023-08-15 18:14:35,000 - modelscope - INFO - initialize model from C:\Users\演示机\.cache\modelscope\hub\damo\speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch
2023-08-15 18:14:35,012 - modelscope - WARNING - No preprocessor field found in cfg.
2023-08-15 18:14:35,012 - modelscope - WARNING - No val key and type key found in preprocessor domain of configuration.json file.
2023-08-15 18:14:35,012 - modelscope - WARNING - Cannot find available config to build preprocessor at mode inference, current config: {'model_dir': 'C:\Users\演示机\.cache\modelscope\hub\damo\speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch'}. trying to build by task and model information.
2023-08-15 18:14:35,013 - modelscope - WARNING - No preprocessor key ('generic-asr', 'auto-speech-recognition') found in PREPROCESSOR_MAP, skip building preprocessor.
2023-08-15 18:14:35,733 - modelscope - INFO - Model revision not specified, use the latest revision: v1.2.0
2023-08-15 18:14:35,940 - modelscope - INFO - loading vad model from C:\Users\演示机\.cache\modelscope\hub\damo\speech_fsmn_vad_zh-cn-16k-common-pytorch ...
2023-08-15 18:14:36,110 - modelscope - INFO - Model revision not specified, use the latest revision: v1.1.7
2023-08-15 18:14:36,339 - modelscope - INFO - loading punctuation model from C:\Users\演示机\.cache\modelscope\hub\damo\punc_ct-transformer_zh-cn-common-vocab272727-pytorch ...
################################################################################
WARNING, path does not exist: KALDI_ROOT=/mnt/matylda5/iveselyk/Tools/kaldi-trunk
(please add 'export KALDI_ROOT=<your_path>' in your $HOME/.profile)
(or run as: KALDI_ROOT=<your_path> python <your_script>.py)
################################################################################

2023-08-15 18:15:02,142 - modelscope - INFO - Decoding with wav files ...
batch_size_token: 6000
time cost vad: 56.4520480632782
batch: 88
time cost asr: 2.9561240673065186
batch: 39
time cost asr: 1.0192680358886719
batch: 29
time cost asr: 0.9634253978729248
batch: 24
time cost asr: 1.0352020263671875
batch: 21
time cost asr: 1.0262844562530518
batch: 18
time cost asr: 1.07608962059021
batch: 16
time cost asr: 1.137986660003662
batch: 14
time cost asr: 1.2157480716705322
batch: 12

Traceback (most recent call last):
  File "srv2_offline.py", line 70, in <module>
    result = convert_voice_to_text()
  File "srv2_offline.py", line 56, in convert_voice_to_text
    result = p(audio_in=args.audio_file, audio_fs=sample_rate)
  File "E:\Anaconda3\envs\srv2\lib\site-packages\modelscope\pipelines\audio\asr_inference_pipeline.py", line 256, in __call__
    output = self.forward(output, **kwargs)
  File "E:\Anaconda3\envs\srv2\lib\site-packages\modelscope\pipelines\audio\asr_inference_pipeline.py", line 505, in forward
    inputs['asr_result'] = self.run_inference(self.cmd, **kwargs)
  File "E:\Anaconda3\envs\srv2\lib\site-packages\modelscope\pipelines\audio\asr_inference_pipeline.py", line 580, in run_inference
    asr_result = self.funasr_infer_modelscope(cmd['name_and_type'],
  File "d:\srv2\funasr\funasr\bin\asr_inference_launch.py", line 660, in _forward
    results = speech2text(**batch)
  File "E:\Anaconda3\envs\srv2\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "d:\srv2\funasr\funasr\bin\asr_infer.py", line 438, in __call__
    predictor_outs = self.asr_model.calc_predictor(enc, enc_len)
  File "d:\srv2\funasr\funasr\models\e2e_asr_paraformer.py", line 1637, in calc_predictor
    pre_acoustic_embeds, pre_token_length, alphas, pre_peak_index, pre_token_length2 = self.predictor(encoder_out,
  File "E:\Anaconda3\envs\srv2\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "d:\srv2\funasr\funasr\models\predictor\cif.py", line 593, in forward
    output2, (_, _) = self.blstm(output2)
  File "E:\Anaconda3\envs\srv2\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\Anaconda3\envs\srv2\lib\site-packages\torch\nn\modules\rnn.py", line 761, in forward
    result = _VF.lstm(input, hx, self._flat_weights, self.bias, self.num_layers,
RuntimeError: CUDA out of memory. Tried to allocate 446.00 MiB (GPU 0; 6.00 GiB total capacity; 3.97 GiB already allocated; 0 bytes free; 4.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
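The error message itself points at two knobs worth trying before anything else: the PYTORCH_CUDA_ALLOC_CONF allocator setting, and a smaller decoding batch. Below is a minimal sketch of both, assuming the standard ModelScope pipeline setup for this model; note that passing batch_size_token as a pipeline kwarg is inferred from the "batch_size_token: 6000" log line above and should be treated as an assumption, not a confirmed parameter name.

```python
import os

# Suggested by the OOM message itself: cap allocator block splitting to reduce
# fragmentation. Must be set before the first CUDA allocation, i.e. before the
# pipeline is constructed.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

inference_pipeline = pipeline(
    task=Tasks.auto_speech_recognition,
    model='damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch',
    # Assumption: the "batch_size_token: 6000" log line reflects an option the
    # pipeline forwards to funasr; a smaller token budget per batch should
    # lower peak GPU memory at the cost of running more batches.
    batch_size_token=3000,
)

result = inference_pipeline(audio_in='long_audio.wav')
print(result)
```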

JiangXiaobai00 commented 10 months ago

Hi, I'm running into the same problem. Has it been solved yet?
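For anyone else landing here: since short clips decode fine on the same 6 GB GPU, one workaround is to split the long recording before recognition. Below is a minimal sketch using pydub (already present in the reporter's environment, per the log above); transcribe_in_chunks is a hypothetical helper and the 10-minute chunk length is an untested assumption.

```python
from pydub import AudioSegment

def transcribe_in_chunks(path, asr_pipeline, chunk_ms=10 * 60 * 1000):
    """Decode a long recording in fixed-size chunks so the VAD/ASR stages
    never see more than chunk_ms of audio at once."""
    audio = AudioSegment.from_file(path)
    texts = []
    for start in range(0, len(audio), chunk_ms):  # len(audio) is in ms
        chunk_path = f"chunk_{start // chunk_ms}.wav"
        audio[start:start + chunk_ms].export(chunk_path, format="wav")
        texts.append(asr_pipeline(audio_in=chunk_path))
    return texts

# Usage, with inference_pipeline built as in the sketch above:
# results = transcribe_in_chunks('long_audio.wav', inference_pipeline)
```

Fixed-length cuts can split a word at a chunk boundary; splitting on silence instead (pydub.silence.split_on_silence) is gentler but adds preprocessing time.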