yeyupiaoling / PPASR

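End-to-end Chinese speech recognition implemented with PaddlePaddle, from getting started to real-world use: very simple introductory examples and very practical enterprise projects. Supports the currently most popular models: DeepSpeech2, Conformer, and Squeezeformer.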
Apache License 2.0
792 stars 131 forks

Training deepspeech2_big fails with GPU out of memory #99

Closed a00147600 closed 1 year ago

a00147600 commented 1 year ago

This may be a problem that only multi-GPU training or more money can solve, but I wanted to ask whether there is a better way. My graphics card is a 2060, and I did not run into this when training deepspeech2 earlier.

[2022-08-02 00:05:46.810187] Train epoch: [1/65], batch: [2200/2575], loss: 89.97584, learning rate: 0.00005000, eta: 4:04:02
[2022-08-02 00:09:19.068865] Train epoch: [1/65], batch: [2300/2575], loss: 109.15543, learning rate: 0.00005000, eta: 4:32:05
[2022-08-02 00:13:28.208726] Train epoch: [1/65], batch: [2400/2575], loss: 133.09998, learning rate: 0.00005000, eta: 5:15:13
Traceback (most recent call last):
  File "train.py", line 46, in <module>
    pretrained_model=args.pretrained_model,
  File "F:\PPASR-master\ppasr\trainer.py", line 359, in train
    loss.backward()
  File "F:\Anaconda3\envs\ppasr\lib\site-packages\decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "F:\Anaconda3\envs\ppasr\lib\site-packages\paddle\fluid\wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args, **kwargs)
  File "F:\Anaconda3\envs\ppasr\lib\site-packages\paddle\fluid\framework.py", line 434, in __impl__
    return func(*args, **kwargs)
  File "F:\Anaconda3\envs\ppasr\lib\site-packages\paddle\fluid\dygraph\varbase_patch_methods.py", line 292, in backward
    framework._dygraph_tracer())
OSError: (External) ResourceExhaustedError:

Out of memory error on GPU 0. Cannot allocate 801.000732MB memory on GPU 0, 11.999695GB memory has been allocated and available memory is only 0.000000B.

Please check whether there is any other process using GPU 0.

  1. If yes, please stop them, or start PaddlePaddle on another GPU.
  2. If no, please decrease the batch size of your model.

    (at ..\paddle\fluid\memory\allocation\cuda_allocator.cc:87) (at ..\paddle\fluid\imperative\basic_engine.cc:586)

yeyupiaoling commented 1 year ago

Use a smaller batch size.

a00147600 commented 1 year ago

"Use a smaller batch size."

Do you mean this parameter in train.py? add_arg('batch_size', int, 64, '训练的批量大小') (the help text reads "training batch size")

yeyupiaoling commented 1 year ago

Yes, that's the one.
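For anyone landing here later, a minimal sketch of how a parameter declared this way can usually be overridden without editing train.py. It assumes `add_arg` is a thin wrapper around `argparse` that registers `--batch_size`; the `add_arguments` helper and `parse_batch_size` function below are hypothetical stand-ins written for illustration, not the repository's actual code.

```python
import argparse
import functools


def add_arguments(argname, type, default, help, argparser):
    """Hypothetical helper: register --<argname> on an argparse parser."""
    argparser.add_argument('--' + argname, type=type, default=default, help=help)


parser = argparse.ArgumentParser(description='batch size override sketch')
# Bind the parser once so each option can be declared in a single short call.
add_arg = functools.partial(add_arguments, argparser=parser)
# Same call shape as the line quoted from train.py (help text translated).
add_arg('batch_size', int, 64, 'training batch size')


def parse_batch_size(argv):
    """Return the batch size for a given list of command-line arguments."""
    return parser.parse_args(argv).batch_size


if __name__ == '__main__':
    print(parse_batch_size([]))                      # default value
    print(parse_batch_size(['--batch_size', '16']))  # value given on the command line
```

If train.py works the same way, something like `python train.py --batch_size=16` should lower the batch size for one run. Halving from 64 until the out-of-memory error stops is a reasonable way to find a value that fits on the card.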