GPU out of memory when training deepspeech2_big

a00147600 commented 1 year ago

This may be a problem that can only be solved with multi-GPU training or more money, but I'd like to ask whether there is a better approach. My graphics card is a 2060, and I did not run into this when training deepspeech2 before.

[2022-08-02 00:05:46.810187] Train epoch: [1/65], batch: [2200/2575], loss: 89.97584, learning rate: 0.00005000, eta: 4:04:02
[2022-08-02 00:09:19.068865] Train epoch: [1/65], batch: [2300/2575], loss: 109.15543, learning rate: 0.00005000, eta: 4:32:05
[2022-08-02 00:13:28.208726] Train epoch: [1/65], batch: [2400/2575], loss: 133.09998, learning rate: 0.00005000, eta: 5:15:13
Traceback (most recent call last):
  File "train.py", line 46, in <module>
    pretrained_model=args.pretrained_model,
  File "F:\PPASR-master\ppasr\trainer.py", line 359, in train
    loss.backward()
  File "F:\Anaconda3\envs\ppasr\lib\site-packages\decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "F:\Anaconda3\envs\ppasr\lib\site-packages\paddle\fluid\wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args, **kwargs)
  File "F:\Anaconda3\envs\ppasr\lib\site-packages\paddle\fluid\framework.py", line 434, in __impl__
    return func(*args, **kwargs)
  File "F:\Anaconda3\envs\ppasr\lib\site-packages\paddle\fluid\dygraph\varbase_patch_methods.py", line 292, in backward
    framework._dygraph_tracer())
OSError: (External) ResourceExhaustedError:

Out of memory error on GPU 0. Cannot allocate 801.000732MB memory on GPU 0, 11.999695GB memory has been allocated and available memory is only 0.000000B.

Please check whether there is any other process using GPU 0.

If yes, please stop them, or start PaddlePaddle on another GPU.
If no, please decrease the batch size of your model.

(at ..\paddle\fluid\memory\allocation\cuda_allocator.cc:87) (at ..\paddle\fluid\imperative\basic_engine.cc:586)

yeyupiaoling commented 1 year ago

Use a smaller batch size.

a00147600 commented 1 year ago

Regarding "Use a smaller batch size": do you mean this parameter in train.py? add_arg('batch_size', int, 64, '训练的批量大小') (the help string means "training batch size")

yeyupiaoling commented 1 year ago

Yes.
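
For reference, here is a minimal sketch of how an `add_arg`-style helper usually maps onto `argparse`, so the batch size can be lowered from the command line instead of editing the default. The `add_arguments` helper below is an assumed stand-in and is not copied from PPASR; only the `add_arg('batch_size', int, 64, ...)` call comes from this thread, and 16 is an arbitrary example value.

```python
import argparse
import functools


def add_arguments(argname, type, default, help, argparser):
    """Hypothetical helper: register --<argname> on the given parser."""
    argparser.add_argument(
        "--" + argname,
        type=type,
        default=default,
        help=help + " Default: %(default)s.",
    )


parser = argparse.ArgumentParser(description="training arguments (sketch)")
# Bind the parser once so each argument is declared in a single short line.
add_arg = functools.partial(add_arguments, argparser=parser)
add_arg("batch_size", int, 64, "training batch size")

# Equivalent to running: python train.py --batch_size 16
args = parser.parse_args(["--batch_size", "16"])
print(args.batch_size)
```

If train.py parses its arguments this way, passing `--batch_size` with a smaller value overrides the default of 64. Halving the value until the out-of-memory error disappears is a reasonable starting point.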

yeyupiaoling / PPASR

GPU out of memory when training deepspeech2_big #99