After the first Epoch ends, an error "torch.cuda.OutOfMemoryError: CUDA out of memory" is reported.

poloclub / tsr-convstem

High-Performance Transformers for Table Structure Recognition Need Early Convolutions

https://arxiv.org/abs/2311.05565

MIT License


After the first Epoch ends, an error "torch.cuda.OutOfMemoryError: CUDA out of memory" is reported. #4

Closed by Kinsue 6 months ago

Kinsue commented 6 months ago

With my training configuration, I can train on a single 4090 by reducing the batch size. However, after the first epoch ends, a "torch.cuda.OutOfMemoryError: CUDA out of memory" error occurs. May I ask what device your team trained on? Should I continue to reduce the batch size?

ShengYun-Peng commented 6 months ago

Hi @Kinsue, reducing the batch size further may help mitigate the issue. For this paper, the model was trained on A100 80G. I recommend trying out our latest work, UniTable, at https://github.com/poloclub/unitable. We provide a tiny portion (20 samples) of PubTabNet there for toy pretraining and finetuning. You can also lower max_seq_len and img_size to reduce GPU memory usage.
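As a rough illustration of why batch size, sequence length, and image size all affect memory, the sketch below counts the bytes in one layer's attention score matrix, which grows linearly with batch size and quadratically with the number of tokens. It is not taken from the repository: the function names, the patch size of 16, the 8 heads, and the example settings are all assumptions for illustration, and real usage also includes weights, optimizer state, and other activations.

```python
def attention_score_bytes(batch_size, num_heads, seq_len, bytes_per_value=4):
    """Bytes held by one layer's attention scores: B * H * L * L values.

    bytes_per_value=4 assumes fp32; use 2 for fp16/bf16.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value


def image_tokens(img_size, patch_size=16):
    """Token count for a square image, ASSUMING the encoder downsamples
    by patch_size in each dimension (hypothetical value, not from the repo)."""
    return (img_size // patch_size) ** 2


# Hypothetical settings, chosen only to show the scaling.
large = attention_score_bytes(batch_size=32, num_heads=8, seq_len=image_tokens(448))
small = attention_score_bytes(batch_size=16, num_heads=8, seq_len=image_tokens(224))

# Halving the batch size saves 2x; halving img_size cuts tokens 4x,
# which cuts the score matrix 16x.
print(f"large: {large / 2**20:.1f} MiB per layer")
print(f"small: {small / 2**20:.1f} MiB per layer")
print(f"ratio: {large / small:.0f}x")
```

The same quadratic relationship applies to the decoder's self-attention over the output sequence, so halving max_seq_len reduces that term by roughly four times, whereas halving the batch size only halves it.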

Kinsue commented 6 months ago

Thank you very much for your reply. I am happy to try out your latest work and appreciate your contributions.