wangyuxinwhy / uniem

unified embedding model
Apache License 2.0

Training crashes partway through with CUDA out of memory #126

Open · susht3 opened 4 months ago

susht3 commented 4 months ago

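🐛 Bug description

Fine-tuning suddenly runs out of GPU memory (OOM) partway through. Do I need to limit the input length, or does the code truncate inputs internally? At the moment I am not limiting the input length at all.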

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 31.74 GiB total capacity; 27.71 GiB already allocated; 91.12 MiB free; 31.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
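The error message itself suggests tuning PyTorch's caching allocator. In this log, reserved memory (31.22 GiB) exceeds allocated memory (27.71 GiB) by about 3.5 GiB, so fragmentation may be part of the problem, although it is not the "reserved >> allocated" case the message describes. A minimal sketch, assuming the fine-tuning run is a plain Python script: set `PYTORCH_CUDA_ALLOC_CONF` before PyTorch makes its first CUDA allocation. The value `128` is an arbitrary example, not a setting recommended by uniem.

```python
import os

# Must run before PyTorch allocates anything on the GPU, so place it at the
# very top of the training script (before importing torch / uniem is safest).
# 128 MiB is an assumed example value; tune it for your workload.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# import torch, uniem, etc. only after this point
```

The same setting can be passed from the shell instead, e.g. `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python train.py`. It can reduce fragmentation, but it does not lower the peak memory a long batch needs.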

Python Version

None

wangyuxinwhy commented 4 months ago

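Inputs are truncated internally, but uniem still allocates memory dynamically (the memory a batch needs depends on how long its texts are, up to the truncation limit), so if the data contains a sample with a very long text, training can still run out of memory partway through. You can reduce `batch_size`, or manually balance the text lengths.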
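One way to "manually balance the text lengths" is to cap the long tail before handing the data to uniem, so no single batch is much longer than the rest. A minimal sketch, assuming pair records stored as dicts with `text` and `text_pos` fields (adjust `fields` to your data) and using character counts as a rough stand-in for token counts. `percentile_length` and `cap_text_lengths` are hypothetical helpers, not part of uniem's API.

```python
import math
from typing import Dict, List, Sequence


def percentile_length(lengths: Sequence[int], q: float) -> int:
    """Nearest-rank percentile: q% of the lengths are <= the returned value."""
    if not lengths:
        raise ValueError("no lengths to summarize")
    ordered = sorted(lengths)
    rank = math.ceil(len(ordered) * q / 100)
    return ordered[max(rank, 1) - 1]


def cap_text_lengths(
    records: List[Dict[str, str]],
    fields: Sequence[str] = ("text", "text_pos"),
    q: float = 99.0,
) -> int:
    """Truncate texts longer than the q-th percentile length, in place.

    Returns the character cap that was applied. Characters only approximate
    tokens, so the real per-batch memory still depends on the tokenizer.
    """
    lengths = [len(r[f]) for r in records for f in fields if f in r]
    cap = percentile_length(lengths, q)
    for r in records:
        for f in fields:
            if f in r and len(r[f]) > cap:
                r[f] = r[f][:cap]
    return cap


records = [
    {"text": "a" * 10, "text_pos": "b" * 20},
    {"text": "c" * 30, "text_pos": "d" * 5000},  # one outlier sample
]
cap = cap_text_lengths(records, q=75)
print(cap, [len(r["text_pos"]) for r in records])
```

Here the 75th-percentile length is 30, so the 5000-character outlier is cut to 30 characters while the shorter texts are untouched. Capping at a high percentile keeps almost all data intact while bounding the worst-case batch; if OOM persists, lowering `batch_size` remains the simpler fix.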