jiahe7ay / MINI_LLM

This is a repository for individuals to experiment with and reproduce the pre-training process of an LLM.

How to handle a "CUDA out of memory" error on an 8 × RTX 3090 machine? #4

Closed: weiiWill closed this issue 6 months ago

weiiWill commented 6 months ago

File "/home/xiezizhe/anaconda3090/envs/willm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl shift_logits = lm_logits[..., :-1, :].contiguous() torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.22 GiB. GPU 0 has a total capacity of 23.69 GiB of which 1.98 GiB is free. Process 38499 has 21.70 GiB memory in use. Of the allocated memory 20.49 GiB is allocated by PyTorch, and 818.86 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

weiiWill commented 6 months ago

Solved it by reducing the batch size 😊
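For anyone hitting the same error: shrinking the per-device batch size cuts activation memory roughly in proportion, and gradient accumulation can restore the original effective batch size. A minimal PyTorch sketch under assumed, illustrative settings (the model, dimensions, and step counts are placeholders, not taken from this repo):

```python
import torch
import torch.nn as nn

# Toy stand-in for the LLM; sizes are illustrative only.
model = nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

micro_batch = 4   # small enough to fit in 24 GiB of VRAM
accum_steps = 8   # effective batch = micro_batch * accum_steps = 32

optimizer.zero_grad()
for step in range(100):
    x = torch.randn(micro_batch, 512, device="cuda")
    loss = model(x).pow(2).mean()    # stand-in for the LM loss
    (loss / accum_steps).backward()  # scale so accumulated grads average correctly
    if (step + 1) % accum_steps == 0:
        optimizer.step()             # update weights once per effective batch
        optimizer.zero_grad()
```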