Closed hyyuananran closed 2 hours ago
Remember that it's not just the model that takes up GPU memory: the data passed into the model also eats up memory. You can set `max_model_len` and/or `max_num_seqs` to a smaller value to avoid OOM.
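As a rough illustration of why those two knobs matter, the worst-case KV-cache footprint scales with both the context length and the number of concurrent sequences. The sketch below is a back-of-envelope estimate only; the layer/head counts are illustrative values for a 7B-class model in fp16, not numbers taken from this issue, and vLLM's actual allocator reserves memory differently.

```python
def kv_cache_bytes(max_model_len, max_num_seqs,
                   n_layers=32, n_kv_heads=32, head_dim=128, dtype_bytes=2):
    """Worst-case KV-cache size if every sequence used its full context.

    Illustrative 7B-class defaults (32 layers, 32 KV heads, head_dim 128, fp16).
    """
    # 2x for the K and V tensors stored per token per layer
    per_token = 2 * n_layers * n_kv_heads * head_dim * dtype_bytes
    return per_token * max_model_len * max_num_seqs

gib = 1024 ** 3
print(kv_cache_bytes(4096, 256) / gib)  # generous settings -> 512.0 (GiB)
print(kv_cache_bytes(2048, 64) / gib)   # reduced settings  -> 64.0 (GiB)
```

Halving `max_model_len` and quartering `max_num_seqs` cuts the worst-case reservation by 8x in this toy model, which is why lowering them is the usual first response to OOM.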
Thank you very much for your two clarifications, which have resolved my confusion.
Your current environment
The output of `python collect_env.py`
```text
Your output of `python collect_env.py` here
```

Model Input Dumps
No response
🐛 Describe the bug
```text
torch.OutOfMemoryError: Error in model execution (input dumped to /tmp/err_execute_model_input_20241022-033658.pkl): CUDA out of memory. Tried to allocate 1.93 GiB. GPU 0 has a total capacity of 22.19 GiB of which 183.88 MiB is free. Process 3759893 has 22.00 GiB memory in use. Of the allocated memory 19.99 GiB is allocated by PyTorch, and ... is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
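The error message itself suggests one mitigation for fragmentation: setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`. A minimal sketch of applying it from Python is below; note the variable must be set before the first CUDA allocation, so in practice exporting it in the shell before launching the process is the safer route.

```python
import os

# Must be set before torch makes its first CUDA allocation, i.e. before
# `import torch` (or before launching vLLM). Equivalent shell form:
#   export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
```

This only helps when "reserved but unallocated" memory is large; it does not reduce the memory the model and KV cache genuinely need.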
Before submitting a new issue...