Closed: Meatfucker closed this issue 9 months ago.
I'm having the same issue while doing inference with the same weights (1.5-13b), also on Linux with a GPU. Interestingly, I never hit the issue when doing inference with the smaller (7b) model.
This should have been fixed on main by #28032
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
transformers version: 4.36.1

Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)

Reproduction
When running inference on LLaVA, it sometimes crashes, seemingly at random, with the following error.
Error device-side assert triggered at line 738 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu
/opt/conda/conda-bld/pytorch_1699449183005/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
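The assertion comes from PyTorch's CUDA index kernel: every index into a dimension of size N must satisfy `-sizes[i] <= index && index < sizes[i]` (negative indices wrap around, Python-style). A common trigger for this assert during generation is a token id that falls outside the embedding table. The Python sketch below just mirrors the kernel's bounds check; the vocabulary size used is a made-up illustrative number, not taken from the issue.

```python
def cuda_index_in_bounds(index: int, size: int) -> bool:
    # Mirror of the CUDA IndexKernel assertion:
    #   -sizes[i] <= index && index < sizes[i]
    # Negative indices down to -size are valid (wraparound indexing);
    # anything at or beyond +size, or below -size, trips the device assert.
    return -size <= index < size

# Hypothetical example: a 32000-entry embedding table.
VOCAB_SIZE = 32000
print(cuda_index_in_bounds(31999, VOCAB_SIZE))   # last valid id
print(cuda_index_in_bounds(-32000, VOCAB_SIZE))  # most negative valid id
print(cuda_index_in_bounds(32005, VOCAB_SIZE))   # out-of-vocabulary id -> assert fires
```

Because the assert fires asynchronously on the device, the reported stack trace is often unrelated to the faulty op; rerunning with `CUDA_LAUNCH_BLOCKING=1` makes the failing call show up at the right place.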
Here is roughly how I'm loading the model and doing inference.
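The reporter's actual snippet was not preserved in this copy of the issue. As a hedged reconstruction only: a typical transformers 4.36 setup for LLaVA 1.5 with 4-bit bitsandbytes quantization (which matches the `bitsandbytes/csrc/ops.cu` frame in the traceback) looks like the sketch below. The model id, prompt format, and generation parameters are assumptions, not the reporter's values; running it requires a CUDA GPU plus the transformers, torch, and bitsandbytes packages.

```python
# Hypothetical reconstruction -- not the reporter's exact code.
MODEL_ID = "llava-hf/llava-1.5-13b-hf"  # assumed checkpoint name


def load_and_infer(prompt: str, image) -> str:
    # Imports live inside the function so the sketch can be read (and the
    # module imported) without transformers/torch installed.
    import torch
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = LlavaForConditionalGeneration.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.float16,
        load_in_4bit=True,   # bitsandbytes quantization, per the traceback
        device_map="auto",
    )
    # LLaVA 1.5 expects the "<image>" placeholder in the prompt text.
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=200)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Since #28032 is referenced as the fix on main, upgrading past 4.36.1 is the first thing to try before debugging a snippet like this one.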
Expected behavior
I would expect it to run inference and return a response as it normally does. Interestingly, this same code never crashes when run on Windows, only on Linux.