Closed: lonngxiang closed this issue 2 months ago
flash-attn 2.6.3
I tried training with flash-attention 2 and got a similar error, which is why I didn't claim this repo supports flash-attention 2. If you figure out how to support it, a PR is welcome!
I debugged what goes wrong when flash_attention_2 is enabled in finetune.py.
Conclusion: fixed; see my latest commit https://github.com/zhangfaen/finetune-Qwen2-VL/commit/ff383f7b03b3beb5e8c5ca83c1ddc004ca7a2eec
How:
Solution:
I've closed this issue. If you still have problems, feel free to re-open it.
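For context, requesting FlashAttention-2 when loading Qwen2-VL through Transformers generally looks like the sketch below. This is only an illustration of the standard `attn_implementation` argument (model name and device handling are illustrative), not a description of what the commit above actually changes:

```python
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# FlashAttention-2 only supports fp16/bf16, so torch_dtype must be set explicitly.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
).to("cuda")
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
```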
Got it, training works now, though VRAM usage is still fairly high. Nice project, training with native PyTorch.
@lonngxiang May I ask which GPU you are using? I'm on a 4090 with 24 GB of VRAM and get an out-of-memory error after one iteration. How large is your dataset? Could we compare notes?
After reinstalling the newer transformers, training works. Also on a 4090; loading is just very slow.
After reinstalling the newer transformers, training works. Also on a 4090; loading is just very slow.
Following the instructions above, I reinstalled the libraries from requirements.txt. I had indeed hit the same problem you reported earlier; after reinstalling, the earlier index error is gone, but now it reports insufficient VRAM and the program exits. My training data follows the demo exactly, and I load the model with torch_dtype=torch.bfloat16, attn_implementation='flash_attention_2'. Could I ask what you adjusted when you ran it? Is the base model Qwen2-VL-2B-Instruct?
Yes, I just set the batch size to 1.
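For anyone hitting the same out-of-memory error on a 24 GB card, the change described above is just the per-device batch size. A minimal sketch, where `train_dataset` and `collate_fn` are placeholders for whatever finetune.py actually defines:

```python
from torch.utils.data import DataLoader

# `train_dataset` and `collate_fn` are placeholders for the project's own
# dataset and collator; the only change described above is batch_size=1,
# which was reported to be enough to fit Qwen2-VL-2B-Instruct training
# on a 24 GB RTX 4090.
train_loader = DataLoader(
    train_dataset,
    batch_size=1,
    shuffle=True,
    collate_fn=collate_fn,
)
```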
Hi, may I ask how you solved it?
Hi, did you manage to get it running?
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.fwd(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [298,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [298,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.