Closed — philschmid closed this issue 11 months ago
I tried on another machine and it works fine there, so something else is going on. I will continue investigating in https://github.com/Dao-AILab/flash-attention/issues/311
@tmm1 which machine did you use where it was not working?
It is a local machine with a 3090. I got it working by starting from a fresh CUDA 11.8 conda environment; the flash-attn tests were failing in the broken env.
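For anyone hitting the same thing, a fresh-environment setup along these lines is roughly what worked for me. This is a sketch, not the exact commands from this thread — the env name, Python/PyTorch versions, and conda channel are my assumptions:

```shell
# Create a clean conda env (name and Python version are arbitrary choices)
conda create -n flash-cu118 python=3.10 -y
conda activate flash-cu118

# Install a PyTorch build compiled against CUDA 11.8
pip install torch --index-url https://download.pytorch.org/whl/cu118

# Build/install flash-attn against that toolchain
pip install flash-attn --no-build-isolation

# Sanity-check the install by running the upstream tests
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
pytest tests/test_flash_attn.py
```

If the tests fail here, the problem is the environment/toolchain rather than your training code, which is how I narrowed it down on the broken machine.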
Hi, thanks for documenting this.
I'm curious whether you were able to train successfully with peft + flash attention?
I keep seeing loss spikes after a few iterations.