ibicdev opened 10 months ago
Seems to be a hardware and environment issue unrelated to the code. I used CUDA 11.8.
I am also using CUDA 11.8, with PyTorch 2.0.1 built for CUDA 11.8. I also tried the PyTorch nightly and got the same error. Setting `--use_flash_attn False` didn't make a difference either. The error is `RuntimeError: CUDA error: device-side assert triggered`, followed by about a hundred lines of
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [313,0,0], thread: [64,0,0] Assertion srcIndex < srcSelectDimSize failed.
This error looked similar to https://github.com/lm-sys/FastChat/issues/199; I tried their suggestions and none worked. One explanation on that thread is that out-of-range token IDs cause an embedding-lookup out-of-bounds error, though the vocab size seems to be fixed already in Llama-2.
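For reference, one quick way to confirm whether out-of-range token IDs are the cause is to compare the largest token ID in the processed dataset against the model's vocab size. This is only a sketch; it assumes the `dolly-processed` dataset from the post with an `input_ids` column and the model ID from the command in this thread:

```python
# Sketch: check whether any token id in the processed dataset falls outside the
# model's embedding table, which is the usual cause of the
# "indexSelectLargeIndex ... srcIndex < srcSelectDimSize" device-side assert.
# Assumes the "dolly-processed" dataset from the post with an "input_ids" column.
from datasets import load_from_disk
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-2-70b-hf")
dataset = load_from_disk("dolly-processed")

max_token_id = max(max(sample["input_ids"]) for sample in dataset)
print(f"max token id: {max_token_id}, model vocab size: {config.vocab_size}")

if max_token_id >= config.vocab_size:
    # Out-of-range ids would need the embedding table resized after loading the
    # model, e.g. model.resize_token_embeddings(new_vocab_size).
    print("Dataset contains token ids outside the model's embedding table.")
```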
Does the example work without any code changes?
Yes, it worked well without any code change.
What change did you make?
The only change I made is `--model_id`, from `tiiuae/falcon-180B` to `meta-llama/Llama-2-70b-hf`. The full command is:
torchrun --nproc_per_node 8 run_ds_lora.py \
--model_id meta-llama/Llama-2-70b-hf \
--dataset_path dolly-processed \
--output_dir falcon-180b-lora-fa \
--num_train_epochs 3 \
--per_device_train_batch_size 1 \
--learning_rate 4e-3 \
--gradient_checkpointing True \
--gradient_accumulation_steps 8 \
--bf16 True \
--tf32 True \
--use_flash_attn True \
--lr_scheduler_type "constant_with_warmup" \
--logging_steps 25 \
--save_steps 100 \
--save_total_limit 3 \
--deepspeed configs/ds_falcon_180b_z3.json
Did you make changes to the flash attention patch? The example only works with Falcon since it has a custom patch to use flash attention.
Ahh, I didn't. I saw your code https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/utils/peft_utils.py#L38-L41 and thought it was already taken care of. Also, even when I used `--use_flash_attn False`, I still got the same error.
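For anyone hitting this later: since the patch in the example targets Falcon's attention module, a simple check before applying it can avoid silently patching the wrong architecture. This is only a sketch under that assumption; the function below is not part of the repo (the actual patch lives in training/utils/peft_utils.py):

```python
# Sketch: warn before applying the Falcon-specific flash attention patch to a
# checkpoint with a different architecture. Not part of the repo's API.
from transformers import AutoConfig

def check_flash_attn_patch(model_id: str, use_flash_attn: bool) -> None:
    if not use_flash_attn:
        return
    config = AutoConfig.from_pretrained(model_id)
    if config.model_type != "falcon":
        print(
            f"model_type '{config.model_type}' does not match the Falcon-specific "
            "flash attention patch; disable --use_flash_attn or add a patch for "
            "this architecture."
        )

check_flash_attn_patch("meta-llama/Llama-2-70b-hf", use_flash_attn=True)
```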
Excited to see FlashAttention-2 natively supported in transformers! Do you plan to update this post to work with this new feature?
Yes! 👍🏻 I plan to update all my posts and remove those patches once there is an official release.
Great! Looking forward to the updates.
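For later readers: with the native integration, no monkey patch is needed. A rough sketch of loading Llama-2 with transformers' built-in FlashAttention-2 support (the exact flag depends on the transformers version):

```python
# Sketch: native FlashAttention-2 in recent transformers releases; requires the
# flash-attn package and an Ampere-or-newer GPU. On slightly older releases the
# flag was use_flash_attention_2=True instead of attn_implementation.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```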
Thanks Phil for the great post "Fine-tune Falcon 180B with DeepSpeed ZeRO, LoRA & Flash Attention". When I tried to change Falcon to Llama-2 (tried all 3 model sizes), I always got "CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)". Should there be more changes than just the model name to make it work? Or will you have a follow-up post about fine-tuning Llama-2 with DeepSpeed + LoRA?
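A side note on debugging, not specific to this repo: CUBLAS_STATUS_NOT_INITIALIZED is often a follow-on symptom of an earlier asynchronous failure (a device-side assert or running out of GPU memory). Forcing synchronous kernel launches usually surfaces the real failing op; one way is to set the environment variable at the very top of run_ds_lora.py, before anything touches the GPU:

```python
# Sketch: force synchronous CUDA kernel launches so the first failing op is
# reported instead of a later CUBLAS_STATUS_NOT_INITIALIZED. Must run before
# the first CUDA call, e.g. at the very top of run_ds_lora.py (or export the
# variable in the shell before launching torchrun).
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```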