chaofanl opened this issue 10 months ago
Hello, have you solved this? I also hit this error, and my torch version is 2.1.0. @chaofanl
After updating the accelerate version to 0.25.0, this issue disappeared, but I then ran into another issue, as below:
How strange, my accelerate version is 0.25.0.
Sorry for that, I had attached to the wrong Docker container; after double-checking, the accelerate version is actually 0.22.0.
It works, thank you very much!!!
Have you solved this issue by changing the accelerate version to 0.22.0?
Yes @TobyGE, it works for me.
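For anyone else landing here, a minimal sketch of the workaround reported above (pinning accelerate to 0.22.0 and then checking which versions are actually installed); the exact version that works may differ in your environment:

```bash
# Pin accelerate to the version reported to work in this thread
pip install "accelerate==0.22.0"

# Confirm the installed accelerate and torch versions
pip show accelerate | grep Version
python -c "import torch; print(torch.__version__)"
```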
Any update on this issue? I'm seeing it with Accelerate 0.27.2 and PyTorch 2.1.0.
Hi, my environment is as below:

Docker image:
docker run --gpus all -it --net=host --ipc=host --ulimit memlock=-1 -v /home/ubuntu/test:/home/finetune -v /ssd/gyou:/models --name=vicuna nvcr.io/nvidia/pytorch:23.07-py3
Run command:
root@g0300:/home/finetune/FastChat# cat ./scripts/train_vicuna_13b.sh
torchrun --nproc_per_node=8 --master_port=20001 fastchat/train/train_mem.py \
    --model_name_or_path /models/vicuna-13b \
    --data_path data/dummy_conversation.json \
    --bf16 True \
    --output_dir output_vicuna_13b \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 32 \
    --gradient_accumulation_steps 4 \
    --evaluation_strategy "steps" \
    --eval_steps 1500 \
    --save_strategy "steps" \
    --save_steps 1500 \
    --save_total_limit 8 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.04 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap offload" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True
root@g0300:/home/finetune/FastChat# pip list | grep torch
pytorch-quantization    2.1.2
torch                   2.1.0a0+b5021ba
torch-tensorrt          1.5.0.dev0
torchdata               0.7.0a0
torchtext               0.16.0a0
torchvision             0.16.0a0
However, I then encounter: ValueError: FSDP requires PyTorch >= 2.1.0
Can anyone help take a look? Thanks very much.
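One possible explanation, offered as a sketch rather than a confirmed diagnosis: the NGC container ships a pre-release torch build (2.1.0a0+b5021ba), and pre-release versions sort below the final 2.1.0 release, so a ">= 2.1.0" check can fail even though the container is effectively on 2.1. The snippet below only illustrates that version comparison; it assumes the failing check uses packaging-style version parsing:

```bash
# Pre-release "2.1.0a0" compares as lower than the final "2.1.0" release,
# which can trip a ">= 2.1.0" requirement check
python -c "from packaging import version; print(version.parse('2.1.0a0+b5021ba') >= version.parse('2.1.0'))"
# prints: False
```

If that is indeed the cause, installing an official torch >= 2.1.0 wheel (or using an image whose torch is not a pre-release build) is one possible way around it.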