texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0

Error when training an LLM retriever with LoRA fine-tuning #119

Closed: yxk9810 closed this issue 2 months ago

yxk9810 commented 2 months ago

Versions: deepspeed==0.14.2, transformers==4.37.0

Command:

deepspeed --include localhost:0,1 --master_port 60000 --module tevatron.retriever.driver.train \
  --deepspeed deepspeed/ds_zero3_config.json \
  --output_dir retriever-mistral \
  --dataset_path /mnt/data/data/index/imp_data/train_data.jsonl \
  --model_name_or_path /mnt/data/data//index/qwen-1.5 \
  --lora \
  --lora_target_modules q_proj \
  --save_steps 50 \
  --query_prefix "Query: " \
  --passage_prefix "Passage: " \
  --pooling eos \
  --append_eos_token \
  --normalize \
  --report_to none \
  --temperature 0.01 \
  --per_device_train_batch_size 1 \
  --gradient_checkpointing \
  --train_group_size 1 \
  --learning_rate 1e-4 \
  --query_max_len 32 \
  --passage_max_len 156 \
  --num_train_epochs 1 \
  --logging_steps 10 \
  --overwrite_output_dir \
  --gradient_accumulation_steps 4

Training fails with an AssertionError on this line:

assert len(set(p.ds_id for p in self.params_in_ipg_bucket)) == len(self.params_in_ipg_bucket)

yxk9810 commented 2 months ago

I changed the reduce bucket size in the DeepSpeed config, and the problem was solved.
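For anyone hitting the same assertion, here is a minimal sketch of that change. It assumes the fix refers to the reduce_bucket_size key under zero_optimization in the DeepSpeed JSON config. The helper name set_reduce_bucket_size, the example config fragment, and the value 5 * 10**7 are all illustrative assumptions: the thread does not say which value worked, and the fragment is not the actual contents of deepspeed/ds_zero3_config.json.

```python
import copy
import json


def set_reduce_bucket_size(config: dict, bucket_size: int) -> dict:
    """Return a copy of a DeepSpeed config with
    zero_optimization.reduce_bucket_size set to bucket_size."""
    updated = copy.deepcopy(config)
    # Create the zero_optimization section if the config lacks one.
    zero = updated.setdefault("zero_optimization", {})
    zero["reduce_bucket_size"] = bucket_size
    return updated


# Illustrative ZeRO stage 3 fragment (assumption, not the repo's real file).
example_config = {
    "zero_optimization": {
        "stage": 3,
        "reduce_bucket_size": "auto",
    }
}

# The value below is a placeholder; the thread does not state the one used.
patched = set_reduce_bucket_size(example_config, 5 * 10**7)
print(json.dumps(patched, indent=2))
```

To apply this to a real config, load the file with json.load, pass the resulting dict through the helper, and write it back with json.dump. You may need to try more than one value, since the original fix does not specify one.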