Closed · clclclaiggg closed this issue 12 months ago
Try rerunning it and see.
Still doesn't work; SFT also gets stuck.
How many GPUs are you using, how large is the dataset, and is the cache being generated normally? Also, please paste the script command. If the tokenizer is the fast tokenizer, that can also cause this problem.
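(A minimal sketch, not from the original thread, of how one might rule out the fast-tokenizer path mentioned above, assuming the standard `transformers` AutoTokenizer API; the tokenizer path is a placeholder.)

```python
from transformers import AutoTokenizer

tokenizer_dir = "path/to/chinese-llama-2/tokenizer/dir"  # placeholder path, not from this thread

# use_fast=False forces the slow SentencePiece tokenizer, bypassing the
# fast-tokenizer conversion step that the comment above suspects.
tok = AutoTokenizer.from_pretrained(tokenizer_dir, use_fast=False)
print(type(tok).__name__)                       # should not end in "Fast"
print(tok("测试 tokenizer 是否正常")["input_ids"])  # encoding should return promptly
```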
I'm using a test dataset of only 25k. The command is:
```bash
lr=2e-4
lora_rank=64
lora_alpha=128
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05
pretrained_model=path/to/hf/llama-2/dir
chinese_tokenizer_path=path/to/chinese-llama-2/tokenizer/dir
dataset_dir=path/to/pt/data/dir
data_cache=temp_data_cache_dir
per_device_train_batch_size=1
gradient_accumulation_steps=8
block_size=256
output_dir=output_dir
deepspeed_config_file=ds_zero2_no_offload.json

torchrun --nnodes 1 --nproc_per_node 1 run_clm_pt_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path /data/chenlong/LLaMA-Efficient-Tuning-main/models/7B-chat/ \
    --tokenizer_name_or_path /data/chenlong/LLaMA-Efficient-Tuning-main/models/7B-chat/ \
    --dataset_dir /data/chenlong/Chinese-LLaMA-Alpaca-2-main/data1/ \
    --data_cache_dir /data/chenlong/Chinese-LLaMA-Alpaca-2-main/output/ \
    --validation_split_percentage 0.001 \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --do_train \
    --seed $RANDOM \
    --fp16 \
    --num_train_epochs 1 \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --save_steps 200 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --block_size ${block_size} \
    --output_dir /data/chenlong/Chinese-LLaMA-Alpaca-2-main/output1/ \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --lora_dropout ${lora_dropout} \
    --modules_to_save ${modules_to_save} \
    --torch_dtype float16 \
    --load_in_kbits 16 \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False
```
> How many GPUs are you using, how large is the dataset, and is the cache being generated normally? Also, please paste the script command. If the tokenizer is the fast tokenizer, that can also cause this problem.

The files generated under `data_cache_dir` are empty.
> data_cache_dir

Setting it has no effect; the cache is generated directly in the folder where the data lives. I couldn't reproduce your problem on my side; please keep debugging.
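(A hedged debugging sketch, not from the thread: load one pre-training text file with the generic `datasets` "text" builder, outside run_clm_pt_with_peft.py, and print where its Arrow cache files actually land; the file path is a placeholder.)

```python
from datasets import load_dataset

# Load a single pre-training .txt file the way the "text" builder does,
# then inspect the location of the Arrow files backing the split.
raw = load_dataset(
    "text",
    data_files={"train": "path/to/pt/data/dir/example.txt"},  # placeholder file
    cache_dir="temp_data_cache_dir",
)
print(raw["train"].cache_files)  # paths of the cache files for the train split
```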
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.
> data_cache_dir
>
> Setting it has no effect; the cache is generated directly in the folder where the data lives. I couldn't reproduce your problem on my side; please keep debugging.

Hello, have you managed to solve this problem? I'm running into the same issue.
> Hello, have you managed to solve this problem? I'm running into the same issue.

No, I never solved it; I moved on to a different project.
OK, thank you for the reply.
Check the following items before submitting
Issue type
Model training and fine-tuning
Base model
Chinese-LLaMA-2 (7B/13B)
Operating system
Linux
Describe the problem in detail
```
[INFO|tokenization_utils_base.py:1837] 2023-10-24 14:28:08,190 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:1837] 2023-10-24 14:28:08,190 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1837] 2023-10-24 14:28:08,190 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1837] 2023-10-24 14:28:08,190 >> loading file tokenizer_config.json
Using custom data configuration default-95ec87dea5b633cd
10/24/2023 14:28:09 - INFO - datasets.builder - Using custom data configuration default-95ec87dea5b633cd
Loading Dataset Infos from /data/chenlong/enter/envs/chenllama/lib/python3.10/site-packages/datasets/packaged_modules/text
10/24/2023 14:28:09 - INFO - datasets.info - Loading Dataset Infos from /data/chenlong/enter/envs/chenllama/lib/python3.10/site-packages/datasets/packaged_modules/text
```

It gets stuck here and does not move forward. What could be the cause?
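(A hedged sketch, not from the thread, for narrowing down where the hang happens: turn on the `datasets` library's debug logging and time a bare "text" load, so the "Loading Dataset Infos" step can be observed in isolation; the file path is a placeholder.)

```python
import time
import datasets

datasets.logging.set_verbosity_debug()  # surface every builder/cache/download step

start = time.time()
raw = datasets.load_dataset(
    "text",
    data_files={"train": "path/to/pt/data/dir/example.txt"},  # placeholder file
)
print(f"loaded in {time.time() - start:.1f}s", raw)  # if this hangs too, the issue is in the text builder, not the training script
```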
依赖情况(代码类问题务必提供)
No response
运行日志或截图
No response