josStorer / RWKV-Runner

A RWKV management and startup tool, fully automated, only 8 MB, that also provides an OpenAI-compatible API. RWKV is a large language model that is fully open source and available for commercial use.
https://www.rwkv.com
MIT License

What is causing this train.py error during LoRA fine-tuning? #273

Open Sakuranoame opened 9 months ago

Sakuranoame commented 9 months ago

The output is as follows:

```
--load_model models/RWKV-5-1B5-one-state-slim-novel-tuned.pth --data_file ./finetune/json2binidx_tool/data/training staff_text_document --ctx_len 1024 --epoch_steps 800 --epoch_count 20 --epoch_begin 0 --epoch_save 1 --micro_bsz 1 --accumulate_grad_batches 8 --pre_ffn 0 --head_qk 0 --lr_init 5e-5 --lr_final 100 --warmup_steps 0 --beta1 0.9 --beta2 0.999 --adam_eps 1e-8 --devices 1 --precision bf16 --grad_cp 1 --lora_r 16 --lora_alpha 16 --lora_dropout 0.01
apt cnMirror already set
gcc installed
pip installed
ninja installed
cuda 12 installed
requirements satisfied
loading models/RWKV-5-1B5-one-state-slim-novel-tuned.pth
v5/train.py --vocab_size 65536 --n_layer 24 --n_embd 2048
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpquqdkvoi
INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpquqdkvoi/_remote_module_non_scriptable.py
INFO:pytorch_lightning.utilities.rank_zero:########## work in progress ##########
usage: train.py [-h] [--load_model LOAD_MODEL] [--wandb WANDB] [--proj_dir PROJ_DIR] [--random_seed RANDOM_SEED] [--data_file DATA_FILE] [--data_type DATA_TYPE] [--vocab_size VOCAB_SIZE] [--ctx_len CTX_LEN] [--epoch_steps EPOCH_STEPS] [--epoch_count EPOCH_COUNT] [--epoch_begin EPOCH_BEGIN] [--epoch_save EPOCH_SAVE] [--micro_bsz MICRO_BSZ] [--n_layer N_LAYER] [--n_embd N_EMBD] [--dim_att DIM_ATT] [--dim_ffn DIM_FFN] [--pre_ffn PRE_FFN] [--head_qk HEAD_QK] [--tiny_att_dim TINY_ATT_DIM] [--tiny_att_layer TINY_ATT_LAYER] [--lr_init LR_INIT] [--lr_final LR_FINAL] [--warmup_steps WARMUP_STEPS] [--beta1 BETA1] [--beta2 BETA2] [--adam_eps ADAM_EPS] [--grad_cp GRAD_CP] [--dropout DROPOUT] [--weight_decay WEIGHT_DECAY] [--weight_decay_final WEIGHT_DECAY_FINAL] [--my_pile_version MY_PILE_VERSION] [--my_pile_stage MY_PILE_STAGE] [--my_pile_shift MY_PILE_SHIFT] [--my_pile_edecay MY_PILE_EDECAY] [--layerwise_lr LAYERWISE_LR] [--ds_bucket_mb DS_BUCKET_MB] [--my_sample_len MY_SAMPLE_LEN] [--my_ffn_shift MY_FFN_SHIFT] [--my_att_shift MY_ATT_SHIFT] [--head_size_a HEAD_SIZE_A] [--head_size_divisor HEAD_SIZE_DIVISOR] [--my_pos_emb MY_POS_EMB] [--load_partial LOAD_PARTIAL] [--magic_prime MAGIC_PRIME] [--my_qa_mask MY_QA_MASK] [--my_random_steps MY_RANDOM_STEPS] [--my_testing MY_TESTING] [--my_exit MY_EXIT] [--my_exit_tokens MY_EXIT_TOKENS] [--emb] [--lora] [--lora_load LORA_LOAD] [--lora_r LORA_R] [--lora_alpha LORA_ALPHA] [--lora_dropout LORA_DROPOUT] [--lora_parts LORA_PARTS] [--logger [LOGGER]] [--enable_checkpointing [ENABLE_CHECKPOINTING]] [--default_root_dir DEFAULT_ROOT_DIR] [--gradient_clip_val GRADIENT_CLIP_VAL] [--gradient_clip_algorithm GRADIENT_CLIP_ALGORITHM] [--num_nodes NUM_NODES] [--num_processes NUM_PROCESSES] [--devices DEVICES] [--gpus GPUS] [--auto_select_gpus [AUTO_SELECT_GPUS]] [--tpu_cores TPU_CORES] [--ipus IPUS] [--enable_progress_bar [ENABLE_PROGRESS_BAR]] [--overfit_batches OVERFIT_BATCHES] [--track_grad_norm TRACK_GRAD_NORM] [--check_val_every_n_epoch CHECK_VAL_EVERY_N_EPOCH] [--fast_dev_run [FAST_DEV_RUN]] [--accumulate_grad_batches ACCUMULATE_GRAD_BATCHES] [--max_epochs MAX_EPOCHS] [--min_epochs MIN_EPOCHS] [--max_steps MAX_STEPS] [--min_steps MIN_STEPS] [--max_time MAX_TIME] [--limit_train_batches LIMIT_TRAIN_BATCHES] [--limit_val_batches LIMIT_VAL_BATCHES] [--limit_test_batches LIMIT_TEST_BATCHES] [--limit_predict_batches LIMIT_PREDICT_BATCHES] [--val_check_interval VAL_CHECK_INTERVAL] [--log_every_n_steps LOG_EVERY_N_STEPS] [--accelerator ACCELERATOR] [--strategy STRATEGY] [--sync_batchnorm [SYNC_BATCHNORM]] [--precision PRECISION] [--enable_model_summary [ENABLE_MODEL_SUMMARY]] [--num_sanity_val_steps NUM_SANITY_VAL_STEPS] [--resume_from_checkpoint RESUME_FROM_CHECKPOINT] [--profiler PROFILER] [--benchmark [BENCHMARK]] [--reload_dataloaders_every_n_epochs RELOAD_DATALOADERS_EVERY_N_EPOCHS] [--auto_lr_find [AUTO_LR_FIND]] [--replace_sampler_ddp [REPLACE_SAMPLER_DDP]] [--detect_anomaly [DETECT_ANOMALY]] [--auto_scale_batch_size [AUTO_SCALE_BATCH_SIZE]] [--plugins PLUGINS] [--amp_backend AMP_BACKEND] [--amp_level AMP_LEVEL] [--move_metrics_to_cpu [MOVE_METRICS_TO_CPU]] [--multiple_trainloader_mode MULTIPLE_TRAINLOADER_MODE] [--inference_mode [INFERENCE_MODE]]
train.py: error: unrecognized arguments: staff_text_document
```

josStorer commented 9 months ago

What is your training data path? It looks like there may be a space in the path.
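(A minimal illustration, not RWKV-Runner's actual launch code, of why a space breaks things: the training command is split on whitespace before train.py parses it, so argparse sees everything after the space as an extra positional argument. The paths below are made up.)

```python
import argparse
import shlex

parser = argparse.ArgumentParser()
parser.add_argument("--data_file")

# Unquoted path containing a space -> the whitespace split produces two tokens
argv = shlex.split("--data_file ./data/training staff_text_document")
args, unknown = parser.parse_known_args(argv)
print(unknown)  # ['staff_text_document'] -> the "unrecognized arguments" error

# Renaming the file (or quoting the path) keeps it as a single token
argv = shlex.split("--data_file ./data/training_staff_text_document")
print(parser.parse_args(argv).data_file)  # ./data/training_staff_text_document
```

In practice the simplest fix is to rename the data file so the path contains no spaces.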

Sakuranoame commented 9 months ago

> What is your training data path? It looks like there may be a space in the path.

It was indeed a space in the path. But now a new problem has come up: RuntimeError: Error building extension 'wkv5'
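(Not an official fix, just a generic sanity check.) 'wkv5' is the custom CUDA kernel that the RWKV-5 training code JIT-compiles via torch.utils.cpp_extension, so a build failure usually points at the compiler/CUDA toolkit environment rather than the model itself. A small diagnostic sketch, assuming only that PyTorch is installed:

```python
import shutil
import torch
from torch.utils.cpp_extension import CUDA_HOME

print("torch version:", torch.__version__)
print("torch built with CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("CUDA_HOME:", CUDA_HOME)               # None -> CUDA toolkit not found
print("nvcc on PATH:", shutil.which("nvcc"))
print("gcc on PATH:", shutil.which("gcc"))
```

If CUDA_HOME or nvcc is missing, or the toolkit version does not match the CUDA version PyTorch was built with, the extension build typically fails; the compiler output printed above the RuntimeError usually names the real cause.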

Sakuranoame commented 9 months ago

I found an earlier issue about this; let me first try to sort it out myself.