Open MrGGLS opened 1 week ago
llama3-sft.yaml
```yaml
### model
model_name_or_path: models/llama-3-8b-Instruct

### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json

### dataset
dataset: sft_data_mixed_v1.0_sharegpt_dmg27l70q72_0.4
template: llama3
cutoff_len: 2048
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: saves/llama3/sft_data_mixed_v1.0_sharegpt_dmg27l70q72_0.4_neatpacking_3epo_lr2e-5
logging_steps: 5
save_steps: 10086
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 8
gradient_accumulation_steps: 4
learning_rate: 2.0e-5
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### custom
do_eval: false
packing: true
neat_packing: true
flash_attn: fa2
save_strategy: "no"
save_total_limit: 1
seed: 42
save_only_model: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False

### eval
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: no
# eval_steps: 500
```
The training process is fine, but the saved tokenizer file is much larger than the one shipped with the original llama-3:

- `tokenizer.json` (from llama3): 8.66 MB
- `tokenizer.json` (from trained model): 16.44 MB
When performing inference using vllm later on, an error is reported:
```
... Exception: data did not match any variant of untagged enum ModelWrapper at line 1250944 column 3
```
I need to manually replace the saved tokenizer files with the original ones in order to run inference normally.
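The manual workaround can be sketched as a small helper (the function name and the exact file list are my assumptions; adjust the paths to your own checkpoints):

```python
import shutil
from pathlib import Path

# Files that PreTrainedTokenizer.save_pretrained typically writes for llama-3;
# this list is an assumption, extend it if your checkpoint has more.
TOKENIZER_FILES = ("tokenizer.json", "tokenizer_config.json", "special_tokens_map.json")

def restore_tokenizer(base_model_dir: str, trained_model_dir: str) -> list:
    """Overwrite the (bloated) saved tokenizer files with the originals."""
    copied = []
    for name in TOKENIZER_FILES:
        src = Path(base_model_dir) / name
        if src.exists():
            shutil.copy(src, Path(trained_model_dir) / name)
            copied.append(name)
    return copied
```

Usage would look like `restore_tokenizer("models/llama-3-8b-Instruct", "saves/llama3/sft_data_mixed_v1.0_sharegpt_dmg27l70q72_0.4_neatpacking_3epo_lr2e-5")`, after which vLLM loads the checkpoint normally.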
save problem!
Same issue
I found that this issue was caused by the transformers version. I was using the latest release before; after downgrading to 4.43.4, the problem was resolved.
> 4.43.4
similar to https://github.com/huggingface/transformers/issues/33774