modelscope / ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0
4.01k stars 355 forks

feat: Add aya models #2335

Closed. Aunali321 closed this 2 days ago

Aunali321 commented 3 days ago

PR type

PR information

Add new Aya models by Cohere for AI.

Experiment results

The models do not exist on ModelScope yet. Maybe they will be added soon. This is my first PR and I don't understand Chinese, so sorry if I made a mistake.

Aunali321 commented 3 days ago

Here's a small training run using aya-expanse-8b. Config:

USE_HF=1 \
HF_HUB_ENABLE_HF_TRANSFER=1 \
swift rlhf \
    --rlhf_type kto \
    --model_type aya-expanse-8b \
    --beta 0.1 \
    --desirable_weight 1.0 \
    --undesirable_weight 1.0 \
    --model_revision master \
    --sft_type lora \
    --tuner_backend peft \
    --template_type AUTO \
    --dtype AUTO \
    --output_dir output \
    --dataset Cossale/informal-to-professional-kto \
    --train_dataset_sample -1 \
    --num_train_epochs 1 \
    --max_length 8192 \
    --check_dataset_strategy warning \
    --lora_rank 32 \
    --lora_alpha 64 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 2e-4 \
    --use_dora True \
    --neftune_noise_alpha 5 \
    --gradient_accumulation_steps 4 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn true
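
As a quick sanity check on the config above, the effective batch size and the LoRA scaling factor follow from the flags with simple arithmetic. This is a minimal sketch: the device count is not stated in this thread, so `num_devices=1` is an assumption, and the scaling shown is the standard LoRA `alpha / rank` ratio rather than anything specific to this PR.

```shell
#!/bin/sh
# Values copied from the swift rlhf command above.
batch_size=1
gradient_accumulation_steps=4
lora_rank=32
lora_alpha=64

# Assumption: the number of GPUs is not given in the thread; 1 is a placeholder.
num_devices=1

# Samples contributing to each optimizer step.
effective_batch=$((batch_size * gradient_accumulation_steps * num_devices))

# Standard LoRA scaling factor alpha / rank (shell integer division).
lora_scale=$((lora_alpha / lora_rank))

echo "effective batch size: ${effective_batch}"
echo "LoRA scaling (alpha / rank): ${lora_scale}"
```

With more than one device, change `num_devices` accordingly. Note that `$(( ))` truncates, so a non-integer ratio would need `awk` or `bc` instead.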

Merging LoRA:

USE_HF=1 \
swift export \
    --model_type aya-expanse-8b \
    --ckpt_dir '/root/llm-finetuning-setup/swift/output/aya-expanse-8b/v2-20241024-170858/checkpoint-35' \
    --merge_lora true 

Training:

image

Result:

image

Jintao-Huang commented 3 days ago

thanks ❤️

Jintao-Huang commented 3 days ago

https://github.com/modelscope/ms-swift/blob/main/CONTRIBUTING.md#code-standards-and-development-approach

Please check the lint. 😊

Aunali321 commented 2 days ago

Fixed the lint; please re-run. Thanks.

Jintao-Huang commented 2 days ago

modelscope model: