modelscope / ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

Inference and fine-tuning support for GOT-OCR2. #2122

[Open] Jintao-Huang opened this issue 4 days ago

Jintao-Huang commented 4 days ago

Inference:

CUDA_VISIBLE_DEVICES=0 swift infer --model_type got-ocr2 --model_id_or_path stepfun-ai/GOT-OCR2_0
<<< <image>OCR: 
Input an image path or URL <<< https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/ocr.png
简介 SWIFT支持250+LLM和35+MLLM(多模态大模型)的训练、推理、 评测和部署。开发者可以直接将我们的框架应用到自己的Research和 生产环境中,实现模型训练评测到应用的完整链路。我们除支持了 PEFT提供的轻量训练方案外,也提供了一个完整的Adapters库以支持 最新的训练技术,如NEFTune、LoRA+、LLaMA-PRO等,这个适配器 库可以脱离训练脚本直接使用在自己的自定流程中。 为方便不熟悉深度学习的用户使用,我们提供了一个Gradio的web-ui用 于控制训练和推理,并提供了配套的深度学习课程和最佳实践供新手入 门。 此外,我们也在拓展其他模态的能力,目前我们支持了AnimateDiff的 全参数训练和LoRA训练。 SWIFT具有丰富的文档体系,如有使用问题请请查看这里 可以在Huggingfacespace和ModelScope创空间中体验SWIFTweb ui功能了。
--------------------------------------------------
<<< clear
<<< <image>OCR: 
Input an image path or URL <<< https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/ocr_en.png
Introduction
SWIFT supports training, inference, evaluation and deployment of 250+ LLMs 
and 35+ MLLMs (multimodal large models). Developers can directly apply our 
framework to their own research and production environments to realize the 
complete workflow from model training and evaluation to application. In addition 
to supporting the lightweight training solutions provided by PEFT, we also 
provide a complete Adapters library to support the latest training techniques 
such as NEFTune, LoRA+, LLaMA-PRO, etc. This adapter library can be used 
directly in your own custom workflow without our training scripts.
To facilitate use by users unfamiliar with deep learning, we provide a Gradio 
web-ui for controlling training and inference, as well as accompanying deep 
learning courses and best practices for beginners.
Additionally, we are expanding capabilities for other modalities. Currently, we 
support full-parameter training and LoRA training for AnimateDiff.
SWIFT has rich documentations for users, please check here.
SWIFT web-ui is available both on Huggingface space and ModelScope studio, 
please feel free to try!
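
For a quick batch sanity check outside the interactive session, the base model can also be run over a slice of a registered dataset before fine-tuning, to establish a baseline. The command below is only a sketch and has not been verified for got-ocr2; it assumes your ms-swift version accepts --val_dataset in swift infer (recent 2.x releases document it). Otherwise, run dataset inference from a checkpoint with --load_dataset_config as shown further down.

# Sketch (assumed flag --val_dataset): baseline batch inference on 100 samples of the LaTeX-OCR dataset
CUDA_VISIBLE_DEVICES=0 swift infer \
    --model_type got-ocr2 --model_id_or_path stepfun-ai/GOT-OCR2_0 \
    --val_dataset latex-ocr-print#100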

Fine-tuning:

# Fine-tune the LLM & projector, freeze the vision encoder
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type got-ocr2 --model_id_or_path stepfun-ai/GOT-OCR2_0 \
    --sft_type lora \
    --dataset latex-ocr-print#5000

# DDP & ZeRO2
NPROC_PER_NODE=4 \
CUDA_VISIBLE_DEVICES=0,1,2,3 swift sft \
    --model_type got-ocr2 --model_id_or_path stepfun-ai/GOT-OCR2_0 \
    --sft_type lora \
    --dataset latex-ocr-print#5000 \
    --deepspeed default-zero2
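
To train on your own OCR data instead of latex-ocr-print, the generic ms-swift multimodal JSONL format (query / response / images fields) should apply. The snippet below is a sketch with a made-up file name, and it assumes --dataset accepts a local JSONL path as in recent 2.x versions (older versions expose --custom_train_dataset_path instead).

# Sketch: a tiny custom OCR dataset in the generic ms-swift multimodal JSONL format
cat > ocr_data.jsonl <<'EOF'
{"query": "<image>OCR: ", "response": "ground-truth transcription of page 1", "images": ["/path/to/page1.png"]}
{"query": "<image>OCR: ", "response": "ground-truth transcription of page 2", "images": ["/path/to/page2.png"]}
EOF

# Fine-tune on the local file (assumes --dataset accepts local JSONL paths on your version)
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type got-ocr2 --model_id_or_path stepfun-ai/GOT-OCR2_0 \
    --sft_type lora \
    --dataset ocr_data.jsonl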

[Attached image: training loss curve (train_loss)]

Inference after fine-tuning:

CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/got-ocr2/vx-xxx/checkpoint-xxx \
    --load_dataset_config true
[Attached image: screenshot of the inference output, 2024-09-25 16:21:18]
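
If a standalone checkpoint is preferred over loading the LoRA adapter on top of the base model, the usual ms-swift LoRA-merge workflow with swift export should apply here as well; this is a sketch following the general merge docs and has not been verified for got-ocr2.

# Sketch: merge the LoRA adapter into the base weights, then infer from the merged checkpoint
CUDA_VISIBLE_DEVICES=0 swift export \
    --ckpt_dir output/got-ocr2/vx-xxx/checkpoint-xxx \
    --merge_lora true

CUDA_VISIBLE_DEVICES=0 swift infer \
    --ckpt_dir output/got-ocr2/vx-xxx/checkpoint-xxx-merged \
    --load_dataset_config true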
cgq0816 commented 2 days ago

Hi, is deployment with swift + vLLM supported? Something like the following command: CUDA_VISIBLE_DEVICES=0 swift deploy --model_type llava1_6-vicuna-13b-instruct --infer_backend vllm

tbwang-clound commented 2 days ago

When will vLLM inference be supported?