modelscope / ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

Error finetuning - OpenGVLab/Mini-InternVL-Chat-4B-V1-5 does not exist #1265

Closed. babla9 closed this issue 3 months ago.

babla9 commented 3 months ago

Describe the bug: what the bug is and how to reproduce it, preferably with screenshots.

Thanks so much for your work on this project. I'm trying to finetune mini-internvl-v1.5, and it looks like swift is unable to find the model. Let me know what I should change to address this, thanks! Do you also happen to know how much GPU memory is required to train this model?

CUDA_VISIBLE_DEVICES=0,1 swift sft --model_type mini-internvl-chat-4b-v1_5 --dataset train_test.jsonl
[INFO:swift] args: SftArguments(model_type='mini-internvl-chat-4b-v1_5', model_id_or_path='OpenGVLab/Mini-InternVL-Chat-4B-V1-5', model_revision='master', sft_type='lora', freeze_parameters=0.0, additional_trainable_parameters=[], tuner_backend='peft', template_type='internvl-phi3', output_dir='/home/ubuntu/swift/output/mini-internvl-chat-4b-v1_5/v1-20240701-154202', add_output_dir_suffix=True, ddp_backend=None, ddp_find_unused_parameters=None, ddp_broadcast_buffers=None, seed=42, resume_from_checkpoint=None, resume_only_model=False, ignore_data_skip=False, dtype='bf16', packing=False, dataset=['final_datasets/merged_reddit_sephora/output_data/image_only_merged_default_format.json'], val_dataset=[], dataset_seed=42, dataset_test_ratio=0.01, use_loss_scale=False, loss_scale_config_path='/home/ubuntu/swift/swift/llm/agent/default_loss_scale_config.json', system=None, tools_prompt='react_en', max_length=4096, truncation_strategy='delete', check_dataset_strategy='none', model_name=[None, None], model_author=[None, None], quant_method=None, quantization_bit=0, hqq_axis=0, hqq_dynamic_config_path=None, bnb_4bit_comp_dtype='bf16', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_quant_storage=None, lora_target_modules=['qkv_proj'], lora_rank=8, lora_alpha=32, lora_dropout_p=0.05, lora_bias_trainable='none', lora_modules_to_save=[], lora_dtype='AUTO', lora_lr_ratio=None, use_rslora=False, use_dora=False, init_lora_weights='true', rope_scaling=None, boft_block_size=4, boft_block_num=0, boft_n_butterfly_factor=1, boft_target_modules=['DEFAULT'], boft_dropout=0.0, boft_modules_to_save=[], vera_rank=256, vera_target_modules=['DEFAULT'], vera_projection_prng_key=0, vera_dropout=0.0, vera_d_initial=0.1, vera_modules_to_save=[], adapter_act='gelu', adapter_length=128, use_galore=False, galore_rank=128, galore_target_modules=None, galore_update_proj_gap=50, galore_scale=1.0, galore_proj_type='std', galore_optim_per_parameter=False, galore_with_embedding=False, 
adalora_target_r=8, adalora_init_r=12, adalora_tinit=0, adalora_tfinal=0, adalora_deltaT=1, adalora_beta1=0.85, adalora_beta2=0.85, adalora_orth_reg_weight=0.5, ia3_target_modules=['DEFAULT'], ia3_feedforward_modules=[], ia3_modules_to_save=[], llamapro_num_new_blocks=4, llamapro_num_groups=None, neftune_noise_alpha=None, neftune_backend='transformers', lisa_activated_layers=0, lisa_step_interval=20, gradient_checkpointing=True, deepspeed=None, batch_size=1, eval_batch_size=1, num_train_epochs=1, max_steps=-1, optim='adamw_torch', adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, learning_rate=0.0001, weight_decay=0.1, gradient_accumulation_steps=16, max_grad_norm=0.5, predict_with_generate=False, lr_scheduler_type='linear', warmup_ratio=0.05, eval_steps=50, save_steps=50, save_only_model=False, save_total_limit=2, logging_steps=5, acc_steps=1, dataloader_num_workers=0, dataloader_pin_memory=False, dataloader_drop_last=False, push_to_hub=False, hub_model_id=None, hub_token=None, hub_private_repo=False, push_hub_strategy='push_best', test_oom_error=False, disable_tqdm=False, lazy_tokenize=True, preprocess_num_proc=1, use_flash_attn=None, ignore_args_error=False, check_model_is_latest=True, logging_dir='/home/ubuntu/swift/output/mini-internvl-chat-4b-v1_5/v1-20240701-154202/runs', report_to=['tensorboard'], acc_strategy='token', save_on_each_node=True, evaluation_strategy='steps', save_strategy='steps', save_safetensors=True, gpu_memory_fraction=None, include_num_input_tokens_seen=False, local_repo_path=None, custom_register_path=None, custom_dataset_info=None, device_map_config_path=None, max_new_tokens=2048, do_sample=True, temperature=0.3, top_k=20, top_p=0.7, repetition_penalty=1.0, num_beams=1, fsdp='', fsdp_config=None, sequence_parallel_size=1, model_layer_cls_name=None, metric_warmup_step=0, fsdp_num=1, per_device_train_batch_size=None, per_device_eval_batch_size=None, eval_strategy=None, self_cognition_sample=0, train_dataset_mix_ratio=0.0, 
train_dataset_mix_ds=['ms-bench'], train_dataset_sample=-1, val_dataset_sample=None, safe_serialization=None, only_save_model=None, neftune_alpha=None, deepspeed_config_path=None, model_cache_dir=None, custom_train_dataset_path=[], custom_val_dataset_path=[])
[INFO:swift] Global seed set to 42
device_count: 2
rank: -1, local_rank: -1, world_size: 1, local_world_size: 1
[INFO:swift] Downloading the model from ModelScope Hub, model_id: OpenGVLab/Mini-InternVL-Chat-4B-V1-5
[ERROR:modelscope] The request model: OpenGVLab/Mini-InternVL-Chat-4B-V1-5 does not exist!
Traceback (most recent call last):
  File "/home/ubuntu/swift/swift/cli/sft.py", line 5, in <module>
    sft_main()
  File "/home/ubuntu/swift/swift/utils/run_utils.py", line 27, in x_main
    result = llm_x(args, **kwargs)
  File "/home/ubuntu/swift/swift/llm/sft.py", line 107, in llm_sft
    model, tokenizer = get_model_tokenizer(
  File "/home/ubuntu/swift/swift/llm/utils/model.py", line 5189, in get_model_tokenizer
    model_dir = safe_snapshot_download(
  File "/home/ubuntu/swift/swift/llm/utils/model.py", line 5154, in safe_snapshot_download
    model_dir = snapshot_download(model_id_or_path, revision, ignore_file_pattern=ignore_file_pattern)
  File "/opt/conda/envs/swift-env/lib/python3.10/site-packages/modelscope/hub/snapshot_download.py", line 94, in snapshot_download
    revision_detail = _api.get_valid_revision_detail(
  File "/opt/conda/envs/swift-env/lib/python3.10/site-packages/modelscope/hub/api.py", line 499, in get_valid_revision_detail
    all_branches_detail, all_tags_detail = self.get_model_branches_and_tags_details(
  File "/opt/conda/envs/swift-env/lib/python3.10/site-packages/modelscope/hub/api.py", line 579, in get_model_branches_and_tags_details
    handle_http_response(r, logger, cookies, model_id)
  File "/opt/conda/envs/swift-env/lib/python3.10/site-packages/modelscope/hub/errors.py", line 117, in handle_http_response
    raise HTTPError(http_error_msg, response=response)
requests.exceptions.HTTPError: The request model: OpenGVLab/Mini-InternVL-Chat-4B-V1-5 does not exist!

Your hardware and system info: write your system info here, such as CUDA version, OS, GPU model, and torch version.

Additional context: add any other context about the problem here.

Jintao-Huang commented 3 months ago

USE_HF=1 CUDA_VISIBLE_DEVICES=0,1 swift sft --model_type mini-internvl-chat-4b-v1_5 --dataset train_test.jsonl
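
For readers launching the run from a script rather than a shell: the command above differs from the original only in the `USE_HF=1` environment variable, which (as I understand ms-swift) makes it download from the Hugging Face Hub instead of ModelScope, where this model ID was not found. Below is a minimal sketch of building the same launch in Python. The helper name `build_sft_launch` is hypothetical and not part of ms-swift; it only assembles the environment and arguments quoted in this thread.

```python
import os
import shlex


def build_sft_launch(model_type, dataset, gpus="0,1", use_hf=True):
    """Return (env, argv) for the `swift sft` command quoted in this thread.

    Hypothetical helper for illustration; it does not call ms-swift itself.
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpus
    if use_hf:
        # USE_HF=1 comes from the reply above; assumed to select the Hugging Face Hub.
        env["USE_HF"] = "1"
    else:
        # Make sure an inherited value does not leak into the child process.
        env.pop("USE_HF", None)
    argv = ["swift", "sft", "--model_type", model_type, "--dataset", dataset]
    return env, argv


env, argv = build_sft_launch("mini-internvl-chat-4b-v1_5", "train_test.jsonl")
print("USE_HF=" + env["USE_HF"], shlex.join(argv))

# To actually start training (requires ms-swift to be installed):
# import subprocess
# subprocess.run(argv, env=env, check=True)
```

Returning the environment and argument list separately keeps the sketch a dry run, so the command can be inspected before anything is downloaded.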

LiJunY commented 3 months ago

Specify --model_id_or_path to provide the model path [mini4b]
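
As an illustration of this suggestion, here is a minimal sketch that appends `--model_id_or_path` to the original command's arguments. The flag name matches the `model_id_or_path` field in the logged `SftArguments`; the helper `with_model_path` and the directory `/path/to/Mini-InternVL-Chat-4B-V1-5` are hypothetical placeholders, to be replaced with wherever the model weights actually live.

```python
import shlex


def with_model_path(argv, model_id_or_path):
    """Return a copy of a `swift sft` argument list with --model_id_or_path added.

    Hypothetical helper for illustration; it does not call ms-swift itself.
    """
    return [*argv, "--model_id_or_path", model_id_or_path]


# Arguments from the command in the original report.
base = ["swift", "sft",
        "--model_type", "mini-internvl-chat-4b-v1_5",
        "--dataset", "train_test.jsonl"]

# Placeholder path (an assumption): point this at a local copy of the model.
argv = with_model_path(base, "/path/to/Mini-InternVL-Chat-4B-V1-5")
print(shlex.join(argv))
```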

babla9 commented 3 months ago

thanks to you both!