Describe the bug
What the bug is and how to reproduce it, preferably with screenshots.
Thanks so much for your work on this project. I'm trying to fine-tune mini-internvl-v1.5 (model_type `mini-internvl-chat-4b-v1_5`), and it looks like swift is unable to find the model on the ModelScope Hub: the download of `OpenGVLab/Mini-InternVL-Chat-4B-V1-5` fails with "does not exist" (command and full log below). Let me know what I should change to address this, thanks! Do you also happen to know how much GPU memory is required to train this model?
CUDA_VISIBLE_DEVICES=0,1 swift sft --model_type mini-internvl-chat-4b-v1_5 --dataset train_test.jsonl
[INFO:swift] args: SftArguments(model_type='mini-internvl-chat-4b-v1_5', model_id_or_path='OpenGVLab/Mini-InternVL-Chat-4B-V1-5', model_revision='master', sft_type='lora', freeze_parameters=0.0, additional_trainable_parameters=[], tuner_backend='peft', template_type='internvl-phi3', output_dir='/home/ubuntu/swift/output/mini-internvl-chat-4b-v1_5/v1-20240701-154202', add_output_dir_suffix=True, ddp_backend=None, ddp_find_unused_parameters=None, ddp_broadcast_buffers=None, seed=42, resume_from_checkpoint=None, resume_only_model=False, ignore_data_skip=False, dtype='bf16', packing=False, dataset=['final_datasets/merged_reddit_sephora/output_data/image_only_merged_default_format.json'], val_dataset=[], dataset_seed=42, dataset_test_ratio=0.01, use_loss_scale=False, loss_scale_config_path='/home/ubuntu/swift/swift/llm/agent/default_loss_scale_config.json', system=None, tools_prompt='react_en', max_length=4096, truncation_strategy='delete', check_dataset_strategy='none', model_name=[None, None], model_author=[None, None], quant_method=None, quantization_bit=0, hqq_axis=0, hqq_dynamic_config_path=None, bnb_4bit_comp_dtype='bf16', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_quant_storage=None, lora_target_modules=['qkv_proj'], lora_rank=8, lora_alpha=32, lora_dropout_p=0.05, lora_bias_trainable='none', lora_modules_to_save=[], lora_dtype='AUTO', lora_lr_ratio=None, use_rslora=False, use_dora=False, init_lora_weights='true', rope_scaling=None, boft_block_size=4, boft_block_num=0, boft_n_butterfly_factor=1, boft_target_modules=['DEFAULT'], boft_dropout=0.0, boft_modules_to_save=[], vera_rank=256, vera_target_modules=['DEFAULT'], vera_projection_prng_key=0, vera_dropout=0.0, vera_d_initial=0.1, vera_modules_to_save=[], adapter_act='gelu', adapter_length=128, use_galore=False, galore_rank=128, galore_target_modules=None, galore_update_proj_gap=50, galore_scale=1.0, galore_proj_type='std', galore_optim_per_parameter=False, galore_with_embedding=False, 
adalora_target_r=8, adalora_init_r=12, adalora_tinit=0, adalora_tfinal=0, adalora_deltaT=1, adalora_beta1=0.85, adalora_beta2=0.85, adalora_orth_reg_weight=0.5, ia3_target_modules=['DEFAULT'], ia3_feedforward_modules=[], ia3_modules_to_save=[], llamapro_num_new_blocks=4, llamapro_num_groups=None, neftune_noise_alpha=None, neftune_backend='transformers', lisa_activated_layers=0, lisa_step_interval=20, gradient_checkpointing=True, deepspeed=None, batch_size=1, eval_batch_size=1, num_train_epochs=1, max_steps=-1, optim='adamw_torch', adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, learning_rate=0.0001, weight_decay=0.1, gradient_accumulation_steps=16, max_grad_norm=0.5, predict_with_generate=False, lr_scheduler_type='linear', warmup_ratio=0.05, eval_steps=50, save_steps=50, save_only_model=False, save_total_limit=2, logging_steps=5, acc_steps=1, dataloader_num_workers=0, dataloader_pin_memory=False, dataloader_drop_last=False, push_to_hub=False, hub_model_id=None, hub_token=None, hub_private_repo=False, push_hub_strategy='push_best', test_oom_error=False, disable_tqdm=False, lazy_tokenize=True, preprocess_num_proc=1, use_flash_attn=None, ignore_args_error=False, check_model_is_latest=True, logging_dir='/home/ubuntu/swift/output/mini-internvl-chat-4b-v1_5/v1-20240701-154202/runs', report_to=['tensorboard'], acc_strategy='token', save_on_each_node=True, evaluation_strategy='steps', save_strategy='steps', save_safetensors=True, gpu_memory_fraction=None, include_num_input_tokens_seen=False, local_repo_path=None, custom_register_path=None, custom_dataset_info=None, device_map_config_path=None, max_new_tokens=2048, do_sample=True, temperature=0.3, top_k=20, top_p=0.7, repetition_penalty=1.0, num_beams=1, fsdp='', fsdp_config=None, sequence_parallel_size=1, model_layer_cls_name=None, metric_warmup_step=0, fsdp_num=1, per_device_train_batch_size=None, per_device_eval_batch_size=None, eval_strategy=None, self_cognition_sample=0, train_dataset_mix_ratio=0.0, 
train_dataset_mix_ds=['ms-bench'], train_dataset_sample=-1, val_dataset_sample=None, safe_serialization=None, only_save_model=None, neftune_alpha=None, deepspeed_config_path=None, model_cache_dir=None, custom_train_dataset_path=[], custom_val_dataset_path=[])
[INFO:swift] Global seed set to 42
device_count: 2
rank: -1, local_rank: -1, world_size: 1, local_world_size: 1
[INFO:swift] Downloading the model from ModelScope Hub, model_id: OpenGVLab/Mini-InternVL-Chat-4B-V1-5
[ERROR:modelscope] The request model: OpenGVLab/Mini-InternVL-Chat-4B-V1-5 does not exist!
Traceback (most recent call last):
  File "/home/ubuntu/swift/swift/cli/sft.py", line 5, in <module>
    sft_main()
  File "/home/ubuntu/swift/swift/utils/run_utils.py", line 27, in x_main
    result = llm_x(args, **kwargs)
  File "/home/ubuntu/swift/swift/llm/sft.py", line 107, in llm_sft
    model, tokenizer = get_model_tokenizer(
  File "/home/ubuntu/swift/swift/llm/utils/model.py", line 5189, in get_model_tokenizer
    model_dir = safe_snapshot_download(
  File "/home/ubuntu/swift/swift/llm/utils/model.py", line 5154, in safe_snapshot_download
    model_dir = snapshot_download(model_id_or_path, revision, ignore_file_pattern=ignore_file_pattern)
  File "/opt/conda/envs/swift-env/lib/python3.10/site-packages/modelscope/hub/snapshot_download.py", line 94, in snapshot_download
    revision_detail = _api.get_valid_revision_detail(
  File "/opt/conda/envs/swift-env/lib/python3.10/site-packages/modelscope/hub/api.py", line 499, in get_valid_revision_detail
    all_branches_detail, all_tags_detail = self.get_model_branches_and_tags_details(
  File "/opt/conda/envs/swift-env/lib/python3.10/site-packages/modelscope/hub/api.py", line 579, in get_model_branches_and_tags_details
    handle_http_response(r, logger, cookies, model_id)
  File "/opt/conda/envs/swift-env/lib/python3.10/site-packages/modelscope/hub/errors.py", line 117, in handle_http_response
    raise HTTPError(http_error_msg, response=response)
requests.exceptions.HTTPError: The request model: OpenGVLab/Mini-InternVL-Chat-4B-V1-5 does not exist!
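For reference, the traceback shows the failure happens inside `snapshot_download`, i.e. while resolving the model id on the ModelScope Hub, before any training starts. A possible workaround I am considering is sketched below; it only assembles the command line and does not call swift itself. Two things here are assumptions on my part and not confirmed by the log: that `--model_id_or_path` is accepted as a CLI flag (inferred from the `model_id_or_path` field in the printed `SftArguments`), and that setting a `USE_HF=1` environment variable switches downloads to the Hugging Face Hub. The helper names and the local path are hypothetical.

```python
import shlex


def build_sft_command(model_type, dataset, model_path=None, use_hf=False):
    """Assemble env overrides and argv for a `swift sft` run (sketch only)."""
    env = {"CUDA_VISIBLE_DEVICES": "0,1"}
    if use_hf:
        # ASSUMPTION: swift reads USE_HF=1 to download from the Hugging Face Hub.
        env["USE_HF"] = "1"

    argv = ["swift", "sft", "--model_type", model_type, "--dataset", dataset]
    if model_path is not None:
        # ASSUMPTION: flag name inferred from the `model_id_or_path` field
        # shown in the logged SftArguments; points at a local checkpoint dir.
        argv += ["--model_id_or_path", model_path]
    return env, argv


def format_command(env, argv):
    """Render the env overrides and argv as a single shell line."""
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return f"{prefix} {shlex.join(argv)}"


if __name__ == "__main__":
    # Hypothetical local directory holding an already-downloaded checkpoint.
    env, argv = build_sft_command(
        "mini-internvl-chat-4b-v1_5",
        "train_test.jsonl",
        model_path="/path/to/Mini-InternVL-Chat-4B-V1-5",
    )
    print(format_command(env, argv))
```

If either assumption is wrong for the installed swift version, the generated command will simply be rejected by the argument parser, so it should be safe to try.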
Your hardware and system info
Write your system info here, such as CUDA version, OS, GPU model, and torch version.
Additional context
Add any other context about the problem here.