modelscope / ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html

Florence2 Inference #2227


paul-upfeat commented 1 week ago

I'm not able to run inference for my fine-tuned model; I tried running the command from the documentation.

I fine-tuned the model using:

swift sft \
    --model_type florence-2-large \
    --sft_type full \
    --output_dir /root/models \
    --num_train_epochs 3 \
    --use_flash_attn true \
    --dataset test \
    --dataset_test_ratio 0.1 \
    --custom_dataset_info dataset_info.json
The resulting checkpoint directory looks like this:

ls ../../../models/florence-2-large/v7-20241011-025045/checkpoint-1251/
added_tokens.json           merges.txt                processing_florence2.py  scheduler.pt             tokenizer.json
config.json                 model.safetensors         processor_config.json    sft_args.json            trainer_state.json
configuration_florence2.py  optimizer.pt              rng_state_0.pth          special_tokens_map.json  training_args.bin
generation_config.json      preprocessor_config.json  rng_state_1.pth          tokenizer_config.json    vocab.json

This was using this image: modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.1.0-py310-torch2.3.0-tf2.16.1-1.18.0

The image complained about a missing tf-keras, so I installed it with pip install tf-keras. I used two 3090 GPUs.

Following the docs, I tried running inference with:

swift infer \
    --ckpt_dir /root/models/florence-2-large/v7-20241011-025045/checkpoint-1251 \
    --stream false \
    --max_new_tokens 1024

This is the output:

swift infer     --ckpt_dir /root/models/florence-2-large/v7-20241011-025045/checkpoint-1251     --stream false     --max_new_tokens 1024
2024-10-12 01:07:51.174459: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-12 01:07:51.188696: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-12 01:07:51.193105: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-12 01:07:51.204467: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-12 01:07:52.009240: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
run sh: `/usr/local/bin/python /usr/local/lib/python3.10/site-packages/swift/cli/infer.py --ckpt_dir /root/models/florence-2-large/v7-20241011-025045/checkpoint-1251 --stream false --max_new_tokens 1024`
2024-10-12 01:07:56.737825: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-12 01:07:56.751559: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-12 01:07:56.755794: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-12 01:07:56.766359: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-10-12 01:07:57.563820: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[INFO:swift] Successfully registered `/usr/local/lib/python3.10/site-packages/swift/llm/data/dataset_info.json`
[INFO:swift] Start time of running main: 2024-10-12 01:08:00.573659
[WARNING:swift] The checkpoint dir /root/models/florence-2-large/v7-20241011-025045/checkpoint-1251 passed in is invalid, please make surethe dir contains a `configuration.json` file.
[INFO:swift] ckpt_dir: /root/models/florence-2-large/v7-20241011-025045/checkpoint-1251
[INFO:swift] Successfully registered `/root/app/florence2/dataset_info.json`
[INFO:swift] Setting model_info['revision']: main
[INFO:swift] Setting self.eval_human: True
[INFO:swift] Setting overwrite_generation_config: True
[INFO:swift] args: InferArguments(model_type='florence-2-large', model_id_or_path='microsoft/Florence-2-large', model_revision='main', sft_type='full', template_type='florence', infer_backend='pt', ckpt_dir='/root/models/florence-2-large/v7-20241011-025045/checkpoint-1251', result_dir=None, load_args_from_ckpt_dir=True, load_dataset_config=False, eval_human=True, seed=42, dtype='bf16', model_kwargs=None, dataset=[], val_dataset=[], dataset_seed=42, dataset_test_ratio=0.01, show_dataset_sample=-1, save_result=True, system=None, tools_prompt='react_en', max_length=None, truncation_strategy='delete', check_dataset_strategy='none', model_name=[None, None], model_author=[None, None], quant_method=None, quantization_bit=0, hqq_axis=0, hqq_dynamic_config_path=None, bnb_4bit_comp_dtype='bf16', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_quant_storage=None, max_new_tokens=1024, do_sample=None, temperature=None, top_k=None, top_p=None, repetition_penalty=None, num_beams=1, stop_words=[], rope_scaling=None, use_flash_attn=None, ignore_args_error=False, stream=False, merge_lora=False, merge_device_map='cpu', save_safetensors=True, overwrite_generation_config=True, verbose=None, local_repo_path=None, custom_register_path=None, custom_dataset_info='/root/app/florence2/dataset_info.json', device_map_config=None, device_max_memory=[], hub_token=None, gpu_memory_utilization=0.9, tensor_parallel_size=1, max_num_seqs=256, max_model_len=None, disable_custom_all_reduce=True, enforce_eager=False, vllm_enable_lora=False, vllm_max_lora_rank=16, lora_modules=[], tp=1, cache_max_entry_count=0.8, quant_policy=0, vision_batch_size=1, self_cognition_sample=0, train_dataset_sample=-1, val_dataset_sample=None, safe_serialization=None, model_cache_dir=None, merge_lora_and_save=None, custom_train_dataset_path=[], custom_val_dataset_path=[], vllm_lora_modules=None, device_map_config_path=None)
[INFO:swift] Global seed set to 42
[INFO:swift] device_count: 2
[INFO:swift] Loading the model using model_dir: /root/models/florence-2-large/v7-20241011-025045/checkpoint-1251
[INFO:swift] model_kwargs: {'low_cpu_mem_usage': True, 'device_map': 'auto'}
Could not locate the modeling_florence2.py inside /root/models/florence-2-large/v7-20241011-025045/checkpoint-1251.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/swift/cli/infer.py", line 5, in <module>
    infer_main()
  File "/usr/local/lib/python3.10/site-packages/swift/utils/run_utils.py", line 32, in x_main
    result = llm_x(args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/swift/llm/infer.py", line 304, in llm_infer
    model, template = prepare_model_template(args, device_map=args.device_map_config)
  File "/usr/local/lib/python3.10/site-packages/swift/llm/infer.py", line 185, in prepare_model_template
    model, tokenizer = get_model_tokenizer(
  File "/usr/local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 6561, in get_model_tokenizer
    model, tokenizer = get_function(model_dir, torch_dtype, model_kwargs, load_model, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 2870, in get_model_tokenizer_florence
    model, tokenizer = get_model_tokenizer_from_repo(
  File "/usr/local/lib/python3.10/site-packages/swift/llm/utils/model.py", line 928, in get_model_tokenizer_from_repo
    model = automodel_class.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 65, in from_pretrained
    module_obj = module_class.from_pretrained(model_dir, *model_args,
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 547, in from_pretrained
    model_class = get_class_from_dynamic_module(
  File "/usr/local/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 506, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
  File "/usr/local/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 310, in get_cached_module_file
    resolved_module_file = cached_file(
  File "/usr/local/lib/python3.10/site-packages/transformers/utils/hub.py", line 374, in cached_file
    raise EnvironmentError(
OSError: /root/models/florence-2-large/v7-20241011-025045/checkpoint-1251 does not appear to have a file named modeling_florence2.py. Checkout 'https://huggingface.co//root/models/florence-2-large/v7-20241011-025045/checkpoint-1251/tree/None' for available files.

It seems to be complaining about a missing file. Notably, the checkpoint listing above includes configuration_florence2.py and processing_florence2.py but no modeling_florence2.py. I'm not sure if I'm missing a step here; any help is appreciated, thank you!
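
In case it helps anyone debugging the same thing, here is a quick sanity-check sketch I put together (paths are from my run above; it assumes the auto_map values in config.json are plain strings, as they are for Florence-2):

import json
import os

# Checkpoint directory from the training run above.
ckpt = "/root/models/florence-2-large/v7-20241011-025045/checkpoint-1251"

# config.json's auto_map entries reference local dynamic-module files such as
# "modeling_florence2.Florence2ForConditionalGeneration"; each entry needs a
# matching .py file next to config.json for trust_remote_code loading to work.
with open(os.path.join(ckpt, "config.json")) as f:
    auto_map = json.load(f).get("auto_map", {})

for cls, ref in auto_map.items():
    module = ref.split(".")[0].split("--")[-1]  # strip an optional "repo--" prefix
    exists = os.path.exists(os.path.join(ckpt, module + ".py"))
    print(f"{cls}: {module}.py", "present" if exists else "MISSING")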

Jintao-Huang commented 1 week ago

What version of transformers are you using?

paul-upfeat commented 1 week ago

I tried both 4.45.0.dev and 4.46.0.dev. Copying the modeling_florence2.py file directly from the base model into the checkpoint folder seemed to solve it.
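
For reference, the workaround was essentially this (the source path is a placeholder; point it at wherever the base microsoft/Florence-2-large files were downloaded on your machine):

# Copy the dynamic-module file the checkpoint is missing from the base model
# download into the checkpoint directory. /path/to/... is a placeholder.
cp /path/to/Florence-2-large/modeling_florence2.py \
   /root/models/florence-2-large/v7-20241011-025045/checkpoint-1251/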

Jintao-Huang commented 1 week ago

Please use transformers==4.44.*
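
That is, for example:

pip install "transformers==4.44.*"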

paul-upfeat commented 5 days ago

Hi, I've tried installing 4.44.2 and got the same error. Copying the file directly into the checkpoint folder seems to fix it, although I'm not sure what exactly is intended.

Am I supposed to train with 4.44.2 for Florence2?