hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Error when fine-tuning a LLaVA multimodal model with DPO + LoRA #5203

Closed · asadfgglie closed 2 months ago

asadfgglie commented 2 months ago

System Info

Reproduction

llamafactory-cli train \
    --stage dpo \
    --do_train True \
    --model_name_or_path llava-hf/llava-1.5-7b-hf \
    --preprocessing_num_workers 16 \
    --finetuning_type lora \
    --template llama2 \
    --flash_attn auto \
    --visual_inputs True \
    --dataset_dir data \
    --dataset dpo_en_demo \
    --cutoff_len 1024 \
    --learning_rate 5e-05 \
    --num_train_epochs 1.0 \
    --max_samples 100000 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --max_grad_norm 1.0 \
    --logging_steps 5 \
    --save_steps 100 \
    --warmup_steps 0 \
    --optim adamw_torch \
    --packing False \
    --report_to none \
    --output_dir saves/LLaVA1.5-7B-Chat/lora/testdpo \
    --bf16 True \
    --plot_loss True \
    --ddp_timeout 180000000 \
    --include_num_input_tokens_seen True \
    --quantization_bit 8 \
    --quantization_method bitsandbytes \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0 \
    --lora_target all \
    --pref_beta 0.1 \
    --pref_ftx 0 \
    --pref_loss sigmoid

Expected behavior

When fine-tuning a LLaVA-type multimodal model with DPO + LoRA on the example DPO dataset, the following error occurs during training:

[2024-08-17 03:31:43,475] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
08/17/2024 03:31:45 - WARNING - llamafactory.hparams.parser - We recommend enable `upcast_layernorm` in quantized training.
08/17/2024 03:31:45 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2289] 2024-08-17 03:31:46,112 >> loading file tokenizer.model from cache at /home/asadfgglie/.cache/huggingface/hub/models--llava-hf--llava-1.5-7b-hf/snapshots/fa3dd2809b8de6327002947c3382260de45015d4/tokenizer.model
[INFO|tokenization_utils_base.py:2289] 2024-08-17 03:31:46,112 >> loading file tokenizer.json from cache at /home/asadfgglie/.cache/huggingface/hub/models--llava-hf--llava-1.5-7b-hf/snapshots/fa3dd2809b8de6327002947c3382260de45015d4/tokenizer.json
[INFO|tokenization_utils_base.py:2289] 2024-08-17 03:31:46,112 >> loading file added_tokens.json from cache at /home/asadfgglie/.cache/huggingface/hub/models--llava-hf--llava-1.5-7b-hf/snapshots/fa3dd2809b8de6327002947c3382260de45015d4/added_tokens.json
[INFO|tokenization_utils_base.py:2289] 2024-08-17 03:31:46,112 >> loading file special_tokens_map.json from cache at /home/asadfgglie/.cache/huggingface/hub/models--llava-hf--llava-1.5-7b-hf/snapshots/fa3dd2809b8de6327002947c3382260de45015d4/special_tokens_map.json
[INFO|tokenization_utils_base.py:2289] 2024-08-17 03:31:46,112 >> loading file tokenizer_config.json from cache at /home/asadfgglie/.cache/huggingface/hub/models--llava-hf--llava-1.5-7b-hf/snapshots/fa3dd2809b8de6327002947c3382260de45015d4/tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-08-17 03:31:46,157 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|image_processing_base.py:375] 2024-08-17 03:31:46,775 >> loading configuration file preprocessor_config.json from cache at /home/asadfgglie/.cache/huggingface/hub/models--llava-hf--llava-1.5-7b-hf/snapshots/fa3dd2809b8de6327002947c3382260de45015d4/preprocessor_config.json
[INFO|image_processing_base.py:375] 2024-08-17 03:31:46,975 >> loading configuration file preprocessor_config.json from cache at /home/asadfgglie/.cache/huggingface/hub/models--llava-hf--llava-1.5-7b-hf/snapshots/fa3dd2809b8de6327002947c3382260de45015d4/preprocessor_config.json
[INFO|image_processing_base.py:429] 2024-08-17 03:31:46,976 >> Image processor CLIPImageProcessor {
  "crop_size": {
    "height": 336,
    "width": 336
  },
  "do_center_crop": true,
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.48145466,
    0.4578275,
    0.40821073
  ],
  "image_processor_type": "CLIPImageProcessor",
  "image_std": [
    0.26862954,
    0.26130258,
    0.27577711
  ],
  "processor_class": "LlavaProcessor",
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "shortest_edge": 336
  }
}

[INFO|tokenization_utils_base.py:2289] 2024-08-17 03:31:47,182 >> loading file tokenizer.model from cache at /home/asadfgglie/.cache/huggingface/hub/models--llava-hf--llava-1.5-7b-hf/snapshots/fa3dd2809b8de6327002947c3382260de45015d4/tokenizer.model
[INFO|tokenization_utils_base.py:2289] 2024-08-17 03:31:47,182 >> loading file tokenizer.json from cache at /home/asadfgglie/.cache/huggingface/hub/models--llava-hf--llava-1.5-7b-hf/snapshots/fa3dd2809b8de6327002947c3382260de45015d4/tokenizer.json
[INFO|tokenization_utils_base.py:2289] 2024-08-17 03:31:47,182 >> loading file added_tokens.json from cache at /home/asadfgglie/.cache/huggingface/hub/models--llava-hf--llava-1.5-7b-hf/snapshots/fa3dd2809b8de6327002947c3382260de45015d4/added_tokens.json
[INFO|tokenization_utils_base.py:2289] 2024-08-17 03:31:47,182 >> loading file special_tokens_map.json from cache at /home/asadfgglie/.cache/huggingface/hub/models--llava-hf--llava-1.5-7b-hf/snapshots/fa3dd2809b8de6327002947c3382260de45015d4/special_tokens_map.json
[INFO|tokenization_utils_base.py:2289] 2024-08-17 03:31:47,182 >> loading file tokenizer_config.json from cache at /home/asadfgglie/.cache/huggingface/hub/models--llava-hf--llava-1.5-7b-hf/snapshots/fa3dd2809b8de6327002947c3382260de45015d4/tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-08-17 03:31:47,214 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|processing_utils.py:722] 2024-08-17 03:31:47,661 >> Processor LlavaProcessor:
- image_processor: CLIPImageProcessor {
  "crop_size": {
    "height": 336,
    "width": 336
  },
  "do_center_crop": true,
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.48145466,
    0.4578275,
    0.40821073
  ],
  "image_processor_type": "CLIPImageProcessor",
  "image_std": [
    0.26862954,
    0.26130258,
    0.27577711
  ],
  "processor_class": "LlavaProcessor",
  "resample": 3,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "shortest_edge": 336
  }
}

- tokenizer: LlamaTokenizerFast(name_or_path='llava-hf/llava-1.5-7b-hf', vocab_size=32000, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>'}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
        0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        32000: AddedToken("<image>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        32001: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}

{
  "chat_template": "{% for message in messages %}{% if message['role'] != 'system' %}{{ message['role'].upper() + ': '}}{% endif %}{# Render all images first #}{% for content in message['content'] | selectattr('type', 'equalto', 'image') %}{{ '<image>\n' }}{% endfor %}{# Render all text next #}{% if message['role'] != 'assistant' %}{% for content in message['content'] | selectattr('type', 'equalto', 'text') %}{{ content['text'] + ' '}}{% endfor %}{% else %}{% for content in message['content'] | selectattr('type', 'equalto', 'text') %}{% generation %}{{ content['text'] + ' '}}{% endgeneration %}{% endfor %}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ 'ASSISTANT:' }}{% endif %}",
  "processor_class": "LlavaProcessor"
}

08/17/2024 03:31:47 - INFO - llamafactory.data.loader - Loading dataset dpo_zh_demo.json...
Converting format of dataset (num_proc=16):   0%|                                                                                    | 0/300 [00:00<?, ? examples/s]
multiprocess.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/asadfgglie/LLaMA-Factory/venv/lib/python3.11/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/asadfgglie/LLaMA-Factory/venv/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 678, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
  File "/home/asadfgglie/LLaMA-Factory/venv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3570, in _map_single
    writer.write_batch(batch)
  File "/home/asadfgglie/LLaMA-Factory/venv/lib/python3.11/site-packages/datasets/arrow_writer.py", line 568, in write_batch
    arrays.append(pa.array(typed_sequence))
                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/array.pxi", line 247, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 112, in pyarrow.lib._handle_arrow_array_protocol
  File "/home/asadfgglie/LLaMA-Factory/venv/lib/python3.11/site-packages/datasets/arrow_writer.py", line 208, in __arrow_array__
    out = cast_array_to_feature(
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/asadfgglie/LLaMA-Factory/venv/lib/python3.11/site-packages/datasets/table.py", line 1804, in wrapper
    return func(array, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/asadfgglie/LLaMA-Factory/venv/lib/python3.11/site-packages/datasets/table.py", line 2018, in cast_array_to_feature
    casted_array_values = _c(array.values, feature[0])
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/asadfgglie/LLaMA-Factory/venv/lib/python3.11/site-packages/datasets/table.py", line 1804, in wrapper
    return func(array, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/asadfgglie/LLaMA-Factory/venv/lib/python3.11/site-packages/datasets/table.py", line 2115, in cast_array_to_feature
    raise TypeError(f"Couldn't cast array of type\n{array.type}\nto\n{feature}")
TypeError: Couldn't cast array of type
struct<content: string, role: string>
to
{'role': Value(dtype='string', id=None), 'content': Value(dtype='string', id=None), 'name': Value(dtype='string', id=None)}
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/asadfgglie/LLaMA-Factory/venv/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/asadfgglie/LLaMA-Factory/src/llamafactory/cli.py", line 111, in main
    run_exp()
  File "/home/asadfgglie/LLaMA-Factory/src/llamafactory/train/tuner.py", line 56, in run_exp
    run_dpo(model_args, data_args, training_args, finetuning_args, callbacks)
  File "/home/asadfgglie/LLaMA-Factory/src/llamafactory/train/dpo/workflow.py", line 44, in run_dpo
    dataset_module = get_dataset(model_args, data_args, training_args, stage="rm", **tokenizer_module)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/asadfgglie/LLaMA-Factory/src/llamafactory/data/loader.py", line 233, in get_dataset
    dataset = _get_merged_dataset(data_args.dataset, model_args, data_args, training_args, stage)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/asadfgglie/LLaMA-Factory/src/llamafactory/data/loader.py", line 153, in _get_merged_dataset
    datasets.append(_load_single_dataset(dataset_attr, model_args, data_args, training_args))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/asadfgglie/LLaMA-Factory/src/llamafactory/data/loader.py", line 135, in _load_single_dataset
    return align_dataset(dataset, dataset_attr, data_args, training_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/asadfgglie/LLaMA-Factory/src/llamafactory/data/aligner.py", line 244, in align_dataset
    return dataset.map(
           ^^^^^^^^^^^^
  File "/home/asadfgglie/LLaMA-Factory/venv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 602, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/asadfgglie/LLaMA-Factory/venv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 567, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/asadfgglie/LLaMA-Factory/venv/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 3248, in map
    for rank, done, content in iflatmap_unordered(
  File "/home/asadfgglie/LLaMA-Factory/venv/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 718, in iflatmap_unordered
    [async_result.get(timeout=0.05) for async_result in async_results]
  File "/home/asadfgglie/LLaMA-Factory/venv/lib/python3.11/site-packages/datasets/utils/py_utils.py", line 718, in <listcomp>
    [async_result.get(timeout=0.05) for async_result in async_results]
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/asadfgglie/LLaMA-Factory/venv/lib/python3.11/site-packages/multiprocess/pool.py", line 774, in get
    raise self._value
TypeError: Couldn't cast array of type
struct<content: string, role: string>
to
{'role': Value(dtype='string', id=None), 'content': Value(dtype='string', id=None), 'name': Value(dtype='string', id=None)}
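
The cast failure can be reproduced outside the trainer with a few lines of `datasets` code. Below is a minimal sketch of the failure mode (my reconstruction; the column name and schema are illustrative, not taken from the repo), assuming a `datasets` version like the one in the log, where casting a struct requires an exact field-name match:

from datasets import Dataset, Features, Value

# One example whose "chosen" column is a list of {role, content} messages.
ds = Dataset.from_dict({"chosen": [[{"role": "user", "content": "hi"}]]})

# The target schema declares an extra optional "name" field per message.
features = Features(
    {"chosen": [{"role": Value("string"), "content": Value("string"), "name": Value("string")}]}
)

# While writing the mapped batches, the Arrow writer tries to cast the
# {role, content} struct to a struct that also has "name", which raises
# the same TypeError as above.
ds.map(lambda example: example, features=features)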

Others

No response

hiyouga commented 2 months ago

I've located the problem; it's an issue with llava. You can try qwen2vl for now.

asadfgglie commented 2 months ago

Got it!

hiyouga commented 2 months ago

fixed: https://github.com/hiyouga/LLaMA-Factory/blob/main/examples/train_lora/qwen2vl_lora_dpo.yaml
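
As a quick sanity check before retraining, it can help to confirm that every message dict in the raw DPO file carries the same set of keys, since the cast error above is precisely a complaint about two message schemas that disagree. A minimal inspection sketch (the path assumes the repo's bundled demo file; key names in other datasets may differ):

import json

# Print the top-level keys of the first example and, for every list-of-dict
# field (i.e. a conversation), the keys of its first message.
with open("data/dpo_zh_demo.json", encoding="utf-8") as f:
    first = json.load(f)[0]

print(sorted(first.keys()))
for key, value in first.items():
    if isinstance(value, list) and value and isinstance(value[0], dict):
        print(key, "->", sorted(value[0].keys()))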