huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

TypeError: Object of type BitsAndBytesConfig is not JSON serializable when using gradient_checkpointing=True in TrainingArguments #26905

Closed · andreducfer closed this 10 months ago

andreducfer commented 1 year ago

System Info

I am running the script inside a Docker container in a Linux environment.

Who can help?

@younesbelkada This issue is similar to, but not the same as, #24137.

Information

Tasks

Reproduction

Below is the script used to fine-tune Llama-2-7b-chat:


import os
from datasets import load_dataset, concatenate_datasets
from transformers import TrainingArguments, AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig
import torch
from environment import TOKEN_HF, TOKEN_WANDB
from trl import SFTTrainer
import wandb

MODEL = "/scratch/LLM/LLAMA2/Llama-2-7b-chat-hf"
DATASET_NAME = "andreducfer/thesession-abc-prompts"
OUTPUT_DIR = "/scratch/andre/llama2_finetuned/"
REFINED_MODEL = "llama2-7b-finetuned-with-thesession-abc-prompts"
SEED = 5

os.environ["WANDB_SILENT"] = "true"
os.environ["WANDB_API_KEY"] = TOKEN_WANDB
wandb.init(project="llama2-music")

def download_dataset(dataset_name):
    dataset_train = load_dataset(dataset_name, split="train", token=TOKEN_HF)
    dataset_validation = load_dataset(dataset_name, split="validation", token=TOKEN_HF)
    dataset_test = load_dataset(dataset_name, split="test", token=TOKEN_HF)

    dataset = concatenate_datasets([dataset_train, dataset_validation, dataset_test])

    return dataset

def create_model_tokenizer():
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_quant_type="nf4"
    )

    model = AutoModelForCausalLM.from_pretrained(
        MODEL,
        trust_remote_code=True,
        quantization_config=bnb_config,
        device_map="auto",
        cache_dir=MODEL,
        token=TOKEN_HF
    )

    tokenizer = AutoTokenizer.from_pretrained(
        MODEL,
        cache_dir=MODEL,
        token=TOKEN_HF,
        device_map="auto",
        quantization_config=bnb_config
    )

    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "right"

    model.config.use_cache = False
    model.config.pretraining_tp = 1

    model_config_json = model.config.to_json_string()
    print(model_config_json)

    return model, tokenizer

def create_lora_configuration():
    peft_config = LoraConfig(
        lora_alpha=16,
        lora_dropout=0.1,
        r=64,
        bias="none",
        task_type="CAUSAL_LM",
    )

    return peft_config

def create_training_configuration():
    training_args = TrainingArguments(
        output_dir=OUTPUT_DIR,
        num_train_epochs=3,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        gradient_checkpointing=True,
        optim="paged_adamw_32bit",
        save_steps=25,
        logging_steps=25,
        learning_rate=2e-4,
        weight_decay=0.001,
        fp16=False,
        bf16=True,
        max_grad_norm=0.3,
        max_steps=-1,
        warmup_ratio=0.03,
        group_by_length=True,
        lr_scheduler_type="constant"
    )

    return training_args

def create_trainer(model, training_args, small_train_dataset, lora_configuration, tokenizer):
    trainer = SFTTrainer(
        model=model,
        train_dataset=small_train_dataset,
        peft_config=lora_configuration,
        dataset_text_field="text",
        tokenizer=tokenizer,
        args=training_args,
        max_seq_length=4096
    )

    return trainer

if __name__ == '__main__':
    configured_model, configured_tokenizer = create_model_tokenizer()

    peft_configuration = create_lora_configuration()
    training_configuration = create_training_configuration()

    loaded_train_dataset = download_dataset(DATASET_NAME)

    configured_trainer = create_trainer(model=configured_model, training_args=training_configuration,
                                        small_train_dataset=loaded_train_dataset, lora_configuration=peft_configuration,
                                        tokenizer=configured_tokenizer)

    configured_trainer.train()

    configured_trainer.model.save_pretrained(REFINED_MODEL)

Below are 4 lines of the dataset that I am using:

{"instruction": "Write a melody in ABC Notation in a specific style.", "input": "Folk", "output": "Here it is the melody in ABC Notation in Folk style:\nX:0\nT: Cluck Old Hen\nM: 4/4\nL: 1/8\nK: Ador\n|: e2ae g2ag|e2ae d2dd| e2ae g2ag| edc2 A2AA :| |: A2A2 [c4e4] |A2A2 [G4B4]| A2AA cAcd| edc2 A2AA :|", "prompt": "Below is an instruction that describes a task, paired with an input that provides further context.\nWrite a response that appropriately completes the request.\n\n### Instruction:\nWrite a melody in ABC Notation in a specific style.\n\n### Input:\nFolk\n\n### Response:\n", "text": "Below is an instruction that describes a task, paired with an input that provides further context.\nWrite a response that appropriately completes the request.\n\n### Instruction:\nWrite a melody in ABC Notation in a specific style.\n\n### Input:\nFolk\n\n### Response:\nHere it is the melody in ABC Notation in Folk style:\nX:0\nT: Cluck Old Hen\nM: 4/4\nL: 1/8\nK: Ador\n|: e2ae g2ag|e2ae d2dd| e2ae g2ag| edc2 A2AA :| |: A2A2 [c4e4] |A2A2 [G4B4]| A2AA cAcd| edc2 A2AA :|"}
{"instruction": "Write a melody in ABC Notation in a specific style.", "input": "Folk", "output": "Here it is the melody in ABC Notation in Folk style:\nX:0\nT: Flop-Eared Mule, The\nM: 4/4\nL: 1/8\nK: Dmaj\n|: e2 | \"D\" f2 ff d2 dd | \"D\" A2 AA F2 FF | \"A\"  E2 EF GF E2 | \"D\" DEFG A2 A2 |\n\"D\" f2 ff d2 dd | \"D\" A2 AA F2 FF | \"A\" E2 EF GF E2 | \"D\" D2 F2 D2 :|\nK:A\n|: cd | e2 ec e2 ec | efed c2 cc | B2 Bc dc B2 | ABcd e2 cd | e2 ec e2 ec | efed c2 c2 | B2 Bc d2 cB | A2 c2 A2 :|", "prompt": "Below is an instruction that describes a task, paired with an input that provides further context.\nWrite a response that appropriately completes the request.\n\n### Instruction:\nWrite a melody in ABC Notation in a specific style.\n\n### Input:\nFolk\n\n### Response:\n", "text": "Below is an instruction that describes a task, paired with an input that provides further context.\nWrite a response that appropriately completes the request.\n\n### Instruction:\nWrite a melody in ABC Notation in a specific style.\n\n### Input:\nFolk\n\n### Response:\nHere it is the melody in ABC Notation in Folk style:\nX:0\nT: Flop-Eared Mule, The\nM: 4/4\nL: 1/8\nK: Dmaj\n|: e2 | \"D\" f2 ff d2 dd | \"D\" A2 AA F2 FF | \"A\"  E2 EF GF E2 | \"D\" DEFG A2 A2 |\n\"D\" f2 ff d2 dd | \"D\" A2 AA F2 FF | \"A\" E2 EF GF E2 | \"D\" D2 F2 D2 :|\nK:A\n|: cd | e2 ec e2 ec | efed c2 cc | B2 Bc dc B2 | ABcd e2 cd | e2 ec e2 ec | efed c2 c2 | B2 Bc d2 cB | A2 c2 A2 :|"}
{"instruction": "Write a melody in ABC Notation in a specific style.", "input": "Folk", "output": "Here it is the melody in ABC Notation in Folk style:\nX:0\nT: Tha Mi Sgith\nM: 4/4\nL: 1/8\nK: Amin\n|:B| A<A a2 g<e g2| e>dB>A G>AB<G| A<A a2 g<e g2| e>dB>e A/A/A A:| |:g| e>dB>A G>AB<G| e>dB>A B2 B>g| e>dB>A G>AB<d| e>dB>e A/A/A A:| </html>", "prompt": "Below is an instruction that describes a task, paired with an input that provides further context.\nWrite a response that appropriately completes the request.\n\n### Instruction:\nWrite a melody in ABC Notation in a specific style.\n\n### Input:\nFolk\n\n### Response:\n", "text": "Below is an instruction that describes a task, paired with an input that provides further context.\nWrite a response that appropriately completes the request.\n\n### Instruction:\nWrite a melody in ABC Notation in a specific style.\n\n### Input:\nFolk\n\n### Response:\nHere it is the melody in ABC Notation in Folk style:\nX:0\nT: Tha Mi Sgith\nM: 4/4\nL: 1/8\nK: Amin\n|:B| A<A a2 g<e g2| e>dB>A G>AB<G| A<A a2 g<e g2| e>dB>e A/A/A A:| |:g| e>dB>A G>AB<G| e>dB>A B2 B>g| e>dB>A G>AB<d| e>dB>e A/A/A A:| </html>"}
{"instruction": "Write a melody in ABC Notation in a specific style.", "input": "Folk", "output": "Here it is the melody in ABC Notation in Folk style:\nX:0\nT: Cleveland Park\nM: 6/8\nL: 1/8\nK: Dmin\n A,DE FED | B,DE FED | =B,DE FED | (E/2F/2G)E CDE | A,DE FED | B,DE FED | (E/2F/2G)E CEG | GFE D z2 :| DFA dAF | GAB AFD | B,DF A,DF | FED ^CDE | DFA dAF | GAB AFD | B,DF A,DF | ED^C D3 :| A,DE FED | B,DE FED | =B,DE FED | (E/2F/2G)E CDE | A,DE FED | B,DE FED | (E/2F/2G)E CEG | GFE D z2 :| Adf afa | bag afd | Bdf Adf | fed ^cde | Adf a2a | bag afd | Bdf Adf | ed^c d2d | Adf afa | bag afd | Bdf Adf | fed ^cde | Ade fed | Bde fed | ege ceg | gfe d3 :|]", "prompt": "Below is an instruction that describes a task, paired with an input that provides further context.\nWrite a response that appropriately completes the request.\n\n### Instruction:\nWrite a melody in ABC Notation in a specific style.\n\n### Input:\nFolk\n\n### Response:\n", "text": "Below is an instruction that describes a task, paired with an input that provides further context.\nWrite a response that appropriately completes the request.\n\n### Instruction:\nWrite a melody in ABC Notation in a specific style.\n\n### Input:\nFolk\n\n### Response:\nHere it is the melody in ABC Notation in Folk style:\nX:0\nT: Cleveland Park\nM: 6/8\nL: 1/8\nK: Dmin\n A,DE FED | B,DE FED | =B,DE FED | (E/2F/2G)E CDE | A,DE FED | B,DE FED | (E/2F/2G)E CEG | GFE D z2 :| DFA dAF | GAB AFD | B,DF A,DF | FED ^CDE | DFA dAF | GAB AFD | B,DF A,DF | ED^C D3 :| A,DE FED | B,DE FED | =B,DE FED | (E/2F/2G)E CDE | A,DE FED | B,DE FED | (E/2F/2G)E CEG | GFE D z2 :| Adf afa | bag afd | Bdf Adf | fed ^cde | Adf a2a | bag afd | Bdf Adf | ed^c d2d | Adf afa | bag afd | Bdf Adf | fed ^cde | Ade fed | Bde fed | ege ceg | gfe d3 :|]"}

Expected behavior

I'm trying to use QLoRA to fine-tune Llama-2-7b-chat-hf for CAUSAL_LM.

I am getting the following error:


INFO: fuse: warning: library too old, some operations may not work

==========
== CUDA ==
CUDA Version 11.8.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 50%|█████ | 1/2 [00:29<00:29, 29.73s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:39<00:00, 17.96s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:39<00:00, 19.73s/it]
{
"_name_or_path": "/scratch/LLM/LLAMA2/Llama-2-7b-chat-hf",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 4096,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 32,
"pretraining_tp": 1,
"quantization_config": {
"bnb_4bit_compute_dtype": "bfloat16",
"bnb_4bit_quant_type": "nf4",
"bnb_4bit_use_double_quant": true,
"llm_int8_enable_fp32_cpu_offload": false,
"llm_int8_has_fp16_weight": false,
"llm_int8_skip_modules": null,
"llm_int8_threshold": 6.0,
"load_in_4bit": true,
"load_in_8bit": false,
"quant_method": "bitsandbytes"
},
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.35.0.dev0",
"use_cache": false,
"vocab_size": 32000
}

0%| | 0/750 [00:00<?, ?it/s]You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(

0%| | 1/750 [00:07<1:33:01, 7.45s/it]
0%| | 2/750 [00:10<1:03:41, 5.11s/it]
0%| | 3/750 [00:14<53:38, 4.31s/it]
1%| | 4/750 [00:17<48:40, 3.92s/it]
1%| | 5/750 [00:20<44:54, 3.62s/it]
1%| | 6/750 [00:23<42:28, 3.43s/it]
1%| | 7/750 [00:26<40:52, 3.30s/it]
1%| | 8/750 [00:29<39:40, 3.21s/it]
1%| | 9/750 [00:32<38:46, 3.14s/it]
1%|▏ | 10/750 [00:35<38:06, 3.09s/it]
1%|▏ | 11/750 [00:38<37:39, 3.06s/it]
2%|▏ | 12/750 [00:41<37:18, 3.03s/it]
2%|▏ | 13/750 [00:44<37:00, 3.01s/it]
2%|▏ | 14/750 [00:47<36:44, 2.99s/it]
2%|▏ | 15/750 [00:50<36:28, 2.98s/it]
2%|▏ | 16/750 [00:53<36:15, 2.96s/it]
2%|▏ | 17/750 [00:56<36:03, 2.95s/it]
2%|▏ | 18/750 [00:59<35:52, 2.94s/it]
3%|▎ | 19/750 [01:02<35:37, 2.92s/it]
3%|▎ | 20/750 [01:05<35:03, 2.88s/it]
3%|▎ | 21/750 [01:07<34:07, 2.81s/it]
3%|▎ | 22/750 [01:10<32:55, 2.71s/it]
3%|▎ | 23/750 [01:12<32:05, 2.65s/it]
3%|▎ | 24/750 [01:15<31:28, 2.60s/it]
3%|▎ | 25/750 [01:17<30:59, 2.57s/it]

{'loss': 1.7484, 'learning_rate': 0.0002, 'epoch': 0.1}

3%|▎ | 25/750 [01:17<30:59, 2.57s/it]Traceback (most recent call last):
File "/home/andre/ondemand/data/sys/myjobs/projects/default/4/finetuning.py", line 92, in
configured_trainer.train()
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1506, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1869, in _inner_training_loop
self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2224, in _maybe_log_save_evaluate
self._save_checkpoint(model, trial, metrics=metrics)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2281, in _save_checkpoint
self.save_model(output_dir, _internal_call=True)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2768, in save_model
self._save(output_dir)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2831, in _save
self.tokenizer.save_pretrained(output_dir)
File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2445, in save_pretrained
out_str = json.dumps(tokenizer_config, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
File "/usr/lib/python3.10/json/init.py", line 238, in dumps
**kw).encode(obj)
File "/usr/lib/python3.10/json/encoder.py", line 201, in encode
chunks = list(chunks)
File "/usr/lib/python3.10/json/encoder.py", line 431, in _iterencode
yield from _iterencode_dict(o, _current_indent_level)
File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/usr/lib/python3.10/json/encoder.py", line 438, in _iterencode
o = _default(o)
File "/usr/lib/python3.10/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type BitsAndBytesConfig is not JSON serializable

This error started happening when I added the parameter gradient_checkpointing=True to TrainingArguments(). At the step where the checkpoint is saved (step 25 in this example, because save_steps=25), it fails with:

TypeError: Object of type BitsAndBytesConfig is not JSON serializable
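
For illustration, here is a minimal sketch (assuming only transformers and torch are installed) that reproduces the serialization failure outside the Trainer: json.dumps cannot handle a raw BitsAndBytesConfig object, which is what ends up in the tokenizer config here.

import json
import torch
from transformers import BitsAndBytesConfig

# Same quantization settings as in the script above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4"
)

# tokenizer.save_pretrained() builds a config dict from the tokenizer's init kwargs
# and dumps it as JSON; a BitsAndBytesConfig in that dict triggers the same error.
tokenizer_config = {"pad_token": "</s>", "quantization_config": bnb_config}
json.dumps(tokenizer_config, indent=2, sort_keys=True, ensure_ascii=False)
# TypeError: Object of type BitsAndBytesConfig is not JSON serializable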

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

younesbelkada commented 12 months ago

Hi @andreducfer, do you still face this issue with the latest transformers version? pip install -U transformers

andreducfer commented 11 months ago

Hi @younesbelkada, I ran a test with Transformers version 4.36.0.dev0 and I am still facing the same problem. The log is attached: slurm.log

ArthurZucker commented 11 months ago

The issue is that you are passing the quantization config to the tokenizer:

    tokenizer = AutoTokenizer.from_pretrained(
        MODEL,
        cache_dir=MODEL,
        token=TOKEN_HF,
        device_map="auto",
        quantization_config=bnb_config
    )

The error traces back to the serialization of the tokenizer:

File "/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py", line 2445, in save_pretrained
out_str = json.dumps(tokenizer_config, indent=2, sort_keys=True, ensure_ascii=False) + "\n"

and

TypeError: Object of type BitsAndBytesConfig is not JSON serializable

Just don't pass it to the tokenizer.
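
For example, a minimal sketch of the suggested fix (reusing MODEL and TOKEN_HF from the script above): keep the BitsAndBytesConfig for the model only and load the tokenizer without quantization_config or device_map.

from transformers import AutoTokenizer

# Tokenizer loading without the quantization config or device_map.
tokenizer = AutoTokenizer.from_pretrained(
    MODEL,
    cache_dir=MODEL,
    token=TOKEN_HF
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

With this change, save_pretrained() only has to serialize plain tokenizer settings, so checkpoint saving with gradient_checkpointing=True no longer hits the JSON error.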

github-actions[bot] commented 10 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

NandaKishoreJoshi commented 3 months ago

Hi, was this issue solved? I'm getting a similar error while using SFTTrainer from trl==0.9.6 and transformers==4.44.0.

andreducfer commented 2 months ago

> Hi, was this issue solved? I'm getting a similar error while using SFTTrainer from trl==0.9.6 and transformers==4.44.0.

The solution given by @ArthurZucker worked for me. Don't pass the quantization config to the tokenizer.
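
If the tokenizer has already been created with those extra kwargs, an untested workaround sketch (assuming they ended up in tokenizer.init_kwargs, as the traceback suggests) is to drop them before training so that save_pretrained() only serializes plain JSON values:

# Hypothetical cleanup before calling trainer.train(); names reuse the script above.
configured_tokenizer.init_kwargs.pop("quantization_config", None)
configured_tokenizer.init_kwargs.pop("device_map", None)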