unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Compilation failure during trainer_stats = trainer.train() #542

Open DenizK7 opened 5 months ago

DenizK7 commented 5 months ago

Description

I encountered an error while trying to fine-tune the llama3 model using unsloth. The error occurs during the trainer.train() step, and it appears to be related to a missing Python.h header file and a compilation failure. Below are the relevant error messages and system details.

Error Messages

  1. /tmp/tmppap3l7o5/main.c:4:10: fatal error: Python.h: No such file or directory

    4 | #include <Python.h>
      |          ^~~~~~~~~~
    compilation terminated.
  2. CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmppap3l7o5/main.c', '-O3', '-I/home/deniz/myenv/lib/python3.9/site-packages/triton/common/../third_party/cuda/include', '-I/usr/include/python3.9', '-I/tmp/tmppap3l7o5', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmppap3l7o5/_rms_layernorm_forward.cpython-39-x86_64-linux-gnu.so', '-L/lib/x86_64-linux-gnu', '-L/lib/i386-linux-gnu', '-L/lib/x86_64-linux-gnu', '-L/lib/i386-linux-gnu']' returned non-zero exit status 1.
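
A quick way to check whether the CPython headers Triton compiles against are actually present (a minimal sketch; INCLUDEPY normally resolves to something like /usr/include/python3.9, matching the -I flag in the failing command):

    import os
    import sysconfig

    # Directory where CPython's C headers should live, e.g. /usr/include/python3.9.
    include_dir = sysconfig.get_config_var("INCLUDEPY")
    print("Include dir:", include_dir)
    print("Python.h present:", os.path.exists(os.path.join(include_dir, "Python.h")))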

System Details

(as recoverable from the error output above)

  1. Python 3.9 in a virtualenv at /home/deniz/myenv
  2. Linux x86_64, compiling with the system gcc at /usr/bin/gcc
  3. Triton with its bundled CUDA headers

Steps to Reproduce

  1. Install dependencies:

    pip install unsloth torch trl transformers
  2. Run the following code to initiate fine-tuning:

    from unsloth import FastLanguageModel
    import torch
    from datasets import load_dataset
    from trl import SFTTrainer
    from transformers import TrainingArguments
    from unsloth import is_bfloat16_supported
    
    max_seq_length = 2048
    dtype = None  # None for auto detection.
    load_in_4bit = False  # Disable 4bit quantization to avoid using Flash Attention v2.
    
    model_name = "unsloth/llama-3-8b"
    
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=model_name,
        max_seq_length=max_seq_length,
        dtype=dtype,
        load_in_4bit=load_in_4bit,
    )
    
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        lora_alpha=16,
        lora_dropout=0,
        bias="none",
        use_gradient_checkpointing="unsloth",
        random_state=3407,
        use_rslora=False,
        loftq_config=None,
    )
    
    alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
    
    ### Instruction:
    {}
    
    ### Input:
    {}
    
    ### Response:
    {}"""
    
    EOS_TOKEN = tokenizer.eos_token
    
    def formatting_prompts_func(examples):
        # Build one Alpaca-style prompt per example; EOS_TOKEN marks where generation should stop.
        instructions = examples["instruction"]
        inputs = examples["input"]
        outputs = examples["output"]
        texts = []
        for instruction, input, output in zip(instructions, inputs, outputs):
            text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
            texts.append(text)
        return {"text": texts}
    
    dataset = load_dataset("yahma/alpaca-cleaned", split="train")
    dataset = dataset.map(formatting_prompts_func, batched=True)
    
    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",
        max_seq_length=max_seq_length,
        dataset_num_proc=2,
        packing=False,
        args=TrainingArguments(
            per_device_train_batch_size=2,
            gradient_accumulation_steps=4,
            warmup_steps=5,
            max_steps=60,
            learning_rate=2e-4,
            fp16=not is_bfloat16_supported(),  # use fp16 only when bf16 is unavailable
            bf16=is_bfloat16_supported(),
            logging_steps=1,
            optim="adamw_8bit",
            weight_decay=0.01,
            lr_scheduler_type="linear",
            seed=3407,
            output_dir="outputs",
        ),
    )
    
    trainer_stats = trainer.train()

Troubleshooting Steps Taken

  1. Ensured Python development headers are installed:

    sudo apt-get install python3-dev
  2. Verified CUDA and cuDNN installations.

  3. Updated gcc to the latest version.

Despite these steps, the error persists. Can anyone help?
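
For reference, the failing compile step can be reproduced outside of unsloth with a minimal sketch like the one below (it mimics the gcc invocation from the error above; the include path comes from sysconfig and may differ per machine):

    import os
    import subprocess
    import sysconfig
    import tempfile

    # Compile a trivial C file that includes Python.h, the same way Triton does.
    src = "#include <Python.h>\nint main(void) { return 0; }\n"
    with tempfile.TemporaryDirectory() as tmp:
        c_file = os.path.join(tmp, "main.c")
        with open(c_file, "w") as f:
            f.write(src)
        cmd = ["gcc", c_file,
               "-I" + sysconfig.get_config_var("INCLUDEPY"),
               "-o", os.path.join(tmp, "main")]
        result = subprocess.run(cmd, capture_output=True, text=True)
        print("exit code:", result.returncode)
        print(result.stderr)

If this fails with the same "Python.h: No such file or directory" message, the problem is in the toolchain/header setup rather than in unsloth itself.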

danielhanchen commented 5 months ago

How about

sudo apt-get install python3.9-dev
sudo apt install libpython3.9-dev

Also, you need to link it, so run ldconfig
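
A quick check that the library is visible afterwards (a sketch; on Linux, find_library consults the loader cache that ldconfig maintains):

    import ctypes.util

    # Returns something like 'libpython3.9.so.1.0' when the loader can find it, None otherwise.
    print(ctypes.util.find_library("python3.9"))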

DenizK7 commented 5 months ago

> How about
>
> sudo apt-get install python3.9-dev
> sudo apt install libpython3.9-dev
>
> Also, you need to link it, so run ldconfig

I tried this before; unfortunately, nothing changed.

Eleanorkong commented 3 weeks ago

Hi, could you please suggest what I should do as well? I have the same problem. Thank you. I am also on Python 3.9.18.