Compilation failure during trainer_stats = trainer.train()

DenizK7 commented 5 months ago

Description

I encountered an error while trying to fine-tune the Llama 3 model (unsloth/llama-3-8b) with Unsloth. The error occurs at the trainer.train() step: gcc cannot find the Python.h header file, so a compilation step fails. The relevant error messages and system details are below.

Error Messages

```
/tmp/tmppap3l7o5/main.c:4:10: fatal error: Python.h: No such file or directory
    4 | #include <Python.h>
      |          ^~~~~~~~~~
compilation terminated.

CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmppap3l7o5/main.c', '-O3', '-I/home/deniz/myenv/lib/python3.9/site-packages/triton/common/../third_party/cuda/include', '-I/usr/include/python3.9', '-I/tmp/tmppap3l7o5', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmppap3l7o5/_rms_layernorm_forward.cpython-39-x86_64-linux-gnu.so', '-L/lib/x86_64-linux-gnu', '-L/lib/i386-linux-gnu', '-L/lib/x86_64-linux-gnu', '-L/lib/i386-linux-gnu']' returned non-zero exit status 1.
```
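The failing command passes `-I/usr/include/python3.9` to gcc, and gcc reports that `Python.h` is not there. A minimal check, run from the same virtual environment, is to look for the header in the interpreter's `sysconfig` include directory. This assumes Triton takes its `-I` path from `sysconfig`, which is consistent with the flag above; `find_python_header` is a hypothetical helper name.

```python
import os
import sysconfig


def find_python_header():
    """Return (include_dir, header_exists) for the running interpreter.

    Assumption: Triton passes this sysconfig include directory to gcc as -I,
    matching the -I/usr/include/python3.9 flag in the failing command.
    Inside a venv, "include" points at the base interpreter's headers.
    """
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return include_dir, os.path.isfile(header)


if __name__ == "__main__":
    include_dir, found = find_python_header()
    print(f"Include dir: {include_dir}")
    print("Python.h found" if found else "Python.h MISSING: install the matching -dev package")
```

If this prints MISSING, the header package for this exact Python version is not installed where the compiler is told to look.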

System Details

Python Version: 3.9
CUDA Version: 12.1
PyTorch Version: 2.2.2+cu121
GPU Model: Tesla V100-PCIE-16GB
Operating System: Ubuntu 20.04 LTS

Steps to Reproduce

Install dependencies:

```
pip install unsloth torch trl transformers
```
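Since the traceback points into `site-packages/triton`, it may help to report the Triton version alongside the packages installed above. A small sketch using `importlib.metadata` (standard library since Python 3.8); `installed_versions` is a hypothetical helper name:

```python
from importlib import metadata

# Packages from the install command, plus triton, which runs the failing
# compile step (see the .../site-packages/triton/... path in the error).
PACKAGES = ("unsloth", "torch", "trl", "transformers", "triton")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:>12}: {version or 'not installed'}")
```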

Run the following code to initiate fine-tuning:

```
from unsloth import FastLanguageModel
import torch
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

max_seq_length = 2048
dtype = None  # None for auto detection.
load_in_4bit = False  # Disable 4bit quantization to avoid using Flash Attention v2.

model_name = "unsloth/llama-3-8b"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs = examples["input"]
    outputs = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

dataset = load_dataset("yahma/alpaca-cleaned", split="train")
dataset = dataset.map(formatting_prompts_func, batched=True)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=True,
        bf16=False,
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

trainer_stats = trainer.train()
```
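Separately from the compile error: the script imports `is_bfloat16_supported` but hardcodes `fp16=True, bf16=False`. That choice fits a Tesla V100, whose compute capability is 7.0, while native bfloat16 support starts at 8.0 (Ampere). A sketch of deriving the flags from the capability instead; `precision_flags` is a hypothetical helper, not part of Unsloth or Transformers:

```python
def precision_flags(compute_capability):
    """Choose TrainingArguments precision flags from a (major, minor) capability.

    Native bfloat16 needs compute capability 8.0 or newer (Ampere onward);
    a Tesla V100 is 7.0, so it falls back to fp16 as in the script above.
    """
    major, _minor = compute_capability
    bf16 = major >= 8
    return {"fp16": not bf16, "bf16": bf16}


if __name__ == "__main__":
    try:
        import torch

        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)  # e.g. (7, 0) on a V100
            print(cap, precision_flags(cap))
    except ImportError:
        print("torch is not installed")
```

The returned dict can be unpacked into `TrainingArguments(**precision_flags(cap), ...)`; this does not affect the Python.h failure.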

Troubleshooting Steps Taken

Ensured Python development headers are installed:
```
sudo apt-get install python3-dev
```
Verified CUDA and cuDNN installations.
Updated gcc to the latest version.
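One caveat on the first step: on Ubuntu 20.04, `python3-dev` provides headers for the distribution's default Python 3.8, not for a 3.9 interpreter. To check whether the headers are usable without running training, the failing step can be approximated by compiling a tiny C file that includes `Python.h`. This is a sketch, not Triton's actual build code: it assumes Triton's `-I` path comes from `sysconfig`, and it uses `-fsyntax-only` so nothing is linked. `try_compile_with_python_h` is a hypothetical name.

```python
import os
import shutil
import subprocess
import sysconfig
import tempfile

SOURCE = "#include <Python.h>\nint main(void) { return 0; }\n"


def try_compile_with_python_h(cc=None):
    """Compile a one-line C file that includes Python.h.

    Returns (ok, stderr). Uses the interpreter's sysconfig include directory,
    assumed to be the same -I path Triton used in the failing command.
    """
    cc = cc or shutil.which("gcc") or shutil.which("cc")
    if cc is None:
        return False, "no C compiler found on PATH"
    include_dir = sysconfig.get_paths()["include"]
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "main.c")
        with open(src, "w") as f:
            f.write(SOURCE)
        # -fsyntax-only: parse the file only, so no libpython/libcuda linking.
        result = subprocess.run(
            [cc, "-fsyntax-only", f"-I{include_dir}", src],
            capture_output=True,
            text=True,
        )
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    ok, err = try_compile_with_python_h()
    print("Python.h compiles" if ok else f"Compile failed:\n{err}")
```

If this fails with the same "Python.h: No such file or directory" message, the problem is in the system headers rather than in Unsloth or Triton.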

Despite these steps, the error persists. Can anyone help?

danielhanchen commented 5 months ago

How about

```
sudo apt-get install python3.9-dev
sudo apt install libpython3.9-dev
```

You also need to link it, so run ldconfig.
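The versioned package names follow the interpreter's major.minor version. A sketch that prints the matching names for the running interpreter, assuming the Debian/Ubuntu `pythonX.Y-dev` / `libpythonX.Y-dev` naming scheme (an interpreter built outside the system packages, e.g. via conda or pyenv, typically ships its own headers instead); `debian_dev_packages` is a hypothetical helper:

```python
import sys


def debian_dev_packages(version_info=sys.version_info):
    """Return the versioned Debian/Ubuntu header packages for a Python version.

    Assumes the distribution's pythonX.Y-dev / libpythonX.Y-dev naming scheme.
    """
    tag = f"{version_info[0]}.{version_info[1]}"
    return [f"python{tag}-dev", f"libpython{tag}-dev"]


if __name__ == "__main__":
    print("sudo apt-get install " + " ".join(debian_dev_packages()))
```

For the Python 3.9.18 mentioned later in the thread, `debian_dev_packages((3, 9, 18))` returns `['python3.9-dev', 'libpython3.9-dev']`.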

DenizK7 commented 5 months ago

I had already tried what you suggested; unfortunately, nothing changed.

Eleanorkong commented 3 weeks ago

Hi, I'm running into the same problem (also on Python 3.9.18). Could you please suggest what I should do? Thank you.
