Closed: brainchen2020 closed this issue 1 month ago
Oh no, stalling is very bad - it probably means something in the GPU itself is going haywire. Does this happen often or rarely?
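If it hangs again, it might help to see where the process is actually stuck. Here is a minimal sketch (my assumption is that you can drop it near the top of your training script; nothing Unsloth-specific, just the standard library) that dumps Python stack traces while the step is stalled:

import faulthandler, sys

# Assumption: placed near the top of the training script, before trainer.train().
# If a step stalls, this prints every thread's Python stack trace to stderr every
# 5 minutes, which shows which call the process is actually stuck in.
faulthandler.dump_traceback_later(300, repeat = True, file = sys.stderr)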
This is the case every time, and each time it gets stuck at step 11.
Here is the test code: LLama_3_1.py.txt
Strange - I replaced the above test code with the following code and it worked!
from unsloth import FastLanguageModel
from unsloth import is_bfloat16_supported
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 2048 # Supports RoPE Scaling internally, so choose any!

# Get LAION dataset
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
dataset = load_dataset("json", data_files = {"train" : url}, split = "train")

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Meta-Llama-3.1-8B",
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
)

# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",
        seed = 3407,
    ),
)
trainer.train()

# Go to https://github.com/unslothai/unsloth/wiki for advanced tips like
# (1) Saving to GGUF / merging to 16bit for vLLM
# (2) Continued training from a saved LoRA adapter
# (3) Adding an evaluation loop / OOMs
# (4) Customized chat templates
@brainchen2020 Sorry for the delay! OK, weird, hmm - it could be a dataset tokenization issue, maybe something going out of bounds.
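If you want to rule that out, a rough sketch like the one below could show whether any example tokenizes past the configured limit. It assumes the same `tokenizer` and `dataset` objects as the working script above, the same max_seq_length = 2048, and that the text lives in the "text" field (matching dataset_text_field):

# Rough length check (assumes `tokenizer` and `dataset` from the script above).
max_seq_length = 2048
lengths = [
    len(tokenizer(example["text"], add_special_tokens = True)["input_ids"])
    for example in dataset
]
print("longest example:", max(lengths), "tokens")
print("examples over max_seq_length:", sum(l > max_seq_length for l in lengths))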
It doesn't happen again after changing the code, so I'll close this now.
As shown in the image, at the 11th step of training, CUDA is inactive. I've tried several times, and it always gets stuck like this. Even after waiting for more than ten minutes, there is no progress.
code base: Llama 3.1 (8B) env: Windows 11 WSL2 2080 Ti
unsloth 2024.8 xformers 0.0.24 transformers 4.44.2 triton 2.2.0 torch 2.2.0 accelerate 0.34.2 bitsandbytes 0.43.3 peft 0.12.0
Hi~ I had the same problem, and my GPU is a 2080 Ti 22G too. I tried your new code, but it doesn't seem to work. Did you finally find out why?
And strangely enough, I'm stuck at step 11 too🤣