unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
Apache License 2.0
13.05k stars 853 forks

I used an RTX 2060 graphics card and got the error "Feature 'cvt with .f32.bf16' requires .target sm_80 or higher" #434

Open yangcecode opened 2 months ago

yangcecode commented 2 months ago

==((====))==  Unsloth: Fast Llama patching release 2024.4
   \\   /|    GPU: NVIDIA GeForce RTX 2060 SUPER. Max memory: 7.785 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 4/4 [00:09<00:00, 2.40s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/home/chuhaitong/yangce/Meta-Llama-3-8B-Instruct does not have a padding or unknown token! Will use the EOS token of id 128001 as padding.
Unsloth 2024.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


Using the WANDB_DISABLED environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
max_steps is given, it will override any value given in num_train_epochs
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1 | Num Epochs = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040
  0%| | 0/60 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/chuhaitong/yangce/app.py", line 114, in <module>
    trainer_stats = trainer.train()
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 361, in train
    output = super().train(*args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "<string>", line 361, in _fast_inner_training_loop
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
    outputs = model(**inputs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 882, in PeftModelForCausalLM_fast_forward
    return self.base_model(
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 813, in _CausalLM_fast_forward
    outputs = self.model(
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 650, in LlamaModel_fast_forward
    hidden_states = Unsloth_Offloaded_Gradient_Checkpointer.apply(
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 115, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/_utils.py", line 333, in forward
    (output,) = forward_function(hidden_states, *args)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 432, in LlamaDecoderLayer_fast_forward
    hidden_states = fast_rms_layernorm(self.input_layernorm, hidden_states)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 190, in fast_rms_layernorm
    out = Fast_RMS_Layernorm.apply(X, W, eps, gemma)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 144, in forward
    fx[(n_rows,)](
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py", line 167, in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py", line 416, in run
    self.cache[device][key] = compile(
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/compiler.py", line 193, in compile
    next_module = compile_ir(module, metadata)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/backends/cuda.py", line 201, in <lambda>
    stages["cubin"] = lambda src, metadata: self.make_cubin(src, metadata, options, self.capability)
  File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/backends/cuda.py", line 194, in make_cubin
    return compile_ptx_to_cubin(src, ptxas, capability, opt.enable_fp_fusion)
RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 102; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 102; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 104; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 104; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 106; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 106; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 108; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 108; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 110; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 110; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 112; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 112; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 114; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 114; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 116; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 116; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 118; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 118; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 120; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 120; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 122; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 122; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 124; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 124; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 126; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 126; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 128; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 128; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 130; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 130; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 316; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 316; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 318; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 318; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 320; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 320; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 322; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 322; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 324; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 324; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 326; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 326; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 328; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 328; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 330; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 330; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 332; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 332; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 334; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 334; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 336; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 336; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 338; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 338; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 340; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 340; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 342; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 342; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 344; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 344; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 346; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 346; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 350; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 350; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 354; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 354; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 358; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 358; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 362; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 362; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 366; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 366; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 370; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 370; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 374; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 374; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 378; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 378; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 382; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 382; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 386; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 386; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 390; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 390; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 394; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 394; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 398; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 398; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 402; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 402; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 406; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 406; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 410; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 410; error : Feature '.bf16' requires .target sm_80 or higher
ptxas fatal : Ptx assembly aborted due to errors
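The ptxas output above repeats a handful of distinct messages many times, which makes the actual cause hard to spot. As a reading aid, here is a minimal sketch that collapses such a log into counts per feature and required target. The function name `summarize_ptxas_errors` and the regular expression are illustrative assumptions, not part of Unsloth or Triton; the pattern is only tuned to the message format seen in this thread.

```python
import re
from collections import Counter

# Matches one ptxas diagnostic in the format seen in this thread, e.g.
# "ptxas /tmp/x, line 100; error : Feature '.bf16' requires .target sm_80 or higher"
PTXAS_FEATURE_ERROR = re.compile(
    r"line (?P<line>\d+); error\s*: Feature '(?P<feature>[^']+)' "
    r"requires \.target (?P<target>sm_\d+) or higher"
)


def summarize_ptxas_errors(log_text):
    """Count 'Feature ... requires .target ...' errors by (feature, target)."""
    counts = Counter()
    for match in PTXAS_FEATURE_ERROR.finditer(log_text):
        counts[(match.group("feature"), match.group("target"))] += 1
    return counts


# A short excerpt in the same format as the log above.
sample = (
    "ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : "
    "Feature '.bf16' requires .target sm_80 or higher\n"
    "ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : "
    "Feature 'cvt with .f32.bf16' requires .target sm_80 or higher\n"
    "ptxas /tmp/compile-ptx-src-d2fe88, line 316; error : "
    "Feature 'cvt.bf16.f32' requires .target sm_80 or higher\n"
    "ptxas fatal : Ptx assembly aborted due to errors\n"
)

counts = summarize_ptxas_errors(sample)
for (feature, target), n in sorted(counts.items()):
    print(f"{n}x Feature {feature!r} needs {target} or higher")
```

Because the pattern uses `finditer` rather than splitting on newlines, it should also work on a log that was pasted as one long line.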

seancarmod-y commented 2 months ago

I get the same error when I run this on a V100. I thought setting bf16 to False would solve this.

import os
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 1024
dataset_folder = "./datasets/train_dataset"
dataset = load_dataset(dataset_folder, split="train")

# Load Llama3 model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=None,
    load_in_4bit=True,
)

# Model patching and add fast LoRA weights and training
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj",],
    lora_alpha=16,
    lora_dropout=0,  # Supports any, but = 0 is optimized
    bias="none",  # Supports any, but = "none" is optimized
    use_gradient_checkpointing=True,
    random_state=3407,
    max_seq_length=max_seq_length,
    use_rslora=False,  # Rank stabilized LoRA
    loftq_config=None,  # LoftQ
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=10,
        max_steps=150,
        learning_rate=2e-4,
        fp16=True,
        bf16=False,
        logging_steps=1,
        output_dir="outputs",
        optim="adamw_8bit",
        seed=3407,
    ),
)

# Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

trainer_stats = trainer.train()

# Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

# Save the model
model.save_pretrained("llama3_lora_model")
model.save_pretrained_merged("outputs", tokenizer, save_method="merged_16bit",)

# Save to 8bit Q8_0 and q4
model.save_pretrained_gguf("llama3_model_q8", tokenizer,)
model.save_pretrained_gguf("llama3_model_q4", tokenizer, quantization_method="q4_k_m")
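The script above leaves `dtype=None` (auto-detection) while hard-coding `fp16=True, bf16=False`, and the banners in this thread report `Bfloat16 = TRUE` on devices with compute capability 7.5 and 7.0. Since ptxas states that bf16 instructions need sm_80 or higher, one way to keep the settings consistent is to derive them from the compute capability directly. The sketch below is only an illustration under that assumption: `choose_precision` is a hypothetical helper, and nothing in this thread confirms that passing the resulting values explicitly avoids the Triton error in this Unsloth release.

```python
def choose_precision(capability):
    """Map a CUDA compute capability (major, minor) to precision flags.

    Assumption taken from the ptxas messages in this thread: native bf16
    instructions require sm_80 (compute capability 8.0) or higher.
    """
    major, minor = capability
    native_bf16 = (major, minor) >= (8, 0)
    return {"bf16": native_bf16, "fp16": not native_bf16}


# Capabilities reported in this thread: RTX 2060 SUPER = 7.5, Tesla V100 = 7.0.
print(choose_precision((7, 5)))
print(choose_precision((7, 0)))

if __name__ == "__main__":
    # Optional: query the real device when PyTorch and a GPU are available.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        flags = choose_precision(torch.cuda.get_device_capability(0))
        dtype = torch.bfloat16 if flags["bf16"] else torch.float16
        print(f"flags = {flags}, dtype = {dtype}")
```

The resulting `dtype` could then be passed to `FastLanguageModel.from_pretrained` in place of `None`, and the two flags to `TrainingArguments`, so that no step depends on auto-detection.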

Error:
(unsloth_env) root@sean:/home/sean# python unsloth_llama3_fine_tune.py
/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
  warnings.warn(
==((====))==  Unsloth: Fast Llama patching release 2024.4
   \\   /|    GPU: Tesla V100-SXM2-16GB. Max memory: 15.773 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0. CUDA = 7.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Unsloth 2024.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
max_steps is given, it will override any value given in num_train_epochs
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 13,533 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 150
 "-____-"     Number of trainable parameters = 41,943,040
  0%| | 0/150 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/sean/unsloth_llama3_fine_tune.py", line 65, in <module>
    trainer_stats = trainer.train()
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 361, in train
    output = super().train(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "<string>", line 361, in _fast_inner_training_loop
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
    loss = self.compute_loss(model, inputs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
    return model_forward(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 882, in PeftModelForCausalLM_fast_forward
    return self.base_model(
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
    return self.model.forward(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 813, in _CausalLM_fast_forward
    outputs = self.model(
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 668, in LlamaModel_fast_forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 36, in inner
    return fn(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 487, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 262, in forward
    outputs = run_function(*args)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 664, in custom_forward
    return module(*inputs, past_key_value, output_attentions, padding_mask = padding_mask)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 432, in LlamaDecoderLayer_fast_forward
    hidden_states = fast_rms_layernorm(self.input_layernorm, hidden_states)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 190, in fast_rms_layernorm
    out = Fast_RMS_Layernorm.apply(X, W, eps, gemma)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 144, in forward
    fx[(n_rows,)](
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py", line 167, in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py", line 416, in run
    self.cache[device][key] = compile(
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/compiler.py", line 193, in compile
    next_module = compile_ir(module, metadata)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/backends/cuda.py", line 201, in <lambda>
    stages["cubin"] = lambda src, metadata: self.make_cubin(src, metadata, options, self.capability)
  File "/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/backends/cuda.py", line 194, in make_cubin
    return compile_ptx_to_cubin(src, ptxas, capability, opt.enable_fp_fusion)
RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/compile-ptx-src-3c809d, line 100; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 100; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 102; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 102; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 104; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 104; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 106; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 106; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 108; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 108; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 110; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 110; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 112; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 112; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 114; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 114; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 116; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 116; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 118; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 118; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 120; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 120; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 122; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 122; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 124; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 124; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 126; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 126; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 128; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 128; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 130; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 130; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 316; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 316; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 318; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 318; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 320; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 320; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 322; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 322; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 324; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 324; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 326; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 326; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 328; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 328; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 330; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 330; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 332; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 332; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 334; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 334; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 336; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 336; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 338; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 338; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 340; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 340; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 342; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 342; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 344; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 344; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 346; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 346; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 350; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 350; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-3c809d, line 354; error : Feature '.bf16' requires .target sm_80 or higher
ptxas
/tmp/compile-ptx-src-3c809d, line 354; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 358; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 358; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 362; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 362; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 366; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 366; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 370; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 370; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 374; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 374; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 378; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 378; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 382; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 382; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 386; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 386; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 390; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 390; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 394; error : Feature 
'.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 394; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 398; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 398; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 402; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 402; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 406; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 406; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 410; error : Feature '.bf16' requires .target sm_80 or higher ptxas /tmp/compile-ptx-src-3c809d, line 410; error : Feature '.bf16' requires .target sm_80 or higher ptxas fatal : Ptx assembly aborted due to errors

ludekcizinsky commented 2 months ago

I had the exact same problem using torch 2.3.0. As you said, even setting the bf16 flag to False did not work.

I resolved the issue by downgrading to torch 2.2.0 and installing unsloth with:

pip install --upgrade --force-reinstall --no-cache-dir torch==2.2.0 triton \
  --index-url https://download.pytorch.org/whl/cu121

pip install "unsloth[cu121-torch220] @ git+https://github.com/unslothai/unsloth.git"
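
To confirm that a card is affected, a quick check of its compute capability may help. The ptxas errors above require sm_80 or higher for bf16, while the banner in the original report shows the RTX 2060 SUPER at 7.5. The sketch below is only a diagnostic: the helper names (`supports_native_bf16`, `pick_dtype_name`, `detect_capability`) are made up for illustration and are not part of unsloth, and picking float16 this way is not a confirmed workaround, since setting bf16 to False did not help on torch 2.3.0.

```python
from typing import Optional, Tuple

# sm_80 is the minimum target named in the ptxas error messages.
MIN_BF16_CAPABILITY = (8, 0)


def supports_native_bf16(capability: Tuple[int, int]) -> bool:
    """Return True if (major, minor) compute capability is sm_80 or higher."""
    return tuple(capability) >= MIN_BF16_CAPABILITY


def pick_dtype_name(capability: Tuple[int, int]) -> str:
    """Suggest a half-precision dtype name for the given compute capability."""
    return "bfloat16" if supports_native_bf16(capability) else "float16"


def detect_capability() -> Optional[Tuple[int, int]]:
    """Ask PyTorch for the capability of GPU 0; None if torch or CUDA is missing."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    cap = detect_capability()
    if cap is None:
        print("No CUDA device visible to PyTorch")
    else:
        print(f"sm_{cap[0]}{cap[1]}: suggested dtype is {pick_dtype_name(cap)}")
```

On a 7.5 card this should suggest float16, which matches the hardware limit the assembler is complaining about.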