Open yangcecode opened 2 months ago
I get the same error when I run this on a V100. I thought setting bf16 to False would solve this.
import os
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 1024
dataset_folder = "./datasets/train_dataset"
dataset = load_dataset(dataset_folder, split="train")

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=None,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,  # Supports any, but = 0 is optimized
    bias="none",  # Supports any, but = "none" is optimized
    use_gradient_checkpointing=True,
    random_state=3407,
    max_seq_length=max_seq_length,
    use_rslora=False,  # Rank stabilized LoRA
    loftq_config=None,  # LoftQ
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=10,
        max_steps=150,
        learning_rate=2e-4,
        fp16=True,
        bf16=False,
        logging_steps=1,
        output_dir="outputs",
        optim="adamw_8bit",
        seed=3407,
    ),
)

# Report GPU memory before training.
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

trainer_stats = trainer.train()

# Report peak GPU memory after training.
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime'] / 60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

# Save the LoRA adapter, a merged 16-bit model, and GGUF exports.
model.save_pretrained("llama3_lora_model")
model.save_pretrained_merged("outputs", tokenizer, save_method="merged_16bit")

model.save_pretrained_gguf("llama3_model_q8", tokenizer)
model.save_pretrained_gguf("llama3_model_q4", tokenizer, quantization_method="q4_k_m")
Error:
(unsloth_env) root@sean:/home/sean# python unsloth_llama3_fine_tune.py
/opt/conda/envs/unsloth_env/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
==((====))==  Unsloth: Fast Llama patching release 2024.4
   \\   /|    GPU: Tesla V100-SXM2-16GB. Max memory: 15.773 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0. CUDA = 7.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Unsloth 2024.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
max_steps is given, it will override any value given in num_train_epochs
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 13,533 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 150
 "-____-"     Number of trainable parameters = 41,943,040
0%| | 0/150 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/sean/unsloth_llama3_fine_tune.py", line 65, in <module>
I had the exact same problem using torch 2.3.0. As you said, even setting the bf16 flag to False did not work.
I resolved the issue by downgrading to torch 2.2.0 and installing unsloth using:
pip install --upgrade --force-reinstall --no-cache-dir torch==2.2.0 triton \
--index-url https://download.pytorch.org/whl/cu121
pip install "unsloth[cu121-torch220] @ git+https://github.com/unslothai/unsloth.git"
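Both startup banners in this thread report `Bfloat16 = TRUE` on GPUs with compute capability 7.0 and 7.5, while the ptxas errors say `.bf16` needs `sm_80` or higher. A minimal sketch for checking the capability directly, independent of any auto-detection. The helper names `has_native_bf16` and `pick_dtype_name` are hypothetical, and the `sm_80` threshold is taken from the error text in this thread:

```python
from typing import Tuple

# Threshold taken from the ptxas message in this thread:
# "Feature '.bf16' requires .target sm_80 or higher".
BF16_MIN_CAPABILITY = (8, 0)


def has_native_bf16(capability: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) compute capability is sm_80 or newer."""
    return tuple(capability) >= BF16_MIN_CAPABILITY


def pick_dtype_name(capability: Tuple[int, int]) -> str:
    """Hypothetical helper: name the 16-bit dtype to request for this GPU."""
    return "bfloat16" if has_native_bf16(capability) else "float16"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        # get_device_capability returns a (major, minor) tuple, e.g. (7, 0).
        cap = torch.cuda.get_device_capability(0)
        print(f"torch {torch.__version__}, capability sm_{cap[0]}{cap[1]}")
        print("native bf16:", has_native_bf16(cap))
        print("suggested dtype: torch." + pick_dtype_name(cap))
    else:
        print("No CUDA device visible; nothing to check.")
```

On a V100 or RTX 2060 SUPER this would suggest `torch.float16`, which could be passed as `dtype=` to `FastLanguageModel.from_pretrained` instead of `dtype=None`. The thread does not confirm that this alone avoids the error under torch 2.3.0; the only fix reported here is the downgrade to torch 2.2.0.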
==((====))==  Unsloth: Fast Llama patching release 2024.4
   \\   /|    GPU: NVIDIA GeForce RTX 2060 SUPER. Max memory: 7.785 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0. CUDA = 7.5. CUDA Toolkit = 12.1.
\        /    Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 4/4 [00:09<00:00, 2.40s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/home/chuhaitong/yangce/Meta-Llama-3-8B-Instruct does not have a padding or unknown token! Will use the EOS token of id 128001 as padding.
Unsloth 2024.4 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
True
Using the
trainer_stats = trainer.train()
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 361, in train
output = super().train(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "<string>", line 361, in _fast_inner_training_loop
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 3138, in training_step
loss = self.compute_loss(model, inputs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 3161, in compute_loss
outputs = model(**inputs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 882, in PeftModelForCausalLM_fast_forward
return self.base_model(
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
return self.model.forward(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 813, in _CausalLM_fast_forward
outputs = self.model(
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 650, in LlamaModel_fast_forward
hidden_states = Unsloth_Offloaded_Gradient_Checkpointer.apply(
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 115, in decorate_fwd
return fwd(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/_utils.py", line 333, in forward
(output,) = forward_function(hidden_states, *args)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 432, in LlamaDecoderLayer_fast_forward
hidden_states = fast_rms_layernorm(self.input_layernorm, hidden_states)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 190, in fast_rms_layernorm
out = Fast_RMS_Layernorm.apply(X, W, eps, gemma)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 144, in forward
fx[(n_rows,)](
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py", line 167, in <lambda>
return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py", line 416, in run
self.cache[device][key] = compile(
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/compiler.py", line 193, in compile
next_module = compile_ir(module, metadata)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/backends/cuda.py", line 201, in <lambda>
stages["cubin"] = lambda src, metadata: self.make_cubin(src, metadata, options, self.capability)
File "/home/chuhaitong/anaconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/backends/cuda.py", line 194, in make_cubin
return compile_ptx_to_cubin(src, ptxas, capability, opt.enable_fp_fusion)
RuntimeError: Internal Triton PTX codegen error:
ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 102; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 102; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 104; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 104; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 106; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 106; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 108; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 108; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 110; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 110; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 112; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 112; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 114; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 114; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 116; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 116; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 118; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 118; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 120; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 120; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 122; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 122; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 124; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 124; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 126; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 126; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 128; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 128; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 130; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 130; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 316; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 316; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 318; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 318; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 320; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 320; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 322; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 322; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 324; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 324; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 326; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 326; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 328; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 328; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 330; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 330; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 332; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 332; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 334; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 334; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 336; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 336; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 338; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 338; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 340; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 340; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 342; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 342; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 344; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 344; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 346; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 346; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 350; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 350; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 354; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 354; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 358; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 358; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 362; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 362; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 366; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 366; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 370; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 370; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 374; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 374; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 378; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 378; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 382; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 382; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 386; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 386; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 390; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 390; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 394; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 394; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 398; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 398; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 402; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 402; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 406; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 406; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 410; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 410; error : Feature '.bf16' requires .target sm_80 or higher
ptxas fatal : Ptx assembly aborted due to errors
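The ptxas output above repeats the same few messages for many PTX source lines. A minimal sketch, assuming the log has been saved as plain text, that collapses it into one count per distinct feature and required target. `summarize_ptxas_errors` and `SAMPLE` are hypothetical names, and the regular expression is written only against the message shape seen in this thread:

```python
import re
from collections import Counter

# Matches lines such as:
# ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : Feature '.bf16' requires .target sm_80 or higher
PTXAS_LINE = re.compile(
    r"ptxas .*?, line (?P<line>\d+); error\s*: "
    r"Feature '(?P<feature>[^']+)' requires \.target (?P<target>sm_\d+) or higher"
)

# Three lines copied from the log in this thread, plus the final abort line.
SAMPLE = """\
ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : Feature '.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 100; error : Feature 'cvt with .f32.bf16' requires .target sm_80 or higher
ptxas /tmp/compile-ptx-src-d2fe88, line 316; error : Feature 'cvt.bf16.f32' requires .target sm_80 or higher
ptxas fatal : Ptx assembly aborted due to errors
"""


def summarize_ptxas_errors(log_text: str) -> Counter:
    """Count ptxas 'Feature ... requires .target ...' errors by (feature, target)."""
    counts = Counter()
    for match in PTXAS_LINE.finditer(log_text):
        counts[(match.group("feature"), match.group("target"))] += 1
    return counts


if __name__ == "__main__":
    for (feature, target), n in summarize_ptxas_errors(SAMPLE).items():
        print(f"{n} x feature {feature!r} requires {target} or higher")
```

Run against the full log, every entry should name `sm_80` as the required target, which points at bfloat16 code being generated for a GPU that predates it rather than at many unrelated problems.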
WANDB_DISABLED environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
max_steps is given, it will override any value given in num_train_epochs
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1 | Num Epochs = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040
0%| | 0/60 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/chuhaitong/yangce/app.py", line 114, in <module>