unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

`triton.language` has no attribute `cast` [FIXED] #1263

Open arianyambao opened 2 weeks ago

arianyambao commented 2 weeks ago

I've been using Unsloth for my training runs; however, updating the version resulted in an error.

When I tried fine-tuning, I got the following traceback:

2024-11-07T16:22:42.743116754Z ==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
2024-11-07T16:22:42.743172138Z    \\   /|    Num examples = 854 | Num Epochs = 6
2024-11-07T16:22:42.743186385Z O^O/ \_/ \    Batch size per device = 4 | Gradient Accumulation steps = 4
2024-11-07T16:22:42.743198119Z \        /    Total batch size = 16 | Total steps = 318
2024-11-07T16:22:42.743209991Z  "-____-"     Number of trainable parameters = 41,943,040
2024-11-07T16:22:44.764922932Z 
  0%|          | 0/318 [00:00<?, ?it/s]Traceback (most recent call last):
2024-11-07T16:22:44.764960716Z   File "/workspace/finetune_genllm_8b.py", line 242, in <module>
2024-11-07T16:22:44.764966932Z     trainer.train()
2024-11-07T16:22:44.764973078Z   File "<string>", line 156, in train
2024-11-07T16:22:44.764979224Z   File "<string>", line 380, in _fast_inner_training_loop
2024-11-07T16:22:44.764986767Z   File "<string>", line 31, in _unsloth_training_step
2024-11-07T16:22:44.764993891Z   File "/usr/local/lib/python3.10/dist-packages/unsloth/models/_utils.py", line 970, in _unsloth_pre_compute_loss
2024-11-07T16:22:44.764999548Z     return self._old_compute_loss(model, inputs, *args, **kwargs)
2024-11-07T16:22:44.765005764Z   File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 3633, in compute_loss
2024-11-07T16:22:44.765011840Z     outputs = model(**inputs)
2024-11-07T16:22:44.765017986Z   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
2024-11-07T16:22:44.765024132Z     return self._call_impl(*args, **kwargs)
2024-11-07T16:22:44.765030348Z   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
2024-11-07T16:22:44.765036424Z     return forward_call(*args, **kwargs)
2024-11-07T16:22:44.765058703Z   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 823, in forward
2024-11-07T16:22:44.765065268Z     return model_forward(*args, **kwargs)
2024-11-07T16:22:44.765071973Z   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/operations.py", line 811, in __call__
2024-11-07T16:22:44.765078049Z     return convert_to_fp32(self.model_forward(*args, **kwargs))
2024-11-07T16:22:44.765081890Z   File "/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
2024-11-07T16:22:44.765086570Z     return func(*args, **kwargs)
2024-11-07T16:22:44.765091319Z   File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 24, in inner
2024-11-07T16:22:44.765095649Z     return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
2024-11-07T16:22:44.765100328Z   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
2024-11-07T16:22:44.765106893Z     return fn(*args, **kwargs)
2024-11-07T16:22:44.765111712Z   File "/usr/local/lib/python3.10/dist-packages/unsloth/models/llama.py", line 1046, in PeftModelForCausalLM_fast_forward
2024-11-07T16:22:44.765117370Z     return self.base_model(
2024-11-07T16:22:44.765121630Z   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
2024-11-07T16:22:44.765126309Z     return self._call_impl(*args, **kwargs)
2024-11-07T16:22:44.765130639Z   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
2024-11-07T16:22:44.765137763Z     return forward_call(*args, **kwargs)
2024-11-07T16:22:44.765142373Z   File "/usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py", line 197, in forward
2024-11-07T16:22:44.765148589Z     return self.model.forward(*args, **kwargs)
2024-11-07T16:22:44.765154315Z   File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 170, in new_forward
2024-11-07T16:22:44.765159414Z     output = module._old_forward(*args, **kwargs)
2024-11-07T16:22:44.765164652Z   File "/usr/local/lib/python3.10/dist-packages/unsloth/models/llama.py", line 987, in _CausalLM_fast_forward
2024-11-07T16:22:44.765169401Z     loss = fast_cross_entropy_loss(
2024-11-07T16:22:44.765173661Z   File "/usr/local/lib/python3.10/dist-packages/torch/_compile.py", line 24, in inner
2024-11-07T16:22:44.765179388Z     return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
2024-11-07T16:22:44.765185534Z   File "/usr/local/lib/python3.10/dist-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
2024-11-07T16:22:44.765190703Z     return fn(*args, **kwargs)
2024-11-07T16:22:44.765195033Z   File "/usr/local/lib/python3.10/dist-packages/unsloth/kernels/cross_entropy_loss.py", line 387, in fast_cross_entropy_loss
2024-11-07T16:22:44.765201528Z     loss = Fast_CrossEntropyLoss.apply(
2024-11-07T16:22:44.765207255Z   File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 539, in apply
2024-11-07T16:22:44.765214309Z     return super().apply(*args, **kwargs)  # type: ignore[misc]
2024-11-07T16:22:44.765218569Z   File "/usr/local/lib/python3.10/dist-packages/unsloth/kernels/cross_entropy_loss.py", line 309, in forward
2024-11-07T16:22:44.765222830Z     _chunked_cross_entropy_forward[(n_rows, n_chunks,)](
2024-11-07T16:22:44.765227649Z   File "/usr/local/lib/python3.10/dist-packages/triton/runtime/autotuner.py", line 232, in run
2024-11-07T16:22:44.765232328Z     return self.fn.run(*args, **kwargs)
2024-11-07T16:22:44.765237077Z   File "<string>", line 63, in _chunked_cross_entropy_forward
2024-11-07T16:22:44.765241268Z   File "/usr/local/lib/python3.10/dist-packages/triton/compiler/compiler.py", line 430, in compile
2024-11-07T16:22:44.765246017Z     fn_cache_manager = get_cache_manager(make_hash(fn, arch, **kwargs))
2024-11-07T16:22:44.765250277Z   File "/usr/local/lib/python3.10/dist-packages/triton/compiler/compiler.py", line 253, in make_hash
2024-11-07T16:22:44.765255026Z     key = f"{fn.cache_key}-{''.join(signature.values())}-{configs_key}-{constants}-{num_warps}-{num_stages}-{debug}-{arch}"
2024-11-07T16:22:44.765259706Z   File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 445, in cache_key
2024-11-07T16:22:44.765265921Z     dependencies_finder.visit(self.parse())
2024-11-07T16:22:44.765270671Z   File "/usr/lib/python3.10/ast.py", line 418, in visit
2024-11-07T16:22:44.765275350Z     return visitor(node)
2024-11-07T16:22:44.765281077Z   File "/usr/lib/python3.10/ast.py", line 426, in generic_visit
2024-11-07T16:22:44.765285337Z     self.visit(item)
2024-11-07T16:22:44.765290505Z   File "/usr/lib/python3.10/ast.py", line 418, in visit
2024-11-07T16:22:44.765294766Z     return visitor(node)
2024-11-07T16:22:44.765299026Z   File "/usr/lib/python3.10/ast.py", line 426, in generic_visit
2024-11-07T16:22:44.765303775Z     self.visit(item)
2024-11-07T16:22:44.765308455Z   File "/usr/lib/python3.10/ast.py", line 418, in visit
2024-11-07T16:22:44.765312785Z     return visitor(node)
2024-11-07T16:22:44.765317534Z   File "/usr/lib/python3.10/ast.py", line 428, in generic_visit
2024-11-07T16:22:44.765322213Z     self.visit(value)
2024-11-07T16:22:44.765326543Z   File "/usr/lib/python3.10/ast.py", line 418, in visit
2024-11-07T16:22:44.765331223Z     return visitor(node)
2024-11-07T16:22:44.765335483Z   File "/usr/lib/python3.10/ast.py", line 428, in generic_visit
2024-11-07T16:22:44.765339743Z     self.visit(value)
2024-11-07T16:22:44.765344004Z   File "/usr/lib/python3.10/ast.py", line 418, in visit
2024-11-07T16:22:44.765348683Z     return visitor(node)
2024-11-07T16:22:44.765353432Z   File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 77, in visit_Call
2024-11-07T16:22:44.765358181Z     func = self.visit(node.func)
2024-11-07T16:22:44.765362442Z   File "/usr/lib/python3.10/ast.py", line 418, in visit
2024-11-07T16:22:44.765366702Z     return visitor(node)
2024-11-07T16:22:44.765371381Z   File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 74, in visit_Attribute
2024-11-07T16:22:44.765376200Z     return getattr(lhs, node.attr)
2024-11-07T16:22:44.765380391Z AttributeError: module 'triton.language' has no attribute 'cast'. Did you mean: 'cat'?
arianyambao commented 2 weeks ago

A quick fix I used is to install Unsloth from a previously working commit:

pip --no-cache-dir install "unsloth[cu118-ampere] @ git+https://github.com/unslothai/unsloth.git@a2f8db3e7341f983af5814a2c56f54fa29ee548d"

It worked; however, you also need to set

os.environ["UNSLOTH_IS_PRESENT"] = "1"

which came up in the related issue #1252.
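
For clarity, this is roughly where I set it in my training script (just a sketch - setting it before importing unsloth is my own assumption, to make sure it is picked up early):

import os

# Workaround noted in issue #1252: tell the patched code that Unsloth is present.
# I set it before importing unsloth to be safe; whether the import order matters
# is an assumption on my part.
os.environ["UNSLOTH_IS_PRESENT"] = "1"

from unsloth import FastLanguageModel  # the rest of the training script is unchanged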

danielhanchen commented 2 weeks ago

Oh, maybe this is an old Triton version - I will add a flag to turn the casting off!
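
If anyone wants to confirm this on their side, a quick sanity check (just a sketch) is to see whether the installed Triton actually exposes the attribute:

import triton
import triton.language as tl

# Older Triton builds (e.g. the one bundled with PyTorch 2.1) do not expose
# tl.cast at module level, which is what the AttributeError above is about.
print("Triton version:", triton.__version__)
print("tl.cast available:", hasattr(tl, "cast"))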

sureshmol commented 2 weeks ago

I am facing the same issue as well. Please help me with a resolution.

Ar9av commented 1 week ago

Yes, I'm getting the same error. I tried the following but ended up with more issues:

A quick fix I used is to install Unsloth from a previously working commit:

pip --no-cache-dir install "unsloth[cu118-ampere] @ git+https://github.com/unslothai/unsloth.git@a2f8db3e7341f983af5814a2c56f54fa29ee548d"

It worked; however, you also need to set

os.environ["UNSLOTH_IS_PRESENT"] = "1"

which came up in the related issue #1252.

jonwolds commented 1 week ago

Same issue here

danielhanchen commented 1 week ago

@sureshmol @arianyambao @Ar9av @jonwolds Apologies everyone - I added a temporary fix in the nightly branch - would it be possible for you to test whether it works? Thanks a lot, and apologies again for the issue!

pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git@nightly"

ex-yanminmin001 commented 1 week ago

@sureshmol @arianyambao @Ar9av @jonwolds Apologies everyone - I added a temporary fix in the nightly branch - would it be possible for you to test whether it works? Thanks a lot, and apologies again for the issue!

pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git@nightly"

Solved, thank you!

opertifelipe commented 1 week ago

I am having the same issue

arianyambao commented 1 week ago

@sureshmol @arianyambao @Ar9av @jonwolds Apologies everyone - I added a temporary fix in the nightly branch - would it be possible for you to test whether it works? Thanks a lot, and apologies again for the issue!

pip uninstall unsloth -y && pip install --upgrade --no-cache-dir --no-deps "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git@nightly"

Hi @danielhanchen, it works by default now, without the flags I had to set explicitly last time. However, I installed it with cu118-ampere instead of colab-new:

pip --no-cache-dir install "unsloth[cu118-ampere] @ git+https://github.com/unslothai/unsloth.git@nightly"

Here's a short excerpt of the output:

2024-11-11T02:56:35.050957027-08:00 
Map:   0%|          | 0/854 [00:00<?, ? examples/s]
Map: 100%|██████████| 854/854 [00:00<00:00, 59543.12 examples/s]
2024-11-11T02:56:35.823763879-08:00 ==((====))==  Unsloth 2024.11.5: Fast Llama patching. Transformers = 4.46.2.
2024-11-11T02:56:35.823787625-08:00    \\   /|    GPU: NVIDIA RTX A6000. Max memory: 47.536 GB. Platform = Linux.
2024-11-11T02:56:35.823794190-08:00 O^O/ \_/ \    Pytorch: 2.1.0+cu118. CUDA = 8.6. CUDA Toolkit = 11.8.
2024-11-11T02:56:35.823799358-08:00 \        /    Bfloat16 = TRUE. FA [Xformers = 0.0.22.post7+cu118. FA2 = True]
2024-11-11T02:56:35.823804177-08:00  "-____-"     Free Apache license: http://github.com/unslothai/unsloth
2024-11-11T02:56:35.823809275-08:00 Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
2024-11-11T02:59:02.909143520-08:00 Unsloth 2024.11.5 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
2024-11-11T02:59:05.118042752-08:00 
Map (num_proc=2):   0%|          | 0/854 [00:00<?, ? examples/s]
Map (num_proc=2):  50%|█████     | 427/854 [00:00<00:00, 556.73 examples/s]
Map (num_proc=2): 100%|██████████| 854/854 [00:00<00:00, 1070.49 examples/s]
Map (num_proc=2): 100%|██████████| 854/854 [00:01<00:00, 836.49 examples/s]
2024-11-11T02:59:05.209715063-08:00 Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
2024-11-11T02:59:06.223910137-08:00 ==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
2024-11-11T02:59:06.223946594-08:00    \\   /|    Num examples = 854 | Num Epochs = 6
2024-11-11T02:59:06.223954626-08:00 O^O/ \_/ \    Batch size per device = 4 | Gradient Accumulation steps = 4
2024-11-11T02:59:06.223959305-08:00 \        /    Total batch size = 16 | Total steps = 318
2024-11-11T02:59:06.223962727-08:00  "-____-"     Number of trainable parameters = 41,943,040
2024-11-11T02:59:11.991167966-08:00 
  0%|          | 0/318 [00:00<?, ?it/s]
  0%|          | 1/318 [00:05<27:22,  5.18s/it]

Thank you!
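
As an aside, in case it helps anyone deciding between the install extras, here is a rough way to check which one matches your machine (the mapping from compute capability 8.x to the ampere extras is my assumption):

import torch

# Report the CUDA toolkit version and GPU compute capability, which is what the
# unsloth extras (cu118 vs cu121, ampere vs not) are keyed on. The exact mapping
# to extra names is an assumption on my part.
print("CUDA toolkit:", torch.version.cuda)          # '11.8' in my case
major, minor = torch.cuda.get_device_capability(0)  # (8, 6) for an RTX A6000
print("Compute capability:", f"{major}.{minor}")
print("Ampere or newer:", major >= 8)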

danielhanchen commented 1 week ago

@arianyambao Oh ok, glad it works!

@opertifelipe Did you try updating Unsloth?

pip install --upgrade --no-cache-dir --no-deps unsloth

opertifelipe commented 1 week ago

@arianyambao Oh ok, glad it works!

@opertifelipe Did you try updating Unsloth?

pip install --upgrade --no-cache-dir --no-deps unsloth

Yes, now it works! 😀