unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

install on databricks #475

Open Jackie0601zhou opened 4 months ago

Jackie0601zhou commented 4 months ago

How can I install unsloth in a Databricks notebook? I tried:

  pip install "unsloth[cu121-ampere-torch220] @ git+https://github.com/unslothai/unsloth.git"

and got:

  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-install-rst_tpkk/unsloth_e8849fa753954ad5b20ad0a81efbd0be
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-install-rst_tpkk/unsloth_e8849fa753954ad5b20ad0a81efbd0be
  fatal: unable to access 'https://github.com/unslothai/unsloth.git/': gnutls_handshake() failed: The TLS connection was non-properly terminated.
  error: subprocess-exited-with-error

danielhanchen commented 4 months ago

It looks like the cluster might not have working internet access to GitHub?
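One way to check this from a notebook cell is to attempt an HTTPS request to GitHub, which exercises the same TLS handshake that failed in the `git clone`. This is only a minimal sketch using the Python standard library; the function name `https_reachable` is made up for illustration, and a success here does not guarantee that `git` (which may use a different TLS stack or proxy settings) will also work.

```python
import urllib.error
import urllib.request


def https_reachable(url, timeout=10.0):
    """Return (ok, detail) describing whether an HTTPS request to url succeeds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered with an error status, so TLS itself worked.
        return True, f"HTTP {exc.code}"
    except OSError as exc:
        # URLError, ssl.SSLError and socket timeouts all derive from OSError.
        return False, f"{type(exc).__name__}: {exc}"


# Example call from a notebook cell:
#   ok, detail = https_reachable("https://github.com")
#   print(ok, detail)
```

If this returns `False` with an SSL or connection error, the problem is likely network policy on the cluster rather than anything in the pip command.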

Jackie0601zhou commented 4 months ago

I used the HTTPS URL and chose GitHub as the Git provider when I added the repo to Databricks. I also installed specific versions of various packages. All the steps before trainer_stats = trainer.train() ran fine. But when I run trainer_stats = trainer.train(), it shows: [Screenshot 2024-05-19 230637] [Screenshot 2024-05-19 230654]

jvhuang1786 commented 4 months ago

Currently getting the same issue as Jackie. I'm in a more regulated Databricks environment, so I have to download the repo first and install it through volumes. I suspect it's a dependency conflict, but I'm not sure where to start looking.
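For anyone in a similar setup, an offline install from a downloaded copy of the repo could look roughly like the sketch below. Everything specific here is an assumption: the helper name `local_install_command`, the `/Volumes/...` paths, and the idea that dependency wheels have been copied to a local directory. The pip flags `--no-index` and `--find-links` are standard pip options.

```python
import sys


def local_install_command(repo_dir, extras="", wheel_dir=None):
    """Build a pip command that installs a package from a local directory."""
    # pip accepts extras on a local path, e.g. "/path/to/unsloth[cu121-ampere-torch220]".
    target = f"{repo_dir}[{extras}]" if extras else repo_dir
    cmd = [sys.executable, "-m", "pip", "install", target]
    if wheel_dir is not None:
        # --no-index stops pip from contacting PyPI;
        # --find-links tells it to resolve dependencies from local files instead.
        cmd += ["--no-index", "--find-links", wheel_dir]
    return cmd


# Hypothetical usage (placeholder volume paths):
#   import subprocess
#   cmd = local_install_command(
#       "/Volumes/<catalog>/<schema>/<volume>/unsloth",
#       extras="cu121-ampere-torch220",
#       wheel_dir="/Volumes/<catalog>/<schema>/<volume>/wheels",
#   )
#   subprocess.run(cmd, check=True)
```

Pinning every dependency in the local wheel directory also makes it easier to rule a version conflict in or out.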

KwesiD commented 4 months ago

I'm also getting the same issue. I've tried installing different versions of the packages, but I end up with the same error.
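When comparing environments across reports like these, it can help to post the exact installed versions alongside the error. A small sketch using only the standard library is below; the package list is a guess at what is relevant (only torch, tensorboard and mlflow are actually mentioned in this thread), so adjust it as needed.

```python
from importlib import metadata

# Assumed list of packages worth reporting; not taken from the Unsloth docs.
PACKAGES = [
    "unsloth", "torch", "transformers", "trl", "peft",
    "xformers", "bitsandbytes", "tensorboard", "mlflow",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```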


danielhanchen commented 3 months ago

Hmmm wait, is Databricks using MLflow?

KwesiD commented 3 months ago

Yes. By default, Databricks logs the runs with MLflow.

danielhanchen commented 3 months ago

Hmmm ok - oh, also, is Databricks multi-GPU?

KwesiD commented 3 months ago

In my instance, I'm only using a single GPU. It's possible to set up a multi-GPU cluster, though.

danielhanchen commented 3 months ago

Hmmm tbh I haven't tried Databricks so I can't exactly debug it - I'll see what I can do, but can't promise anything, sorry

julianmukaj commented 1 month ago

Wondering if any progress has been made on this? We are facing the same issue: installing from source works, and everything is fine until you hit trainer.train(), which fails with a segmentation fault.

danielhanchen commented 1 month ago

Oh no a segfault?? :(

julianmukaj commented 1 month ago

I think I have it tracked down to tensorboard, which means it's most likely a Databricks runtime fix, not an Unsloth one.

Fatal Python error: Segmentation fault

Thread 0x00007fe7c01f2640 (most recent call first):
  File "/usr/lib/python3.11/threading.py", line 324 in wait
  File "/usr/lib/python3.11/queue.py", line 180 in get
  File "/databricks/python/lib/python3.11/site-packages/tensorboard/summary/writer/event_file_writer.py", line 269 in _run
  File "/databricks/python/lib/python3.11/site-packages/tensorboard/summary/writer/event_file_writer.py", line 244 in run
  File "/usr/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
  File "/usr/lib/python3.11/threading.py", line 995 in _bootstrap
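The dump above has the shape that Python's standard `faulthandler` module produces on a fatal signal. If it does not show up in your own logs, a sketch like the following could be run before `trainer.train()` to capture one; the helper name and the log path are made up for illustration.

```python
import faulthandler


def enable_crash_dump(path):
    """Ask Python to write all thread stacks to path on a fatal signal.

    The returned file object must stay open while the handler is enabled,
    because faulthandler writes to its underlying file descriptor.
    """
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical usage, before training starts:
#   crash_log = enable_crash_dump("/tmp/unsloth_segfault.log")
#   trainer_stats = trainer.train()
```

Keep a reference to the returned file for the lifetime of the process, otherwise it may be garbage collected and closed before a crash occurs.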

I tried turning off mlflow and installing an older tensorboard, with no luck so far, so I'm leaving this here in case anyone else wants to debug. (Check the cluster driver logs for more info.)
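For reference, one way the "turn off the loggers" experiment can be expressed, assuming the trainer is built on Hugging Face `transformers` `TrainingArguments`, is to set `report_to="none"`, which disables all reporting integrations (including MLflow and TensorBoard). This is a sketch of that attempt only, not a fix; as noted above, it did not resolve the crash here. The helper name and the `outputs` directory are placeholders.

```python
def reporting_off_kwargs(output_dir):
    """Keyword arguments that disable every Trainer reporting integration."""
    return {
        "output_dir": output_dir,
        # "none" tells transformers not to attach any logging callback
        # (MLflow, TensorBoard, W&B, ...).
        "report_to": "none",
    }


# Hypothetical usage (requires transformers to be installed):
#   from transformers import TrainingArguments
#   args = TrainingArguments(**reporting_off_kwargs("outputs"))
```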

julianmukaj commented 3 weeks ago

Update: The segmentation fault has been raised internally with Databricks, and they have logged it as a feature request. No ETA yet, but hopefully those of us in regulated environments will be able to use Unsloth soon.

danielhanchen commented 2 weeks ago

@julianmukaj Thanks for the update!!