unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32 #512

Open mei-chen opened 5 months ago

mei-chen commented 5 months ago

Getting this error after conda installation

Full output

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.3.0+cu121 with CUDA 1201 (you have 2.3.0)
    Python  3.10.14 (you have 3.10.14)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
==((====))==  Unsloth: Fast Llama patching release 2024.5
   \\   /|    GPU: NVIDIA GeForce GTX 1080. Max memory: 7.915 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0. CUDA = 6.1. CUDA Toolkit = 11.8.
\        /    Bfloat16 = FALSE. Xformers = 0.0.26.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Unsloth 2024.5 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
max_steps is given, it will override any value given in num_train_epochs
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 210,289 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040
  0%|          | 0/60 [00:00<?, ?it/s]
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32
Aborted
Erland366 commented 5 months ago

The issue seems related to this one (https://github.com/state-spaces/mamba/issues/173), which states that your GPU is not supported by Triton, the library that powers Unsloth.
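
For anyone unsure whether their card is affected, a quick check: Triton's NVIDIA backend generally assumes CUDA compute capability 7.0 (Volta) or newer, while Pascal cards like the GTX 1080 report 6.1. (The compute_cap query below assumes a reasonably recent driver.)

# Check the GPU's CUDA compute capability; Pascal (GTX 10xx, P4, P40) reports 6.1.
python -c "import torch; print(torch.cuda.get_device_capability())"
# On recent drivers, nvidia-smi can report it directly:
nvidia-smi --query-gpu=name,compute_cap --format=csv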

lousaibiao commented 5 months ago

Any ideas on how to fix this issue? Is it possible to disable Triton?

lousaibiao commented 5 months ago

I found this link, does this help?

danielhanchen commented 5 months ago

@lousaibiao No, sorry, Triton must be used, or there's no 2x speedup :(

lousaibiao commented 5 months ago

@danielhanchen I have plenty of time 😂. Could you tell me how to disable it?

danielhanchen commented 5 months ago

@lousaibiao Wait, you just want a slow version? Oh

lousaibiao commented 5 months ago

@danielhanchen First of all, I want it to work. 😂

danielhanchen commented 5 months ago

Oh ok lol, well, it'll take some time - we might take Triton out and make it an optional dependency

mathysferrato commented 5 months ago

Hi, I have the same issue on my GTX 1070 Ti. I know it's linked to compute capability, but why is that the case? Is it going to be compatible with older GPUs like mine?

FD1970 commented 5 months ago

Also interested in a solution. I am using 2 Tesla P4 GPUs.

mathysferrato commented 5 months ago

(Quoting the original report above.)

Hi, did you find a way to get past this error?

danielhanchen commented 5 months ago

Try installing an older version of Unsloth and xformers
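
For reference, a sketch of what pinning older versions could look like, borrowing the version set @davidngrc later reports working on a Pascal card (treat the exact pins as assumptions and adjust to your setup):

# Pin an older PyTorch/xformers stack, then install Unsloth on top:
conda install -y pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install -y xformers=0.0.22.post7 -c xformers
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"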

FD1970 commented 5 months ago

I really appreciate you taking the time to answer! I am a novice experimenting with a home lab, interested in learning everything about AI & coding in general. How do I get an older version? Regards, Felix Duran

FD1970 commented 5 months ago

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth: Fast Llama patching release 2024.6
   \\   /|    GPU: Tesla P4. Max memory: 7.421 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0+cu121. CUDA = 6.1. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. Xformers = 0.0.26.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Unsloth 2024.6 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
max_steps is given, it will override any value given in num_train_epochs
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 210,289 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040
  0%|          | 0/60 [00:00<?, ?it/s]
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32

Any ideas?

davidngrc commented 5 months ago

I have the same problem with a Tesla P40. @danielhanchen please make Triton an optional dependency. Many of us old-card users have time; we can accept longer training runs. Please make it work, thank you.

Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth: Fast Llama patching release 2024.6
   \\   /|    GPU: Tesla P40. Max memory: 23.866 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.3.0+cu121. CUDA = 6.1. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. Xformers = 0.0.26.post1. FA = False.
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Unsloth 2024.6 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
max_steps is given, it will override any value given in num_train_epochs
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 44 | Num Epochs = 12
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040
  0%|          | 0/60 [00:00<?, ?it/s]
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32
Aborted

llvm-config --version
14.0.0

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0

nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla P40                      Off |   00000000:04:00.0 Off |                  Off |
| N/A   39C    P0             49W /  250W |    4943MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

danielhanchen commented 5 months ago

Sadly, P4s and P40s @davidngrc @FD1970 will not be supported - older Unsloth versions might work - they don't have tensor cores, so training would be at least 5x slower than on a free Colab Tesla T4.

FD1970 commented 5 months ago

Good morning, I'm on release 2024.6. How can I get an older version?

davidngrc commented 5 months ago

@danielhanchen we don't mind slow, we can wait; please add an option for old cards, thank you. I found this article, and it works: https://blog.csdn.net/weixin_44388614/article/details/139026580. My GPU is a Tesla P40, with compute capability 6.1.

I am running it on Proxmox 8.2 LXC with Ubuntu 22.04 (fresh install). The host and the LXC both use NVIDIA-Linux-x86_64-550.78.run. Below are the commands I ran in the LXC:

wget https://download.nvidia.com/XFree86/Linux-x86_64/550.78/NVIDIA-Linux-x86_64-550.78.run
chmod +x ./NVIDIA-Linux-x86_64-550.78.run
./NVIDIA-Linux-x86_64-550.78.run  --no-kernel-module
nvidia-smi
#NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4
poweroff

apt update
apt install -y build-essential
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
chmod +x ./cuda_11.8.0_520.61.05_linux.run
./cuda_11.8.0_520.61.05_linux.run

nano ~/.bashrc
# add the following to the end of ~/.bashrc
export LD_LIBRARY_PATH=/usr/local/cuda/lib64
export PATH=$PATH:/usr/local/cuda/bin

source ~/.bashrc
nvcc --version
#Cuda compilation tools, release 11.8, V11.8.89
poweroff

#install conda from https://docs.anaconda.com/miniconda/
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
reboot

apt install -y git
conda create -y --name unsloth_env python=3.10
conda activate unsloth_env

conda install -y pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install -y xformers -c xformers
conda install -y cudatoolkit
conda install -y transformers
pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
pip install --no-deps trl peft accelerate bitsandbytes
poweroff

apt install -y python-is-python3
conda activate unsloth_env

python -m xformers.info
python -m bitsandbytes
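
As an extra sanity check, a minimal Unsloth load test; the model name here is just an example (it will download a few GB of weights):

# Optional smoke test: load a 4-bit model through Unsloth.
python - <<'EOF'
import torch
from unsloth import FastLanguageModel

# Pascal cards report compute capability (6, 1) here.
print("compute capability:", torch.cuda.get_device_capability())

# Example model name; any Unsloth 4-bit checkpoint works.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
print("loaded OK")
EOF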

Here are the software versions:

miniconda3
python-3.10.14
pytorch-2.1.0
torchvision-0.16.0-py310_cu118 
torchaudio-2.1.0-py310_cu118 
pytorch-cuda-11.8-h7e8668a_5
xformers-0.0.22.post7-py310_cu11.8.0_pyt2.1.0 
cudatoolkit-11.8.0-h6a678d5_0 
transformers-4.41.2
trl-0.9.4-py3-none-any.whl
peft-0.11.1-py3-none-any.whl
accelerate-0.31.0-py3-none-any.whl
bitsandbytes-0.43.1-py3-none-manylinux_2_24_x86_64.whl
emuchogu commented 3 months ago

I wanted to share a working solution for a Docker setup that's compatible with NVIDIA Pascal cards. I've tested this configuration on an NVIDIA P40 GPU, and it works well. Thanks to the comment by @davidngrc.

Below you'll find the contents of the necessary files to set up this environment. You can copy these and create the files locally to use this setup.

1. Dockerfile

# Stage 1: Base image with system dependencies
FROM nvidia/cuda:11.8.0-devel-ubuntu22.04 as base

# Install system dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    vim \
    curl \
    wget \
    && rm -rf /var/lib/apt/lists/*

# Install Miniconda only if it's not already installed
RUN if [ ! -d "/opt/conda" ]; then \
        wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh && \
        bash miniconda.sh -b -p /opt/conda && \
        rm miniconda.sh; \
    fi

# Set path to conda
ENV PATH /opt/conda/bin:$PATH


# Stage 2: Python environment setup
FROM base as python-env

COPY unsloth_env_file.yml unsloth_env_file.yml

RUN conda env create -f unsloth_env_file.yml

SHELL ["conda", "run", "-n", "unsloth_env", "/bin/bash", "-c"]

# Stage 3: Final image
FROM python-env as final

# Install Unsloth (This step is separate because it's likely to change more frequently)
RUN pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

ENV PATH /usr/local/cuda/bin:$PATH
ENV LD_LIBRARY_PATH /usr/local/cuda/lib64:$LD_LIBRARY_PATH

# Set the working directory
WORKDIR /workspace

# Set the default command to run Jupyter Lab
CMD ["conda", "run", "--no-capture-output", "-n", "unsloth_env", "jupyter", "lab", "--ip=0.0.0.0", "--no-browser", "--allow-root", "--NotebookApp.token=''", "--NotebookApp.password=''"]

2. unsloth_env_file.yml

name: unsloth_env
channels:
  - xformers
  - pytorch
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - aiohttp=3.9.5=py310h5eee18b_0
  - aiosignal=1.2.0=pyhd3eb1b0_0
  - anyio=4.2.0=py310h06a4308_0
  - argon2-cffi=21.3.0=pyhd3eb1b0_0
  - argon2-cffi-bindings=21.2.0=py310h7f8727e_0
  - arrow-cpp=16.1.0=hc1eb8f0_0
  - async-lru=2.0.4=pyhd8ed1ab_0
  - async-timeout=4.0.3=py310h06a4308_0
  - attrs=23.1.0=py310h06a4308_0
  - aws-c-auth=0.6.19=h5eee18b_0
  - aws-c-cal=0.5.20=hdbd6064_0
  - aws-c-common=0.8.5=h5eee18b_0
  - aws-c-compression=0.2.16=h5eee18b_0
  - aws-c-event-stream=0.2.15=h6a678d5_0
  - aws-c-http=0.6.25=h5eee18b_0
  - aws-c-io=0.13.10=h5eee18b_0
  - aws-c-mqtt=0.7.13=h5eee18b_0
  - aws-c-s3=0.1.51=hdbd6064_0
  - aws-c-sdkutils=0.1.6=h5eee18b_0
  - aws-checksums=0.1.13=h5eee18b_0
  - aws-crt-cpp=0.18.16=h6a678d5_0
  - aws-sdk-cpp=1.10.55=h721c034_0
  - babel=2.14.0=pyhd8ed1ab_0
  - beautifulsoup4=4.12.3=py310h06a4308_0
  - blas=1.0=mkl
  - bleach=4.1.0=pyhd3eb1b0_0
  - boost-cpp=1.82.0=hdb19cb5_2
  - bottleneck=1.3.7=py310ha9d4c09_0
  - brotli-python=1.0.9=py310h6a678d5_8
  - bzip2=1.0.8=h5eee18b_6
  - c-ares=1.19.1=h5eee18b_0
  - ca-certificates=2024.7.4=hbcca054_0
  - certifi=2024.7.4=pyhd8ed1ab_0
  - cffi=1.16.0=py310h5eee18b_1
  - charset-normalizer=3.3.2=pyhd3eb1b0_0
  - cuda-cudart=11.8.89=0
  - cuda-cupti=11.8.87=0
  - cuda-libraries=11.8.0=0
  - cuda-nvrtc=11.8.89=0
  - cuda-nvtx=11.8.86=0
  - cuda-runtime=11.8.0=0
  - cuda-version=11.8=hcce14f8_3
  - cudatoolkit=11.8.0=h6a678d5_0
  - datasets=2.19.1=py310h06a4308_0
  - debugpy=1.6.7=py310h6a678d5_0
  - decorator=5.1.1=pyhd3eb1b0_0
  - defusedxml=0.7.1=pyhd3eb1b0_0
  - dill=0.3.8=py310h06a4308_0
  - entrypoints=0.4=py310h06a4308_0
  - ffmpeg=4.3=hf484d3e_0
  - filelock=3.13.1=py310h06a4308_0
  - freetype=2.12.1=h4a9f257_0
  - frozenlist=1.4.0=py310h5eee18b_0
  - fsspec=2024.3.1=py310h06a4308_0
  - gflags=2.2.2=h6a678d5_1
  - glog=0.5.0=h6a678d5_1
  - gmp=6.2.1=h295c915_3
  - gmpy2=2.1.2=py310heeb90bb_0
  - gnutls=3.6.15=he1e5248_0
  - h11=0.14.0=pyhd8ed1ab_0
  - h2=4.1.0=pyhd8ed1ab_0
  - hpack=4.0.0=pyh9f0ad1d_0
  - httpcore=1.0.5=pyhd8ed1ab_0
  - httpx=0.27.0=pyhd8ed1ab_0
  - hyperframe=6.0.1=pyhd8ed1ab_0
  - icu=73.1=h6a678d5_0
  - idna=3.7=py310h06a4308_0
  - importlib-metadata=7.0.1=py310h06a4308_0
  - importlib_metadata=7.0.1=hd8ed1ab_0
  - importlib_resources=6.4.0=pyhd8ed1ab_0
  - intel-openmp=2023.1.0=hdb19cb5_46306
  - ipykernel=6.28.0=py310h06a4308_0
  - ipython_genutils=0.2.0=pyhd3eb1b0_1
  - jedi=0.19.1=py310h06a4308_0
  - jinja2=3.1.4=py310h06a4308_0
  - jpeg=9e=h5eee18b_2
  - json5=0.9.25=pyhd8ed1ab_0
  - jsonschema=4.19.2=py310h06a4308_0
  - jsonschema-specifications=2023.7.1=py310h06a4308_0
  - jupyter-lsp=2.2.5=pyhd8ed1ab_0
  - jupyter_client=7.4.9=py310h06a4308_0
  - jupyter_core=5.7.2=py310h06a4308_0
  - jupyter_events=0.10.0=py310h06a4308_0
  - jupyter_server=2.14.1=py310h06a4308_0
  - jupyter_server_terminals=0.4.4=py310h06a4308_1
  - jupyterlab=4.2.4=pyhd8ed1ab_0
  - jupyterlab_pygments=0.3.0=pyhd8ed1ab_1
  - jupyterlab_server=2.27.3=pyhd8ed1ab_0
  - krb5=1.20.1=h143b758_1
  - lame=3.100=h7b6447c_0
  - lcms2=2.12=h3be6417_0
  - ld_impl_linux-64=2.38=h1181459_1
  - lerc=3.0=h295c915_0
  - libabseil=20240116.2=cxx17_h6a678d5_0
  - libboost=1.82.0=h109eef0_2
  - libbrotlicommon=1.0.9=h5eee18b_8
  - libbrotlidec=1.0.9=h5eee18b_8
  - libbrotlienc=1.0.9=h5eee18b_8
  - libcublas=11.11.3.6=0
  - libcufft=10.9.0.58=0
  - libcufile=1.9.1.3=0
  - libcurand=10.3.5.147=0
  - libcurl=8.7.1=h251f7ec_0
  - libcusolver=11.4.1.48=0
  - libcusparse=11.7.5.86=0
  - libdeflate=1.17=h5eee18b_1
  - libedit=3.1.20230828=h5eee18b_0
  - libev=4.33=h7f8727e_1
  - libevent=2.1.12=hdbd6064_1
  - libffi=3.4.4=h6a678d5_1
  - libgcc-ng=14.1.0=h77fa898_0
  - libgomp=14.1.0=h77fa898_0
  - libgrpc=1.62.2=h2d74bed_0
  - libiconv=1.16=h5eee18b_3
  - libidn2=2.3.4=h5eee18b_0
  - libjpeg-turbo=2.0.0=h9bf148f_0
  - libnghttp2=1.57.0=h2d74bed_0
  - libnpp=11.8.0.86=0
  - libnvjpeg=11.9.0.86=0
  - libpng=1.6.39=h5eee18b_0
  - libprotobuf=4.25.3=he621ea3_0
  - libsodium=1.0.18=h7b6447c_0
  - libssh2=1.11.0=h251f7ec_0
  - libstdcxx-ng=11.2.0=h1234567_1
  - libtasn1=4.19.0=h5eee18b_0
  - libthrift=0.15.0=h1795dd8_2
  - libtiff=4.5.1=h6a678d5_0
  - libunistring=0.9.10=h27cfd23_0
  - libuuid=1.41.5=h5eee18b_0
  - libwebp-base=1.3.2=h5eee18b_0
  - llvm-openmp=14.0.6=h9e868ea_0
  - lz4-c=1.9.4=h6a678d5_1
  - markupsafe=2.1.3=py310h5eee18b_0
  - mistune=2.0.4=py310h06a4308_0
  - mkl=2023.1.0=h213fc3f_46344
  - mkl-service=2.4.0=py310h5eee18b_1
  - mkl_fft=1.3.8=py310h5eee18b_0
  - mkl_random=1.2.4=py310hdb19cb5_0
  - mpc=1.1.0=h10f8cd9_1
  - mpfr=4.0.2=hb69a4c5_1
  - mpmath=1.3.0=py310h06a4308_0
  - multidict=6.0.4=py310h5eee18b_0
  - multiprocess=0.70.15=py310h06a4308_0
  - nb_conda_kernels=2.3.1=py310h06a4308_0
  - nbclassic=1.1.0=py310h06a4308_0
  - nbclient=0.8.0=py310h06a4308_0
  - nbconvert=7.10.0=py310h06a4308_0
  - nbformat=5.9.2=py310h06a4308_0
  - ncurses=6.4=h6a678d5_0
  - nest-asyncio=1.6.0=py310h06a4308_0
  - nettle=3.7.3=hbbd107a_1
  - networkx=3.3=py310h06a4308_0
  - notebook=6.5.7=py310h06a4308_0
  - notebook-shim=0.2.3=py310h06a4308_0
  - numexpr=2.8.7=py310h85018f9_0
  - numpy=1.26.4=py310h5f9d8c6_0
  - numpy-base=1.26.4=py310hb5e798b_0
  - openh264=2.1.1=h4ff587b_0
  - openjpeg=2.4.0=h9ca470c_2
  - openssl=3.3.1=h4bc722e_2
  - orc=2.0.1=h2d29ad5_0
  - overrides=7.4.0=py310h06a4308_0
  - packaging=24.1=py310h06a4308_0
  - pandas=2.2.2=py310h6a678d5_0
  - pandocfilters=1.5.0=pyhd3eb1b0_0
  - pillow=10.4.0=py310h5eee18b_0
  - pip=24.0=py310h06a4308_0
  - platformdirs=3.10.0=py310h06a4308_0
  - prometheus_client=0.14.1=py310h06a4308_0
  - prompt_toolkit=3.0.43=hd3eb1b0_0
  - psutil=5.9.0=py310h5eee18b_0
  - ptyprocess=0.7.0=pyhd3eb1b0_2
  - pure_eval=0.2.2=pyhd3eb1b0_0
  - pyarrow=16.1.0=py310h1128e8f_0
  - pycparser=2.21=pyhd3eb1b0_0
  - pysocks=1.7.1=py310h06a4308_0
  - python=3.10.14=h955ad1f_1
  - python-dateutil=2.9.0post0=py310h06a4308_2
  - python-fastjsonschema=2.16.2=py310h06a4308_0
  - python-json-logger=2.0.7=py310h06a4308_0
  - python-tzdata=2023.3=pyhd3eb1b0_0
  - python-xxhash=2.0.2=py310h5eee18b_1
  - pytorch=2.1.0=py3.10_cuda11.8_cudnn8.7.0_0
  - pytorch-cuda=11.8=h7e8668a_5
  - pytorch-mutex=1.0=cuda
  - pytz=2024.1=py310h06a4308_0
  - pyyaml=6.0.1=py310h5eee18b_0
  - pyzmq=24.0.1=py310h5eee18b_0
  - re2=2022.04.01=h295c915_0
  - readline=8.2=h5eee18b_0
  - referencing=0.30.2=py310h06a4308_0
  - regex=2023.10.3=py310h5eee18b_0
  - requests=2.32.3=py310h06a4308_0
  - rfc3339-validator=0.1.4=py310h06a4308_0
  - rfc3986-validator=0.1.1=py310h06a4308_0
  - rpds-py=0.10.6=py310hb02cf49_0
  - s2n=1.3.27=hdbd6064_0
  - safetensors=0.4.2=py310ha89cbab_1
  - send2trash=1.8.2=py310h06a4308_0
  - setuptools=69.5.1=py310h06a4308_0
  - six=1.16.0=pyhd3eb1b0_1
  - snappy=1.1.10=h6a678d5_1
  - sniffio=1.3.0=py310h06a4308_0
  - soupsieve=2.5=py310h06a4308_0
  - sqlite=3.45.3=h5eee18b_0
  - stack_data=0.2.0=pyhd3eb1b0_0
  - sympy=1.12=py310h06a4308_0
  - tbb=2021.8.0=hdb19cb5_0
  - terminado=0.17.1=py310h06a4308_0
  - tinycss2=1.2.1=py310h06a4308_0
  - tk=8.6.14=h39e8969_0
  - tokenizers=0.19.1=py310hff361bb_0
  - tomli=2.0.1=pyhd8ed1ab_0
  - torchaudio=2.1.0=py310_cu118
  - torchtriton=2.1.0=py310
  - torchvision=0.16.0=py310_cu118
  - tornado=6.4.1=py310h5eee18b_0
  - tqdm=4.66.4=py310h2f386ee_0
  - traitlets=5.14.3=py310h06a4308_0
  - typing-extensions=4.11.0=py310h06a4308_0
  - typing_extensions=4.11.0=py310h06a4308_0
  - tzdata=2024a=h04d1e81_0
  - urllib3=2.2.2=py310h06a4308_0
  - utf8proc=2.6.1=h5eee18b_1
  - webencodings=0.5.1=py310h06a4308_1
  - websocket-client=1.8.0=py310h06a4308_0
  - wheel=0.43.0=py310h06a4308_0
  - xformers=0.0.22.post7=py310_cu11.8.0_pyt2.1.0
  - xxhash=0.8.0=h7f8727e_3
  - xz=5.4.6=h5eee18b_1
  - yaml=0.2.5=h7b6447c_0
  - yarl=1.9.3=py310h5eee18b_0
  - zeromq=4.3.5=h6a678d5_0
  - zipp=3.17.0=py310h06a4308_0
  - zlib=1.2.13=h5eee18b_1
  - zstd=1.5.5=hc292b87_2
  - pip:
      - accelerate==0.33.0
      - asttokens==2.4.1
      - bitsandbytes==0.43.2
      - comm==0.2.2
      - docstring-parser==0.16
      - exceptiongroup==1.2.2
      - executing==2.0.1
      - gguf==0.9.1
      - hf-transfer==0.1.8
      - huggingface-hub==0.24.2
      - iprogress==0.4
      - ipython==8.26.0
      - ipywidgets==8.1.3
      - jupyterlab-widgets==3.0.11
      - markdown-it-py==3.0.0
      - matplotlib-inline==0.1.7
      - mdurl==0.1.2
      - parso==0.8.4
      - peft==0.12.0
      - pexpect==4.9.0
      - prompt-toolkit==3.0.47
      - protobuf==3.20.3
      - pure-eval==0.2.3
      - pygments==2.18.0
      - rich==13.7.1
      - sentencepiece==0.2.0
      - shtab==1.7.1
      - stack-data==0.6.3
      - transformers==4.43.3
      - trl==0.8.6
      - tyro==0.8.5
      - wcwidth==0.2.13
      - widgetsnbextension==4.0.11
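
If you want this environment outside Docker as well, the same file should work with conda directly (the unsloth_env name comes from the yml above):

# Create and activate the environment from the file above:
conda env create -f unsloth_env_file.yml
conda activate unsloth_env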

3. docker-compose.yml

version: '3.8'

services:
  unsloth-env:
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    volumes:
      - ./cache:/root/.cache
      - ./workspace:/workspace
    working_dir: /workspace
    ports:
      - "8888:8888"  # For Jupyter Lab
    tty: true
    stdin_open: true
    build:
      context: .
      dockerfile: Dockerfile

Setup Instructions

To use this setup:

  1. Create three files (Dockerfile, unsloth_env_file.yml, and docker-compose.yml) with the contents provided above.

  2. Ensure you have Docker and Docker Compose installed on your system.

  3. Install the NVIDIA Container Toolkit for GPU support if you haven't already.

  4. Place all three files in the same directory.

  5. Open a terminal and navigate to the directory containing these files.

  6. Run the following command to build and start the container:

    docker-compose up --build

  7. Once the container is running, access Jupyter Lab by opening a web browser and navigating to http://localhost:8888.
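
  8. Optionally, confirm the container actually sees the GPU (a quick check, assuming the unsloth-env service name from the docker-compose.yml above):

    # Verify GPU passthrough into the running container:
    docker-compose exec unsloth-env nvidia-smi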


danielhanchen commented 3 months ago

@emuchogu Fantastic!! Would you be interested in adding a PR to, say, the Readme (just tack it onto the bottom)? I will then move it into the wiki!! (so you can get a contribution :))

emuchogu commented 3 months ago

(Quoting @danielhanchen above.)

I've opened the PR: https://github.com/unslothai/unsloth/pull/870