@SyedA5688 Wait, we don't have a Torch 2.5 tag yet - probably best to just use unsloth without the tag
I see, thanks for your quick reply! I've gotten farther along by building a Docker container with torch 2.5 and unsloth, but I still run into the above error. Is there a Docker image available for unsloth, by any chance, that I could base my image on?
@SyedA5688 One way is to use the vLLM Docker image, and add Unsloth together (although vLLM doesn't yet support torch 2.5)
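For anyone following along, a rough sketch of that approach might look like the following (untested; the vllm/vllm-openai:latest tag is an assumption, and you would need to check which torch build the image actually ships):

FROM vllm/vllm-openai:latest

# The vLLM image ships with its own torch build; install Unsloth without
# dependencies so pip does not replace it, then add the training libraries.
RUN python3 -m pip install --no-deps "unsloth @ git+https://github.com/unslothai/unsloth.git"
RUN python3 -m pip install trl peft bitsandbytes accelerate

# The base image's entrypoint launches the vLLM API server; clear it so the
# container can run a training script instead.
ENTRYPOINT []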
Also, I just added unsloth[cu121-torch250]!
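For example (assuming a CUDA 12.1 / torch 2.5 environment to match the extra):

pip install "unsloth[cu121-torch250] @ git+https://github.com/unslothai/unsloth.git"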
Thank you for updating to torch 2.5!
After some more experimenting with Dockerfiles, I have landed on this Dockerfile, based on some other posts:
ARG CUDA_VERSION="12.2.2"
ARG UBUNTU_VERSION="22.04"
ARG DOCKER_FROM=nvidia/cuda:$CUDA_VERSION-devel-ubuntu$UBUNTU_VERSION
FROM $DOCKER_FROM AS base
# The command below creates a home dir for the UID 1000 user if one is not present.
RUN if ! id 1000; then useradd -m -u 1000 clouduser; fi
RUN mkdir /workdir
WORKDIR /workdir
ENV LANG=C.UTF-8
RUN apt-get update --allow-releaseinfo-change && \
    apt-get install -y --no-install-recommends git python3 python3-pip netcat && \
    python3 -m pip install --upgrade pip && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
RUN python3 -m pip install "trl<=0.9.0" peft==0.10.0 bitsandbytes==0.43.3 transformers[sentencepiece]==4.43.4
RUN python3 -m pip install torch==2.2.1+cu121 torchvision --index-url https://download.pytorch.org/whl/cu121
RUN python3 -m pip install "unsloth @ git+https://github.com/unslothai/unsloth.git@d0ca3497eb5911483339be025e9924cf73280178"
RUN python3 -m pip install --no-deps "xformers<0.0.26" --force-reinstall
RUN python3 -m pip install flash_attn==2.6.3
RUN python3 -m pip install absl-py
RUN python3 -m pip install numpy==1.26.4
COPY train_C2S_unsloth_HF_torch_lora.py /workdir/pytorch_xcloud_training/train_C2S_unsloth_HF_torch_lora.py
# The command below makes the UID 1000 user and root the owners of the workdir.
RUN chown -R 1000:root /workdir && chmod -R 775 /workdir
ENTRYPOINT ["python3", "-m", "pytorch_xcloud_training.train_C2S_unsloth_HF_torch_lora"]
When I build and run with this Dockerfile, all libraries seem to be set up correctly (no errors with CUDA or flash attention); however, I still run into the same error I mentioned above:
TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:
* (tuple of ints size, *, tuple of names names, torch.memory_format memory_format = None, torch.dtype dtype = None, torch.layout layout = None, torch.device device = None, bool pin_memory = False, bool requires_grad = False)
* (tuple of ints size, *, torch.memory_format memory_format = None, Tensor out = None, torch.dtype dtype = None, torch.layout layout = None, torch.device device = None, bool pin_memory = False, bool requires_grad = False)
This error occurs in the call to FastLanguageModel.get_peft_model(). Full stack trace:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/workdir/pytorch_xcloud_training/train_C2S_unsloth_HF_torch_lora.py", line 189, in <module>
app.run(main)
File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/usr/local/lib/python3.10/dist-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "/workdir/pytorch_xcloud_training/train_C2S_unsloth_HF_torch_lora.py", line 97, in main
model = FastLanguageModel.get_peft_model(
File "/usr/local/lib/python3.10/dist-packages/unsloth/models/llama.py", line 2125, in get_peft_model
model = _get_peft_model(model, lora_config)
File "/usr/local/lib/python3.10/dist-packages/peft/mapping.py", line 136, in get_peft_model
return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 1094, in __init__
super().__init__(model, peft_config, adapter_name)
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 129, in __init__
self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora/model.py", line 136, in __init__
super().__init__(model, config, adapter_name)
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py", line 148, in __init__
self.inject_adapter(self.model, adapter_name)
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/tuners_utils.py", line 325, in inject_adapter
self._create_and_replace(peft_config, adapter_name, target, target_name, parent, current_key=key)
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora/model.py", line 220, in _create_and_replace
new_module = self._create_new_module(lora_config, adapter_name, target, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora/model.py", line 295, in _create_new_module
new_module = dispatcher(target, adapter_name, lora_config=lora_config, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora/bnb.py", line 506, in dispatch_bnb_4bit
new_module = Linear4bit(target, adapter_name, **fourbit_kwargs)
File "/usr/local/lib/python3.10/dist-packages/peft/tuners/lora/bnb.py", line 293, in __init__
self.update_layer(
File "<string>", line 17, in LoraLayer_update_layer
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py", line 98, in __init__
self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:
* (tuple of ints size, *, tuple of names names, torch.memory_format memory_format = None, torch.dtype dtype = None, torch.layout layout = None, torch.device device = None, bool pin_memory = False, bool requires_grad = False)
* (tuple of ints size, *, torch.memory_format memory_format = None, Tensor out = None, torch.dtype dtype = None, torch.layout layout = None, torch.device device = None, bool pin_memory = False, bool requires_grad = False)
This error does not come up when running Unsloth Gemma-2 LoRA training in Google Colab (torch 2.5.0+cu121 and unsloth 2024.10.7) or in the Anaconda environments I tried on other Linux machines (pytorch 2.4.1 and unsloth 2024.10.0). Any idea what might be causing this particular error?
Update: after some more debugging, I found the issue: I was passing in lora_rank (r) = 16.0, a float, rather than an integer. The float propagates into the LoRA layer's shape, so torch.empty() receives a tuple containing a float instead of a tuple of ints, which is why the error message is so hard to connect to the actual mistake. When I passed in 16, the error was resolved.
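For anyone who hits the same TypeError, a minimal sketch of the corrected call (the model name and hyperparameters follow the Gemma Colab notebook and are illustrative, not my exact script):

from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-2-9b-bnb-4bit",  # assumed checkpoint; any Unsloth model works
    max_seq_length=2048,
    load_in_4bit=True,
)

# r must be an int: passing 16.0 propagates a float into nn.Linear's shape
# tuple and produces the opaque torch.empty() TypeError above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj",
                    "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)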
Closing this issue, thank you for your responsiveness!
Hi there, thank you for the great work on Unsloth! I am trying to install the package in a Docker container to fine-tune Gemma models following the Colab notebooks for Gemma: https://colab.research.google.com/drive/1vIrqH5uYDQwsJ4-OO3DErvuv4pBgVwk4?usp=sharing
I am working with a Dockerfile (below) where I am trying to emulate the package versions in the Colab runtime, which has torch==2.5 and unsloth==2024.10.7.
I have been running into errors with my pip installation for a while now in the call to
model = FastLanguageModel.get_peft_model()
I am unsure of what is causing this error. I am able to create a working Anaconda environment following the Anaconda installation instructions in the README, and the Colab runs fine; however, any Docker environment I create runs into this error. Any advice on this would be greatly appreciated!