unslothai / unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Out of memory error while finetuning unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit #786

Open abpani opened 1 month ago

abpani commented 1 month ago

Trying to finetune unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit. It gives me an out-of-memory error even with a batch size of 1 and a max sequence length of 2048. I can see the example notebook runs on a Colab T4.

Unsloth: Fast Mistral patching release 2024.7
GPU: NVIDIA A10G. Max memory: 22.191 GB. Platform = Linux.
Pytorch: 2.3.0+cu121. CUDA = 8.6. CUDA Toolkit = 12.1.
Bfloat16 = TRUE. FA [Xformers = 0.0.26.post1. FA2 = True]
Free Apache license: http://github.com/unslothai/unsloth
Loading checkpoint shards: 100% | 2/2 [00:02<00:00, 1.12s/it]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 314.00 MiB. GPU
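(For reference, a minimal sketch of the kind of setup being run here - the dataset file, LoRA hyperparameters, and trainer arguments below are placeholders I am assuming, not values taken from this thread:)

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

max_seq_length = 2048
# Load the pre-quantized 4-bit Mistral-Nemo checkpoint.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = None,          # auto-detects bfloat16 on an A10G
    load_in_4bit = True,
)
# Attach LoRA adapters so only a small fraction of weights is trained.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    lora_alpha = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
)
# Placeholder dataset: any JSONL file with a "text" column.
dataset = load_dataset("json", data_files = "train.jsonl", split = "train")
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    args = TrainingArguments(
        per_device_train_batch_size = 1,   # batch size 1 as reported above
        gradient_accumulation_steps = 4,
        max_steps = 60,
        bf16 = True,
        optim = "adamw_8bit",
        output_dir = "outputs",
    ),
)
trainer.train()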

danielhanchen commented 1 month ago

Just tried your settings in Colab with a max seq length of 2048 (padded to 2048 to mimic the max length) at bsz = 1 - it should fit in 12 GB of VRAM (screenshot attached).
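(If it helps, peak usage can be checked after a short training run with standard PyTorch calls - nothing Unsloth-specific:)

import torch
# Report peak GPU memory to compare against the ~12 GB figure above.
peak_reserved  = torch.cuda.max_memory_reserved()  / 1024**3
peak_allocated = torch.cuda.max_memory_allocated() / 1024**3
total          = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"Peak reserved:  {peak_reserved:.2f} GB")
print(f"Peak allocated: {peak_allocated:.2f} GB")
print(f"Total VRAM:     {total:.2f} GB")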

Try reinstalling Unsloth on your A10G:

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git
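A quick sanity check after the reinstall (standard calls only; adjust paths for your venv):

import importlib.metadata
import torch
print("unsloth:", importlib.metadata.version("unsloth"))
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("GPU:", torch.cuda.get_device_name(0))
from unsloth import FastLanguageModel  # should import cleanly with no patching errors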
abpani commented 1 month ago

Now I am getting this error after reinstalling.

RuntimeError: ptxas failed with error code 127: sh: 1: D/.myvenv/lib/python3.10/site-packages/triton/common/../third_party/cuda/bin/ptxas: not found

danielhanchen commented 1 month ago

Oh no, that means your CUDA installation got broken somewhere - can you try running nvcc in the terminal and see if it works?
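(One way to check this from Python as well, using only the standard library - note the ptxas path in the error above is specific to that venv's Triton install:)

import os, shutil, subprocess
print("nvcc: ", shutil.which("nvcc"))    # None means nvcc is not on PATH
print("ptxas:", shutil.which("ptxas"))   # Triton also ships its own copy (the path in the error above)
print("CUDA_HOME:", os.environ.get("CUDA_HOME"))
if shutil.which("nvcc"):
    # Equivalent to running `nvcc --version` in the terminal.
    print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)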