sshh12 / multi_token

Embed arbitrary modalities (images, audio, documents, etc) into large language models.
Apache License 2.0

Cannot compile adapter_model.bin? #22

Open kuki2008 opened 2 weeks ago

kuki2008 commented 2 weeks ago

So, I was trying to run this in Google Colab:

!python /content/multi_token/scripts/serve_model.py \
    --model_name_or_path mistralai/Mistral-7B-Instruct-v0.1 \
    --model_lora_path sshh12/Mistral-7B-LoRA-ImageBind-LLAVA \
    --port 9069

And then I got this:

Downloading shards: 100% 2/2 [05:11<00:00, 155.64s/it]
Loading checkpoint shards: 100% 2/2 [01:03<00:00, 31.83s/it]
generation_config.json: 100% 116/116 [00:00<00:00, 668kB/s]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the disk and cpu.
INFO:root:Loading projector weights for ['imagebind']
non_lora_trainables.bin: 100% 168M/168M [00:14<00:00, 11.5MB/s]
INFO:root:Loading pretrained weights: ['imagebind_lmm_projector.mlps.0.0.weight', 'imagebind_lmm_projector.mlps.0.0.bias', 'imagebind_lmm_projector.mlps.0.2.weight', 'imagebind_lmm_projector.mlps.0.2.bias', 'imagebind_lmm_projector.mlps.1.0.weight', 'imagebind_lmm_projector.mlps.1.0.bias', 'imagebind_lmm_projector.mlps.1.2.weight', 'imagebind_lmm_projector.mlps.1.2.bias', 'imagebind_lmm_projector.mlps.2.0.weight', 'imagebind_lmm_projector.mlps.2.0.bias', 'imagebind_lmm_projector.mlps.2.2.weight', 'imagebind_lmm_projector.mlps.2.2.bias', 'imagebind_lmm_projector.mlps.3.0.weight', 'imagebind_lmm_projector.mlps.3.0.bias', 'imagebind_lmm_projector.mlps.3.2.weight', 'imagebind_lmm_projector.mlps.3.2.bias']
INFO:root:Loading and merging LoRA weights from sshh12/Mistral-7B-LoRA-ImageBind-LLAVA
adapter_config.json: 100% 534/534 [00:00<00:00, 2.71MB/s]
WARNING:bitsandbytes.cextension:The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
adapter_model.bin:   3% 10.5M/336M [00:00<00:10, 31.9MB/s]^C

Here is the full log:

2024-06-17 15:33:13.443686: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-17 15:33:13.443752: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-17 15:33:13.599187: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-17 15:33:13.898052: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-17 15:33:17.194957: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
tokenizer_config.json: 100% 1.47k/1.47k [00:00<00:00, 8.37MB/s]
tokenizer.model: 100% 493k/493k [00:00<00:00, 5.81MB/s]
special_tokens_map.json: 100% 72.0/72.0 [00:00<00:00, 424kB/s]
tokenizer.json: 100% 1.80M/1.80M [00:00<00:00, 15.9MB/s]
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
config.json: 100% 741/741 [00:00<00:00, 3.85MB/s]
/usr/local/lib/python3.10/dist-packages/torchvision/transforms/_functional_video.py:6: UserWarning: The 'torchvision.transforms._functional_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms.functional' module instead.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/torchvision/transforms/_transforms_video.py:22: UserWarning: The 'torchvision.transforms._transforms_video' module is deprecated since 0.12 and will be removed in the future. Please use the 'torchvision.transforms' module instead.
  warnings.warn(
Downloading imagebind weights to .checkpoints/imagebind_huge.pth ...
100% 4.47G/4.47G [01:15<00:00, 63.3MB/s]
INFO:root:Loading base model from mistralai/Mistral-7B-Instruct-v0.1 as 16 bits
model.safetensors.index.json: 100% 25.1k/25.1k [00:00<00:00, 47.2MB/s]
Downloading shards:   0% 0/2 [00:00<?, ?it/s]
model-00001-of-00002.safetensors:   0% 0.00/9.94G [00:00<?, ?B/s]
model-00001-of-00002.safetensors:   0% 10.5M/9.94G [00:00<01:41, 97.8MB/s]
model-00001-of-00002.safetensors:   0% 21.0M/9.94G [00:00<01:42, 96.5MB/s]
model-00001-of-00002.safetensors:   0% 31.5M/9.94G [00:00<01:55, 85.8MB/s]
model-00001-of-00002.safetensors:   0% 41.9M/9.94G [00:00<01:56, 85.2MB/s]
model-00001-of-00002.safetensors:   1% 52.4M/9.94G [00:00<01:55, 85.8MB/s]
model-00001-of-00002.safetensors:   1% 62.9M/9.94G [00:00<01:54, 86.2MB/s]
...........
model-00001-of-00002.safetensors:  99% 9.89G/9.94G [03:39<00:03, 15.0MB/s]
model-00001-of-00002.safetensors: 100% 9.90G/9.94G [03:40<00:02, 15.6MB/s]
model-00001-of-00002.safetensors: 100% 9.92G/9.94G [03:40<00:01, 22.3MB/s]
model-00001-of-00002.safetensors: 100% 9.94G/9.94G [03:40<00:00, 45.1MB/s]
Downloading shards:  50% 1/2 [03:40<03:40, 220.64s/it]
model-00002-of-00002.safetensors:   0% 0.00/4.54G [00:00<?, ?B/s]
model-00002-of-00002.safetensors:   0% 10.5M/4.54G [00:00<00:44, 103MB/s]
model-00002-of-00002.safetensors:   1% 31.5M/4.54G [00:00<00:42, 106MB/s]
model-00002-of-00002.safetensors:   1% 52.4M/4.54G [00:00<00:41, 108MB/s]
model-00002-of-00002.safetensors:   2% 73.4M/4.54G [00:00<00:38, 117MB/s]
model-00002-of-00002.safetensors:   2% 94.4M/4.54G [00:00<00:37, 120MB/s]
model-00002-of-00002.safetensors:   3% 115M/4.54G [00:01<00:38, 116MB/s] 
...........
model-00002-of-00002.safetensors:  98% 4.46G/4.54G [01:29<00:03, 21.6MB/s]
model-00002-of-00002.safetensors:  99% 4.48G/4.54G [01:29<00:02, 30.5MB/s]
model-00002-of-00002.safetensors:  99% 4.50G/4.54G [01:30<00:01, 40.7MB/s]
model-00002-of-00002.safetensors: 100% 4.52G/4.54G [01:30<00:00, 50.9MB/s]
model-00002-of-00002.safetensors: 100% 4.54G/4.54G [01:30<00:00, 50.2MB/s]
Downloading shards: 100% 2/2 [05:11<00:00, 155.64s/it]
Loading checkpoint shards: 100% 2/2 [01:03<00:00, 31.83s/it]
generation_config.json: 100% 116/116 [00:00<00:00, 668kB/s]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the disk and cpu.
INFO:root:Loading projector weights for ['imagebind']
non_lora_trainables.bin: 100% 168M/168M [00:14<00:00, 11.5MB/s]
INFO:root:Loading pretrained weights: ['imagebind_lmm_projector.mlps.0.0.weight', 'imagebind_lmm_projector.mlps.0.0.bias', 'imagebind_lmm_projector.mlps.0.2.weight', 'imagebind_lmm_projector.mlps.0.2.bias', 'imagebind_lmm_projector.mlps.1.0.weight', 'imagebind_lmm_projector.mlps.1.0.bias', 'imagebind_lmm_projector.mlps.1.2.weight', 'imagebind_lmm_projector.mlps.1.2.bias', 'imagebind_lmm_projector.mlps.2.0.weight', 'imagebind_lmm_projector.mlps.2.0.bias', 'imagebind_lmm_projector.mlps.2.2.weight', 'imagebind_lmm_projector.mlps.2.2.bias', 'imagebind_lmm_projector.mlps.3.0.weight', 'imagebind_lmm_projector.mlps.3.0.bias', 'imagebind_lmm_projector.mlps.3.2.weight', 'imagebind_lmm_projector.mlps.3.2.bias']
INFO:root:Loading and merging LoRA weights from sshh12/Mistral-7B-LoRA-ImageBind-LLAVA
adapter_config.json: 100% 534/534 [00:00<00:00, 2.71MB/s]
WARNING:bitsandbytes.cextension:The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
adapter_model.bin:   3% 10.5M/336M [00:00<00:10, 31.9MB/s]^C

@sshh12

sshh12 commented 1 week ago

It doesn't look like there's an error(?). It seems to be in the middle of downloading adapter_model.bin when it gets interrupted.

kuki2008 commented 1 week ago

Any ideas why it's being interrupted?

sshh12 commented 3 days ago

Hm, potentially the notebook is timing out. Not sure, but it might be on the Colab side.

kuki2008 commented 3 days ago

Okay, maybe I should try running it on my local machine, but I don't have an Nvidia GPU, so I have a question: will it run without CUDA?

sshh12 commented 3 days ago

Hm, worth a shot. Nothing in the library is CUDA-specific, but it's totally possible PyTorch issues will pop up due to not having it.
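One way to check before sitting through another multi-gigabyte download is to probe for a usable CUDA device up front. This is just a generic sketch (not part of this repo's scripts):

```python
import importlib.util


def cuda_available() -> bool:
    """True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    # On a machine without an Nvidia GPU this prints False; bitsandbytes
    # quantization and 8-bit loading will not work in that case.
    print("CUDA available:", cuda_available())
```

If this prints False, expect the quantized load path to fail even though the rest of the library is CUDA-agnostic.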

kuki2008 commented 2 days ago
Traceback (most recent call last):
  File "c:\Users\Kuki\Documents\VS-Projects\python\newra_7.1\multi_token\scripts\serve_model.py", line 32, in <module>
    model, tokenizer = load_trained_lora_model(
                       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\Kuki\Documents\VS-Projects\python\newra_7.1\multi_token\scripts\multi_token\inference.py", line 52, in load_trained_lora_model
    model = model_cls.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Kuki\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 3030, in from_pretrained
    raise RuntimeError("No GPU found. A GPU is needed for quantization.")
RuntimeError: No GPU found. A GPU is needed for quantization.

Looks like I need to have a GPU.
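For reference, that RuntimeError is transformers refusing bitsandbytes 8-bit/4-bit loading on a CPU-only machine. A hedged sketch of a workaround (the actual kwargs that `load_trained_lora_model` passes to `from_pretrained` are an assumption here) is to drop the quantization flags when no GPU is present:

```python
def from_pretrained_kwargs(have_gpu: bool) -> dict:
    """Build `from_pretrained` kwargs that avoid the
    'No GPU found. A GPU is needed for quantization.' RuntimeError.

    bitsandbytes quantization only works on CUDA, so on CPU we fall back
    to an unquantized load (much slower, and a 7B model needs tens of GB
    of RAM at full precision).
    """
    if have_gpu:
        return {"device_map": "auto", "load_in_8bit": True}
    # No quantization flags on CPU, so transformers won't require a GPU.
    return {"device_map": "cpu"}
```

These kwargs would then be spread into the load call, e.g. `AutoModelForCausalLM.from_pretrained(name, **from_pretrained_kwargs(torch.cuda.is_available()))`; whether the unquantized CPU path is fast enough to be usable is another question.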