h2oai / h2ogpt

Private chat with local GPT with documents, images, video, and more. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
http://h2o.ai
Apache License 2.0

no GPUs detected #559

Closed aistartransformer closed 1 year ago

aistartransformer commented 1 year ago

Hello,

I am trying to get llama2 installed on my laptop. I am using a MacBook Pro with an Apple M2 Max (12-core CPU, 30-core GPU) and 32 GB of unified memory, running macOS Ventura 13.0 (22A8380).

Memory: 32 GB Type: LPDDR5 Manufacturer: Hynix

But when I ran the following command, python generate.py --base_model='llama' --prompt_type=llama2, the logs showed that no GPUs were detected:

Auto set langchain_mode=LLM.  Could use MyData instead.  To allow UserData to pull files from disk, set user_path or langchain_mode_paths, and ensure allow_upload_to_user_data=True
No GPUs detected
Using Model llama

I am thinking that if the GPU cannot be used, I may have to return this laptop.

Or could changing this line of code fix it? https://github.com/h2oai/h2ogpt/blob/main/src/gen.py#L570 Thanks!

pseudotensor commented 1 year ago

Hi @Mathanraj-Sharma , please see if you can help, thanks!

pseudotensor commented 1 year ago

Hi @aistartransformer, following the "getting started" section alone won't be enough to use the GPU. You'll need to go to README_MACOS.md and make sure you compile the llama_cpp_python package with Metal support.

I tried to make the main readme an easy overview of what h2oGPT is about, while the other readmes go into the details needed for full GPU support and other things.
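
(For reference, the Metal build looked roughly like the commands below at the time. This is a sketch; use the exact pinned command from README_MACOS.md, since versions and flags may have changed.)

pip uninstall -y llama-cpp-python
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir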

aistartransformer commented 1 year ago

> Hi @aistartransformer, following the "getting started" section alone won't be enough to use the GPU. You'll need to go to README_MACOS.md and make sure you compile the llama_cpp_python package with Metal support.
>
> I tried to make the main readme an easy overview of what h2oGPT is about, while the other readmes go into the details needed for full GPU support and other things.

Thanks. I tried, but I still get "No GPUs detected". I guess I will return this laptop.

bash-3.2$ python generate.py --base_model='llama' --prompt_type=wizard2 --score_model=None --langchain_mode='UserData' --user_path=user_path
No GPUs detected
Using Model llama

I also got "ValueError: Model path does not exist: WizardLM-7B-uncensored.ggmlv3.q8_0.bin" this time. By the way, the macOS readme has separate CPU-only and GPU-only commands. I ran both, and I wonder why I cannot use both.

pseudotensor commented 1 year ago

You should download the file as in the Windows readme (Model File) and place it in the h2oGPT folder. This is required for llama.cpp GGML-based model files.
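
(For example, an illustrative download; the Windows readme links the exact artifact, so treat the URL below as an assumption:)

wget https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GGML/resolve/main/WizardLM-7B-uncensored.ggmlv3.q8_0.bin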

aistartransformer commented 1 year ago

> You should download the file as in the Windows readme (Model File) and place it in the h2oGPT folder. This is required for llama.cpp GGML-based model files.

Thank you. The model-path-does-not-exist error is gone now; just "No GPUs detected" is still there.

I noticed the log has:

ggml_metal_init: hasUnifiedMemory             = true
ggml_metal_init: maxTransferRate              = built-in GPU

In addition, when I asked the question "what is sentiment analysis breakdown score", it crashed with the following error.

Asserting on type 8
GGML_ASSERT: /private/var/folders/21/c3pt5qbd675_7tlchydbf93h0000gr/T/pip-install-rcpbk4s6/llama-cpp-python_066800590d674d239e7588d8f45d2593/vendor/llama.cpp/ggml-metal.m:738: false && "not implemented"
Asserting on type 8
GGML_ASSERT: /private/var/folders/21/c3pt5qbd675_7tlchydbf93h0000gr/T/pip-install-rcpbk4s6/llama-cpp-python_066800590d674d239e7588d8f45d2593/vendor/llama.cpp/ggml-metal.m:738: false && "not implemented"
Asserting on type 8
GGML_ASSERT: /private/var/folders/21/c3pt5qbd675_7tlchydbf93h0000gr/T/pip-install-rcpbk4s6/llama-cpp-python_066800590d674d239e7588d8f45d2593/vendor/llama.cpp/ggml-metal.m:738: false && "not implemented"
Fatal Python error: Aborted

Is this an h2o error or a llama2 error?
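
(The assert comes from llama.cpp's Metal backend rather than h2oGPT itself; ggml type 8 is likely the q8_0 quantization, which the Metal kernels had not implemented at that point. A minimal sketch, assuming llama-cpp-python is installed, of checking whether the same file runs with Metal offload disabled:)

# Illustrative sketch; the model path is the file downloaded above.
from llama_cpp import Llama

llm = Llama(
    model_path="WizardLM-7B-uncensored.ggmlv3.q8_0.bin",
    n_gpu_layers=0,  # keep all layers on CPU, bypassing the unimplemented Metal kernel
)
out = llm("What is sentiment analysis?", max_tokens=32)
print(out["choices"][0]["text"])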

Mathanraj-Sharma commented 1 year ago

@aistartransformer what did you get for this step?

# Verify whether torch uses MPS by running the Python script below:
import torch
if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)
    print(x)
else:
    print("MPS device not found.")

slavag commented 1 year ago

Same here: while llama.cpp detects Metal, I still see "No GPUs detected".

pseudotensor commented 1 year ago

The n_gpus issue is just that the number of GPUs counted there is the number of CUDA GPUs, which is not relevant for Metal.

We can modify the code there: https://github.com/h2oai/h2ogpt/blob/940d210637abb909596ea1029ce53ede017e2c8b/src/gen.py#L575-L577

to account for MPS.
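
(A hedged sketch, not the merged patch, of what an MPS-aware check around those lines could look like:)

import torch

# Count CUDA devices, but check for Apple MPS before declaring no GPUs at all.
n_gpus = torch.cuda.device_count() if torch.cuda.is_available() else 0
if n_gpus > 0:
    print("Using %d CUDA GPUs" % n_gpus)
elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
    print("No CUDA GPUs detected, but Apple MPS is available")
else:
    print("No GPUs detected")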

Mathanraj-Sharma commented 1 year ago

@pseudotensor I'll create a PR to adjust this

pseudotensor commented 1 year ago

Thanks. In general, any torch.cuda or torch.backends.cudnn call needs to be avoided for MPS, so execution still can't go into that block, but the messaging can be improved.

Also, we should scan through the code to be sure no other such instances are triggered when using MPS.
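
(For instance, an illustrative scan of the source tree:)

grep -rn "torch\.cuda\|torch\.backends\.cudnn" src/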

aistartransformer commented 1 year ago
import torch
if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)
    print(x)
else:
    print("MPS device not found.")

The output is tensor([1.], device='mps:0')

pseudotensor commented 1 year ago

I merged a PR by @Mathanraj-Sharma to avoid the confusing message.

3koozy commented 1 year ago

I have a similar issue running this on Windows 10 using a conda environment. Here are the logs:

nvidia-smi
Fri Jul 28 21:44:29 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 536.23                 Driver Version: 536.23       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1070      WDDM  | 00000000:01:00.0  On |                  N/A |
|  0%   49C    P8              12W / 230W |    752MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      3532    C+G   ...1.0_x648wekyb3d8bbwe\Video.UI.exe        N/A    |
|    0   N/A  N/A      4988    C+G   ...soft Office\root\Office16\EXCEL.EXE      N/A    |
|    0   N/A  N/A      5124    C+G   C:\Windows\explorer.exe                     N/A    |
|    0   N/A  N/A      7824    C+G   ...5n1h2txyewy\ShellExperienceHost.exe      N/A    |
|    0   N/A  N/A      7864    C+G   ...t.LockApp_cw5n1h2txyewy\LockApp.exe      N/A    |
|    0   N/A  N/A      8176    C+G   C:\Windows\System32\WWAHost.exe             N/A    |
|    0   N/A  N/A      8188    C+G   ...\cef\cef.win7x64\steamwebhelper.exe      N/A    |
|    0   N/A  N/A      8780    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A    |
|    0   N/A  N/A      8788    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe      N/A    |
|    0   N/A  N/A      9220    C+G   ...648wekyb3d8bbwe\CalculatorApp.exe        N/A    |
|    0   N/A  N/A      9692    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe      N/A    |
|    0   N/A  N/A     10152    C+G   ..._8wekyb3d8bbwe\PaintStudio.View.exe      N/A    |
|    0   N/A  N/A     10608    C+G   ...oogle\Chrome\Application\chrome.exe      N/A    |
|    0   N/A  N/A     10948    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A    |
|    0   N/A  N/A     11928    C+G   ..._8wekyb3d8bbwe\Microsoft.Photos.exe      N/A    |
|    0   N/A  N/A     12328    C+G   ...GeForce Experience\NVIDIA Share.exe      N/A    |
|    0   N/A  N/A     12668    C+G   ...siveControlPanel\SystemSettings.exe      N/A    |
|    0   N/A  N/A     12724    C+G   ...GeForce Experience\NVIDIA Share.exe      N/A    |
|    0   N/A  N/A     13824    C+G   ...inaries\Win64\EpicGamesLauncher.exe      N/A    |
|    0   N/A  N/A     14244    C+G   ...ne\Binaries\Win64\EpicWebHelper.exe      N/A    |
+---------------------------------------------------------------------------------------+

python generate.py --share=False --base_model=h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3 --score_model=None --load_4bit=True --prompt_type=human_bot
Auto set langchain_mode=Disabled  Have langchain package: False
No GPUs detected
Using Model h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3
Starting get_model: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3
Could not determine --max_seq_len, setting to 2048.  Pass if not correct
Could not determine --max_seq_len, setting to 2048.  Pass if not correct
Could not determine --max_seq_len, setting to 2048.  Pass if not correct
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 2/2 [08:00<00:00, 240.42s/it]
Model {'base_model': 'h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3', 'tokenizer_base_model': '', 'lora_weights': '', 'inference_server': '', 'prompt_type': 'human_bot', 'prompt_dict': {'promptA': '', 'promptB': '', 'PreInstruct': '<human>: ', 'PreInput': None, 'PreResponse': '<bot>:', 'terminate_response': ['\n<human>:', '\n<bot>:', '<human>:', '<bot>:', '<bot>:'], 'chat_sep': '\n', 'chat_turn_sep': '\n', 'humanstr': '<human>:', 'botstr': '<bot>:', 'generates_leading_space': True}}
Running on local URL: http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.

pseudotensor commented 1 year ago

@3koozy, the original issue was that Mac MPS was being used but the printout was misleading, since it really meant no CUDA GPUs.

However, you have a CUDA GPU. You should check that torch can see it, since that is the only thing failing in h2oGPT.

Perhaps you didn't install the GPU version of torch as in the readme?

3koozy commented 1 year ago

@pseudotensor I have followed the installation steps and used the following command to install all requirements + CUDA: pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu117

But you are right: when I tried fetching GPUs using PyTorch, for some reason it won't recognize them:

Python 3.10.12 | packaged by Anaconda, Inc. | (main, Jul  5 2023, 19:01:18) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> available_gpus = [torch.cuda.device(i) for i in range(torch.cuda.device_count())]
>>> available_gpus
[]
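
(A few extra diagnostics can show whether the installed wheel is CPU-only; illustrative, and the exact version strings will differ:)

import torch

print(torch.__version__)          # a "+cpu" suffix means a CPU-only wheel
print(torch.version.cuda)         # None for CPU-only builds, e.g. "11.7" otherwise
print(torch.cuda.is_available())  # False if torch cannot see a CUDA device
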
pseudotensor commented 1 year ago

Yes, that's odd. That

pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu117

should have ensured CUDA torch was installed.
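
(If a CPU-only torch was already cached or resolved from PyPI, reinstalling it explicitly against the cu117 index is one illustrative fix:)

pip uninstall -y torch
pip install torch --index-url https://download.pytorch.org/whl/cu117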

3koozy commented 1 year ago

@pseudotensor It seems that I skipped some steps. After doing the following, everything worked :)

1) Installed the CUDA 11.8 version instead of 11.7 with conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia
2) Installed the C++ libraries as in the readme.
3) Installed MinGW as in the readme.
4) Installed bitsandbytes with 4-bit and 8-bit support:

pip uninstall bitsandbytes -y
pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.40.1.post1-py3-none-win_amd64.whl

and now it works fine :D

python generate.py --share=False --base_model=h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3 --score_model=None --load_4bit=True --prompt_type=human_bot
Auto set langchain_mode=Disabled  Have langchain package: False
Using Model h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3
Starting get_model: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3
Could not determine --max_seq_len, setting to 2048.  Pass if not correct
Could not determine --max_seq_len, setting to 2048.  Pass if not correct
Could not determine --max_seq_len, setting to 2048.  Pass if not correct
device_map: {'': 0}
bin C:\Users\3koozy\anaconda3\envs\h2ogpt\lib\site-packages\bitsandbytes\libbitsandbytes_cuda118_nocublaslt.dll
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:22<00:00, 41.18s/it]
Model {'base_model': 'h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3', 'tokenizer_base_model': '', 'lora_weights': '', 'inference_server': '', 'prompt_type': 'human_bot', 'prompt_dict': {'promptA': '', 'promptB': '', 'PreInstruct': '<human>: ', 'PreInput': None, 'PreResponse': '<bot>:', 'terminate_response': ['\n<human>:', '\n<bot>:', '<human>:', '<bot>:', '<bot>:'], 'chat_sep': '\n', 'chat_turn_sep': '\n', 'humanstr': '<human>:', 'botstr': '<bot>:', 'generates_leading_space': True}}
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
pseudotensor commented 1 year ago

Cool!