Closed aistartransformer closed 1 year ago
Hi @Mathanraj-Sharma , please see if you can help, thanks!
@aistartransformer Hi, If you followed the "getting started" alone that won't be enough to use GPU. You'll need to go to the README_MACOS.md and ensure you compile llama_cpp_python package with Metal support.
I tried to make it easy to see what h2oGPT was about in main readme, while the other readmes go into details for full support of GPU or other things.
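For reference, the llama.cpp Python bindings are usually rebuilt with Metal enabled via CMake flags, roughly like this (exact flags can vary by version; treat this as a sketch rather than the README's verbatim command):

```shell
# Rebuild llama-cpp-python with Metal support (flag name per llama.cpp build docs)
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 \
  pip install --force-reinstall --no-cache-dir llama-cpp-python
```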
Thanks. I tried. But I still got the "No GPUs detected". I guess I will return this laptop.
```
bash-3.2$ python generate.py --base_model='llama' --prompt_type=wizard2 --score_model=None --langchain_mode='UserData' --user_path=user_path
No GPUs detected
Using Model llama
```
I also got `ValueError: Model path does not exist: WizardLM-7B-uncensored.ggmlv3.q8_0.bin` this time. Btw, the macOS readme has CPU-only and GPU-only commands. I ran both. I wonder why I cannot use both.
You should download the file as in the windows readme: Model File and place it in the h2oGPT folder. This is required for LLaMa.cpp GGML based model files.
Thank you. The "model path does not exist" error is gone now, but "No GPUs detected" is still there.
I noticed the log has:

```
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: maxTransferRate = built-in GPU
```
In addition, I asked a question "what is sentiment analysis breakdown score", and it crashed with the following error (the assertion repeated several times):
```
Asserting on type 8
GGML_ASSERT: /private/var/folders/21/c3pt5qbd675_7tlchydbf93h0000gr/T/pip-install-rcpbk4s6/llama-cpp-python_066800590d674d239e7588d8f45d2593/vendor/llama.cpp/ggml-metal.m:738: false && "not implemented"
Fatal Python error: Aborted
```
Is this h2o error or llama2 error?
@aistartransformer what did you get for this step?
```python
# Verify whether torch uses MPS by running the below Python script:
import torch

if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)
    print(x)
else:
    print("MPS device not found.")
```
Same here: while llama.cpp detects Metal, I still see "No GPUs detected".
The n_gpus issue is just that the count reported there is the number of CUDA GPUs, which is not relevant for Metal.
We can modify the code there: https://github.com/h2oai/h2ogpt/blob/940d210637abb909596ea1029ce53ede017e2c8b/src/gen.py#L575-L577
to account for MPS.
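A minimal sketch of that kind of adjustment (the helper name `count_gpus` is hypothetical; the actual `gen.py` code differs):

```python
import torch

def count_gpus():
    """Report GPU availability without assuming CUDA (hypothetical helper)."""
    if torch.cuda.is_available():
        return torch.cuda.device_count()
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        # Metal exposes a single GPU sharing unified memory
        return 1
    return 0
```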
@pseudotensor I'll create a PR to adjust this
Thanks. In general, any `torch.cuda` or `torch.backends.cudnn` call needs to be avoided for MPS. So we still can't go into that block, but maybe the messaging can be improved.
Also, we should scan through the code to be sure no other such instances are triggered when using MPS.
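One device-agnostic pattern that avoids stray `torch.cuda` calls on MPS-only machines (a sketch under the assumption that callers only need a `torch.device`, not the project's actual code):

```python
import torch

def pick_device() -> torch.device:
    # Prefer CUDA, then Apple MPS, then CPU; callers never need to
    # touch torch.cuda or torch.backends.cudnn directly.
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```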
```python
import torch

if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)
    print(x)
else:
    print("MPS device not found.")
```

The output is `tensor([1.], device='mps:0')`.
I merged a PR by @Mathanraj-Sharma to avoid the confusing message.
I have a similar issue running this on Windows 10 using a conda environment. Here are the logs:

```
nvidia-smi
Fri Jul 28 21:44:29 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 536.23                 Driver Version: 536.23       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1070      WDDM  | 00000000:01:00.0  On |                  N/A |
|  0%   49C    P8              12W / 230W |    752MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      3532    C+G   ...1.0_x648wekyb3d8bbwe\Video.UI.exe        N/A    |
|    0   N/A  N/A      4988    C+G   ...soft Office\root\Office16\EXCEL.EXE      N/A    |
|    0   N/A  N/A      5124    C+G   C:\Windows\explorer.exe                     N/A    |
|    0   N/A  N/A      7824    C+G   ...5n1h2txyewy\ShellExperienceHost.exe      N/A    |
|    0   N/A  N/A      7864    C+G   ...t.LockApp_cw5n1h2txyewy\LockApp.exe      N/A    |
|    0   N/A  N/A      8176    C+G   C:\Windows\System32\WWAHost.exe             N/A    |
|    0   N/A  N/A      8188    C+G   ...\cef\cef.win7x64\steamwebhelper.exe      N/A    |
|    0   N/A  N/A      8780    C+G   ...2txyewy\StartMenuExperienceHost.exe      N/A    |
|    0   N/A  N/A      8788    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe      N/A    |
|    0   N/A  N/A      9220    C+G   ...648wekyb3d8bbwe\CalculatorApp.exe        N/A    |
|    0   N/A  N/A      9692    C+G   ....Search_cw5n1h2txyewy\SearchApp.exe      N/A    |
|    0   N/A  N/A     10152    C+G   ..._8wekyb3d8bbwe\PaintStudio.View.exe      N/A    |
|    0   N/A  N/A     10608    C+G   ...oogle\Chrome\Application\chrome.exe      N/A    |
|    0   N/A  N/A     10948    C+G   ...CBS_cw5n1h2txyewy\TextInputHost.exe      N/A    |
|    0   N/A  N/A     11928    C+G   ..._8wekyb3d8bbwe\Microsoft.Photos.exe      N/A    |
|    0   N/A  N/A     12328    C+G   ...GeForce Experience\NVIDIA Share.exe      N/A    |
|    0   N/A  N/A     12668    C+G   ...siveControlPanel\SystemSettings.exe      N/A    |
|    0   N/A  N/A     12724    C+G   ...GeForce Experience\NVIDIA Share.exe      N/A    |
|    0   N/A  N/A     13824    C+G   ...inaries\Win64\EpicGamesLauncher.exe      N/A    |
|    0   N/A  N/A     14244    C+G   ...ne\Binaries\Win64\EpicWebHelper.exe      N/A    |
+---------------------------------------------------------------------------------------+
```
```
python generate.py --share=False --base_model=h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3 --score_model=None --load_4bit=True --prompt_type=human_bot
Auto set langchain_mode=Disabled  Have langchain package: False
No GPUs detected
Using Model h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3
Starting get_model: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3
Could not determine --max_seq_len, setting to 2048. Pass if not correct
Could not determine --max_seq_len, setting to 2048. Pass if not correct
Could not determine --max_seq_len, setting to 2048. Pass if not correct
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [08:00<00:00, 240.42s/it]
Model {'base_model': 'h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3', 'tokenizer_base_model': '', 'lora_weights': '', 'inference_server': '', 'prompt_type': 'human_bot', 'prompt_dict': {'promptA': '', 'promptB': '', 'PreInstruct': '
To create a public link, set `share=True` in `launch()`.
```
@3koozy, the original issue was that Mac MPS was used but the printout was just wrong, since it really meant no CUDA GPUs were found.
However, you have CUDA GPUs. You should check that torch can see your GPUs, since that is all that fails in h2oGPT.
Perhaps you didn't install the GPU version of torch as in the readme?
@pseudotensor
I have followed the installation steps and used the following command to install all requirements + CUDA:

```
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu117
```

But you are right: when I tried fetching GPUs using PyTorch, for some reason it won't recognize them:
```
Python 3.10.12 | packaged by Anaconda, Inc. | (main, Jul 5 2023, 19:01:18) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> available_gpus = [torch.cuda.device(i) for i in range(torch.cuda.device_count())]
>>> available_gpus
[]
```
Yes, that's odd. That command:

```
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu117
```

should have ensured CUDA torch was installed.
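A quick way to check which torch wheel actually got installed (assumption: a CPU-only wheel, marked by a `+cpu` version suffix, is the usual cause of an empty device list):

```python
import torch

# A "+cpu" suffix in the version string indicates a CPU-only wheel
print(torch.__version__)
# None on CPU-only builds, e.g. "11.7" on a cu117 wheel
print(torch.version.cuda)
print(torch.cuda.is_available())
```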
@pseudotensor
It seems that I skipped some steps. After doing the following, everything worked :)

1. Installed CUDA 11.8 instead of 11.7 with:

   ```
   conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia
   ```

2. Installed C++ libraries as in the readme.
3. Installed MinGW as in the readme.
4. Installed bitsandbytes for 4-bit and 8-bit:

   ```
   pip uninstall bitsandbytes -y
   pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.40.1.post1-py3-none-win_amd64.whl
   ```

and now it works fine :D
```
python generate.py --share=False --base_model=h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3 --score_model=None --load_4bit=True --prompt_type=human_bot
Auto set langchain_mode=Disabled  Have langchain package: False
Using Model h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3
Starting get_model: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3
Could not determine --max_seq_len, setting to 2048. Pass if not correct
Could not determine --max_seq_len, setting to 2048. Pass if not correct
Could not determine --max_seq_len, setting to 2048. Pass if not correct
device_map: {'': 0}
bin C:\Users\3koozy\anaconda3\envs\h2ogpt\lib\site-packages\bitsandbytes\libbitsandbytes_cuda118_nocublaslt.dll
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:22<00:00, 41.18s/it]
Model {'base_model': 'h2oai/h2ogpt-gm-oasst1-en-2048-falcon-7b-v3', 'tokenizer_base_model': '', 'lora_weights': '', 'inference_server': '', 'prompt_type': 'human_bot', 'prompt_dict': {'promptA': '', 'promptB': '', 'PreInstruct': '<human>: ', 'PreInput': None, 'PreResponse': '<bot>:', 'terminate_response': ['\n<human>:', '\n<bot>:', '<human>:', '<bot>:', '<bot>:'], 'chat_sep': '\n', 'chat_turn_sep': '\n', 'humanstr': '<human>:', 'botstr': '<bot>:', 'generates_leading_space': True}}
Running on local URL: http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.
```
Cool!
Hello,
I am trying to get llama2 installed on my laptop. I am using a MacBook Pro, Apple M2 Max, macOS Ventura 13.0 (22A8380). I have 32 GB of unified memory. "32GB of unified memory makes everything you do fast and fluid" "12-core CPU delivers speeds up to 20 percent faster to fly through pro workflows quicker than ever" "30-core GPU with faster performance for graphics-intensive apps and games"
Memory: 32 GB Type: LPDDR5 Manufacturer: Hynix
But when I ran the following command,

```
python generate.py --base_model='llama' --prompt_type=llama2
```

the logs show there are no GPUs detected. I am thinking that if the GPU cannot be used, I may have to return this laptop.
Or changing this line of code https://github.com/h2oai/h2ogpt/blob/main/src/gen.py#L570 could fix it? Thanks!