Closed Sierbahnn closed 7 months ago
I'm not sure. I am using pynsist and haven't seen this issue myself, but I presume the uninstall.exe is not in that location but somewhere else.
I have searched the entire directory, and there is no uninstaller in there. And since I cannot reinstall it, I cannot get a new/fresh uninstall executable. I have no idea why it got flagged as "contains a virus" this time around either.
OK, for now please delete the contents of the h2oGPT folder, which should contain everything. Before deleting, check the llamacpp_path folder in case you have model files there you want to keep.
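Before deleting, a quick way to see whether any model files would be lost is to scan the folder for common model-file extensions. This is just a sketch; the example path and the extension list are assumptions, so adjust them for your install location:

```python
from pathlib import Path

def find_model_files(root, exts=(".gguf", ".bin", ".safetensors")):
    """Return all files under `root` whose suffix looks like a model file."""
    root = Path(root)
    if not root.exists():
        return []
    return sorted(p for p in root.rglob("*") if p.suffix.lower() in exts)

# Example (hypothetical path -- point it at your own h2oGPT / llamacpp_path folder):
# for f in find_model_files(r"C:\Users\you\AppData\Local\Programs\h2oGPT"):
#     print(f, f.stat().st_size // 2**20, "MiB")
```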
Thanks. There was stuff there. I will try a new install, and see if I can make better headway this time.
I got it uninstalled and re-installed, including the models. Way better. It is terribly slow though. I have selected -1 for the GPU, but it is still averaging only a single word every few seconds. I am going to assume that there is some setting I am missing. How do I best find out which one it is?
Sorry for continuing this issue rather than starting another, but it is all part of my single user-experience.
Sounds like it's using the CPU. Did you follow the step in the readme.md, after the download link, about installing GPU torch? It's not quite one-click for GPU due to that step. I had some automatic download logic, but it's too invisible.
Also, what model were you using?
The

C:\Users\pseud\AppData\Local\Programs\h2oGPT\Python\python.exe -m pip uninstall -y torch
C:\Users\pseud\AppData\Local\Programs\h2oGPT\Python\python.exe -m pip install https://h2o-release.s3.amazonaws.com/h2ogpt/torch-2.1.2%2Bcu118-cp310-cp310-win_amd64.whl

step?
Yes, I completed those steps. I got confirmations when I ran them. Is there a way for me to verify that they ran correctly? I posted two screenshots over on the Discord
https://discord.com/channels/1097462770674438174/1111483879723900978/1205527735980789791
The step says "pip uninstall -y torch" - not install. Is that right? I know nothing about the syntax here, so I just wanted to verify.
Yes, that's right. Sounds like you did the steps. If they didn't complain, and things seem to be in that location, then should be right.
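If you want to double-check from Python that the CUDA build of torch actually landed, something like this can be run with the bundled python.exe. It is only a sketch that reports what torch sees; the `describe_torch` helper is mine, not part of h2oGPT:

```python
import importlib.util

def describe_torch():
    """Report whether torch is importable and whether it sees a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: CPU only (no CUDA device visible)"
    return (f"torch {torch.__version__}: CUDA {torch.version.cuda}, "
            f"{torch.cuda.device_count()} GPU(s), "
            f"first: {torch.cuda.get_device_name(0)}")

print(describe_torch())
```

After installing the GPU wheel above, you would expect a version like `2.1.2+cu118` and at least one GPU listed.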
Can you please try the "debug" way mentioned there? Specifically, a line like:

C:\Users\pseud\AppData\Local\Programs\h2oGPT\Python\python.exe "C:\Users\pseud\AppData\Local\Programs\h2oGPT\h2oGPT.launch.pyw"

It should print out how many GPUs are seen by torch, etc.
Even better, if you can, do:

set H2OGPT_VERBOSE=True
C:\Users\pseud\AppData\Local\Programs\h2oGPT\Python\python.exe "C:\Users\pseud\AppData\Local\Programs\h2oGPT\h2oGPT.launch.pyw"

This will show in the console what is happening.
The first debug gives this output:

"Torch Status: have torch: True need get gpu torch: False CVD: None GPUs: 1
Fontconfig error: Cannot load default config file: No such file: (null)
load INSTRUCTOR_Transformer max_seq_length 512
D:\Program Files (x86)\h2oGPT\pkgs\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)"
The complete text is:

"D:\Program Files (x86)\h2oGPT\Python>python.exe "D:\Program Files (x86)\h2oGPT\h2oGPT.launch.pyw"
file: D:\Program Files (x86)\h2oGPT\pkgs\win_run_app.py
path1 D:\Program Files (x86)\h2oGPT\pkgs
C:\Program Files (x86)\Common Files\Oracle\Java\javapath;C:\windows\system32;C:\windows;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0\;C:\windows\System32\OpenSSH\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Android;C:\Windows\System32;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\Docker\Docker\resources\bin;C:\ProgramData\DockerDesktop\version-bin;C:\Program Files\dotnet\;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR;C:\Program Files (x86)\Common Files\Acronis\SnapAPI\;C:\Program Files (x86)\Common Files\Acronis\VirtualFile\;C:\Program Files (x86)\Common Files\Acronis\VirtualFile64\;C:\Program Files (x86)\Common Files\Acronis\FileProtector\;C:\Program Files (x86)\Common Files\Acronis\FileProtector64\;C:\Users\sierb\AppData\Local\Microsoft\WindowsApps;d:\Program Files (x86)\h2oGPT\Python\Scripts;D:\Program Files (x86)\h2oGPT\poppler/Library/bin/;D:\Program Files (x86)\h2oGPT\poppler/Library/lib/;D:\Program Files (x86)\h2oGPT\Tesseract-OCR;D:\Program Files (x86)\h2oGPT\ms-playwright;D:\Program Files (x86)\h2oGPT\ms-playwright/chromium-1076/chrome-win;D:\Program Files (x86)\h2oGPT\ms-playwright/ffmpeg-1009;D:\Program Files (x86)\h2oGPT\ms-playwright/firefox-1422/firefox;D:\Program Files (x86)\h2oGPT\ms-playwright/webkit-1883;D:\Program Files (x86)\h2oGPT\rubberband/
D:\Program Files (x86)\h2oGPT..\src D:\Program Files (x86)\h2oGPT\pkgs..\src D:\Program Files (x86)\h2oGPT..\iterators D:\Program Files (x86)\h2oGPT\pkgs..\iterators D:\Program Files (x86)\h2oGPT..\gradio_utils D:\Program Files (x86)\h2oGPT\pkgs..\gradio_utils D:\Program Files (x86)\h2oGPT..\metrics D:\Program Files (x86)\h2oGPT\pkgs..\metrics D:\Program Files (x86)\h2oGPT..\models D:\Program Files (x86)\h2oGPT\pkgs..\models D:\Program Files (x86)\h2oGPT... D:\Program Files (x86)\h2oGPT\pkgs...
Torch Status: have torch: True need get gpu torch: False CVD: None GPUs: 1
Fontconfig error: Cannot load default config file: No such file: (null)
load INSTRUCTOR_Transformer max_seq_length 512
D:\Program Files (x86)\h2oGPT\pkgs\pydub\utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Begin auto-detect HF cache text generation models
No loading model microsoft/speecht5_hifigan because 'hifigan'
No loading model microsoft/speecht5_tts because is_encoder_decoder=True
No loading model openai/whisper-base.en because is_encoder_decoder=True
End auto-detect HF cache text generation models
Begin auto-detect llama.cpp models
End auto-detect llama.cpp models
favicon_path1=h2o-logo.svg not found
favicon_path2: h2o-logo.svg not found in D:\Program Files (x86)\h2oGPT\src
Running on local URL: http://0.0.0.0:7860
To create a public link, set share=True in launch().
Started Gradio Server and/or GUI: server_name: localhost port: None
Use local URL: http://localhost:7860/
D:\Program Files (x86)\h2oGPT\pkgs\pydantic\_internal\_fields.py:149: UserWarning: Field "model_name" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting model_config['protected_namespaces'] = ().
warnings.warn(
D:\Program Files (x86)\h2oGPT\pkgs\pydantic\_internal\_fields.py:149: UserWarning: Field "model_names" has conflict with protected namespace "model_".
You may be able to resolve this warning by setting model_config['protected_namespaces'] = ().
warnings.warn(
OpenAI API URL: http://0.0.0.0:5000
INFO:name:OpenAI API URL: http://0.0.0.0:5000
OpenAI API key: EMPTY
INFO:name:OpenAI API key: EMPTY
INFO: 127.0.0.1:56320 - "GET / HTTP/1.1" 200 OK
INFO: 127.0.0.1:56324 - "GET /info HTTP/1.1" 200 OK
INFO: 127.0.0.1:56324 - "GET /theme.css HTTP/1.1" 200 OK
INFO: 127.0.0.1:56332 - "GET /assets/index-de90ac30.js HTTP/1.1" 200 OK
Starting get_model: HuggingFaceH4/zephyr-7b-beta
device_map: {'': 0}
bin D:\Program Files (x86)\h2oGPT\pkgs\bitsandbytes\libbitsandbytes_cuda118.dll
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 8/8 [00:24<00:00, 3.00s/it]
INFO: 127.0.0.1:56434 - "POST /run/predict HTTP/1.1" 200 OK
INFO: 127.0.0.1:56505 - "GET /file%3DD%3A/Program%20Files%20%28x86%29/h2oGPT/models/human.jpg HTTP/1.1" 403 Forbidden
INFO: 127.0.0.1:56506 - "GET /file%3DD%3A/Program%20Files%20%28x86%29/h2oGPT/models/h2oai.png HTTP/1.1" 403 Forbidden
INFO: 127.0.0.1:56506 - "GET /file%3DD%3A/Program%20Files%20%28x86%29/h2oGPT/models/h2oai.png HTTP/1.1" 403 Forbidden
INFO: 127.0.0.1:56505 - "GET /file%3DD%3A/Program%20Files%20%28x86%29/h2oGPT/models/human.jpg HTTP/1.1" 403 Forbidden
INFO: 127.0.0.1:56506 - "GET /file%3DD%3A/Program%20Files%20%28x86%29/h2oGPT/models/h2oai.png HTTP/1.1" 403 Forbidden
INFO: 127.0.0.1:56518 - "GET /stream/9skclnxt9n7/1882865140688/249 HTTP/1.1" 200 OK"
So that looks like it is finding a GPU
Yes, seems so. Then what does it look like when you do a query? Are you doing a pure LLM call or with documents? And what are your GPU specs?
In general I recommend GGUF for less beefy GPUs. Can you try the GGUF version of zephyr?
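For context, a rough rule of thumb for whether a model fits in VRAM is parameter count times bytes per parameter, plus some overhead for activations and KV cache. A 7B model in fp16 needs roughly 14 GB for the weights alone (more than a 12 GB card has), while a 4-bit GGUF quant is closer to 4 GB. A sketch of that arithmetic (the 20% overhead factor is my assumption, not a measured number):

```python
def est_vram_gb(n_params, bits_per_param, overhead=1.2):
    """Rough VRAM estimate in GiB: weight bytes scaled by an overhead factor."""
    bytes_total = n_params * bits_per_param / 8
    return bytes_total * overhead / 2**30

# 7B model, fp16 vs. 4-bit quantized:
fp16 = est_vram_gb(7e9, 16)   # well above a 12 GB card
q4   = est_vram_gb(7e9, 4)    # fits comfortably in 12 GB
print(f"fp16: {fp16:.1f} GiB, q4: {q4:.1f} GiB")
```

This is why a full-precision zephyr-7b load on a 12 GB RTX 3060 spills work onto the CPU, while the GGUF quant stays on the GPU.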
Oh I tried, but it is not doing the doing
It has been loading a while now. I restarted and tried again, but it is still aggressively loading the model. I have an NVIDIA GeForce RTX 3060 with 12 GB of VRAM and 32 GB of system RAM on the motherboard.
This now loaded quickly enough, and while text-generation is still pretty slow, it is at least not a word every few seconds. Is there anything I can log out or something to show you how the machine is working through the query? I am running pure LLM (have not attempted the document uploads and stuff yet), as I would like this to run well first, before attempting to tack on anything else.
The loading means it is downloading from the internet. The model is around 4 GB, so it may take a while. The console shows a progress bar; I need to pass that progress up to gradio for better UX.
Can you try llama.cpp directly with same model and see how fast it goes? It should be same speed as h2oGPT.
I very much doubt that I can. The install process seems too complicated for me to manage without a significant time investment. I am stuck on the wrong side of the Linux/Windows fence, and a lot of the terms and uses here are beyond me.
Got LM Studio set up and it is running the same model. It is slightly quicker, I would say 30-40% quicker.
Yes, so I have been switching back and forth a bit, and when asked similar questions, LM Studio is roughly twice as fast at typing out the answers compared to h2oGPT, using the same model. So I am guessing there are either errors in h2oGPT (which I doubt, seeing as others do not have these issues) or I am missing a setting somewhere. Not sure how to troubleshoot that though, so any input would be invaluable.
h2oGPT batches the output so it can handle high concurrency (i.e. hundreds of users), unsure if that may be what you are noticing.
You can disable that by setting --gradio_ui_stream_chunk_size=0, or in the win_run_app.py file set
os.environ['gradio_ui_stream_chunk_size'] = '0'
just inside _main().
The win_run_app.py file is located in two places for reasons out of my control; the one inside the h2oGPT folder is, I think, the wrong one, and the one deeper inside is the right one, but I don't have Windows up at the moment to check.
But I can check out lm studio sometime and compare too.
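To illustrate what that chunk size does: with a nonzero chunk size, output is buffered and emitted in batches, which helps a server handle many concurrent users but feels laggier for a single user, while 0 streams each piece as soon as it arrives. A toy sketch of the idea (not h2oGPT's actual code):

```python
def chunk_stream(tokens, chunk_size):
    """Yield tokens one-by-one when chunk_size is 0, else in buffered chunks."""
    if chunk_size <= 0:
        yield from tokens          # immediate streaming: snappy for one user
        return
    buf = ""
    for t in tokens:
        buf += t
        if len(buf) >= chunk_size:  # batch output: cheaper under high load
            yield buf
            buf = ""
    if buf:
        yield buf                   # flush whatever is left at the end

print(list(chunk_stream(["Hi", " ", "there"], 0)))  # each token immediately
print(list(chunk_stream(["Hi", " ", "there"], 4)))  # buffered into larger chunks
```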
Am I missing something here? Why would it need to handle high concurrency when it is running offline on my hardware?
I will try these settings, and see if I can manage it. It is not exactly as simple as a tickbox in a settings window. :)
Yes, that made a difference. It is much more snappy now.
@Sierbahnn Yes, that's a good point; it would be nicer if some of these options were dynamically settable in the UI.
In an attempt to make a fresh install of the client to resolve some issues I had when installing, I wanted to uninstall the client. This turned out to be tricky.
First, the uninstaller is not there.
So the shortcut from my apps-menu does not work. And there is no file I can manually click either.
So I figured I could try to simply run another install - maybe that will throw up an uninstall/repair-option? No, not really.
How do I go about uninstalling the client so I can do a fresh install?