h2oai / h2ogpt

Private chat with local GPT with documents, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
http://h2o.ai
Apache License 2.0
11.4k stars 1.25k forks

How do I add a model to run it locally? #851

Closed gabriead closed 1 year ago

gabriead commented 1 year ago

Dear community, I have set the repo up and running on my Windows machine. However, I do not understand how I can add a model to run it locally on my machine. I have downloaded the model 'llama-2-7b-chat.ggmlv3.q8_0.bin' and placed it into the h2oGPT root folder. Then in the UI I selected it like so:

(screenshot: model selection in the h2oGPT UI)

But if I enter anything in the prompt, I get this error on the console: `AssertionError: Please choose a base model with --base_model (CLI) or load in Models Tab (gradio). Then start New Conversation`. Can anyone point me in the right direction on how to do this correctly? Thanks a lot!

pseudotensor commented 1 year ago

Did you click the top button? i.e. "Download/Load Model"

We are aware the coloring is not great. It's a limitation of gradio that buttons and info labels look the same. We have some work on it: https://github.com/h2oai/h2ogpt/pull/818

gabriead commented 1 year ago

Hi @pseudotensor, yep, I did that, but I still get the same error as above.

pseudotensor commented 1 year ago

I presume it's not finding the file. I haven't had issues. Are you able to diagnose?

natlamir commented 1 year ago

I am having a similar issue, but with the installation from the One-Click Windows GPU CUDA Installer. I can't figure out how to load a model. Do I need to place the .bin file at a specific location? When I select something from the dropdown and click the Load-Unload Model / LORA button on the right, I get this error in the top right:

(screenshot: error message in the UI)

pseudotensor commented 1 year ago

@natlamir It seems to have trouble writing some files, probably a permissions issue with the disk where it was installed.

One can debug like this: https://github.com/h2oai/h2ogpt/issues/652#issuecomment-1675444734

i.e. using python instead of pythonx and running in a Windows command-line terminal.
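A quick way to test the write-permission hypothesis before digging through stack traces is a small probe script. The sketch below is generic (the directories checked are illustrative, not taken from h2oGPT's code): it simply tries to create a temporary file in each candidate directory.

```python
import os
import tempfile


def can_write(path: str) -> bool:
    """Return True if the current process can create a file under `path`."""
    try:
        # The temp file is created on entry and deleted on exit.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False


# Illustrative candidates: the current directory and the system temp dir.
# On Windows you would point this at the h2oGPT install directory instead.
for d in [os.getcwd(), tempfile.gettempdir()]:
    print(d, "writable:", can_write(d))
```

If the install directory reports `writable: False`, that matches the permissions theory above.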

If you use llama as the base_model, then you can provide a GGML link from TheBloke. I give details here: https://github.com/h2oai/h2ogpt/blob/main/docs/FAQ.md#adding-models
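Per the FAQ linked above, the CLI route looks roughly like the following. Treat the flag name `--model_path_llama` as an assumption from this era of the project; check the FAQ for the exact spelling in your version.

```shell
# Sketch: launch h2oGPT with a GGML model via the llama.cpp route.
# --base_model=llama selects llama.cpp; the GGML file or URL goes in a
# separate flag (shown here as --model_path_llama, an assumption).
python generate.py \
  --base_model=llama \
  --model_path_llama=https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin
```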

But your issue is some permission thing; a stack trace from the command-line output would help. Thanks!

natlamir commented 1 year ago

@pseudotensor Thanks for the fast reply. I tried running it through the command line to get the stack trace, and it works just fine when run that way! (I was using a non-elevated command prompt.) Previously I was trying to run it by clicking the icon in the Start menu on Windows 10, and that is when it was erroring. So now I am able to download and use the model I was trying to.

I tried using the GGML link from TheBloke you mentioned. Let me know if I am missing a step for doing this through the UI.

In the Models tab, on the bottom left textbox titled "New Model name/path/URL" I enter: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin

Then I click the "Add new Model, Lora, Server url:port" button on the bottom right. This auto-populates the "Choose Base Model" dropdown at the top with the URL I entered in the textbox. Then I click the "Load-Unload Model/LORA" button on the top right; it downloads the 7 GB file, but then errors. Here is the command-line output / stack trace of the error (the file referenced in the error, in the Temp folder, appears to be the model file without an extension; it is 7 GB):

```
C:\Users\root>C:\Users\root\AppData\Local\Programs\h2oGPT\Python\python.exe "C:\Users\root\AppData\Local\Programs\h2oGPT\h2oGPT.launch.pyw"
file: C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\win_run_app.py
path1 C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\libnvvp;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\libnvvp;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files\Git\cmd;C:\Program Files (x86)\WinMerge;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA Corporation\NVIDIA NvDLISR;C:\Program Files\NVIDIA Corporation\Nsight Compute 2021.1.0\;C:\Program Files\dotnet\;C:\Users\root\AppData\Local\Programs\Python\Python310\Scripts\;C:\Users\root\AppData\Local\Programs\Python\Python310\;C:\Users\root\AppData\Local\Microsoft\WindowsApps;C:\tools\ffmpeg\bin;C:\Users\root\AppData\Local\ffmpegio\ffmpeg-downloader\ffmpeg\bin;C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\bin\Hostx64\x64;C:\tools;C:\Users\root\AppData\Roaming\Python\Python310\Scripts;C:\Users\root\AppData\Local\Google\Cloud SDK\google-cloud-sdk\bin;C:\Users\root\AppData\Roaming\Python\Python38\Scripts;C:\Users\root\.dotnet\tools;C:\MinGW\bin;C:\MinGW\mingw32\bin;;C:\Users\root\AppData\Local\Programs\h2oGPT\poppler/Library/bin/;C:\Users\root\AppData\Local\Programs\h2oGPT\poppler/Library/lib/;C:\Users\root\AppData\Local\Programs\h2oGPT\Tesseract-OCR;C:\Users\root\AppData\Local\Programs\h2oGPT\ms-playwright;C:\Users\root\AppData\Local\Programs\h2oGPT\ms-playwright/chromium-1076/chrome-win;C:\Users\root\AppData\Local\Programs\h2oGPT\ms-playwright/ffmpeg-1009;C:\Users\root\AppData\Local\Programs\h2oGPT\ms-playwright/firefox-1422/firefox;C:\Users\root\AppData\Local\Programs\h2oGPT\ms-playwright/webkit-1883
C:\Users\root\AppData\Local\Programs\h2oGPT\..\src
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\..\src
C:\Users\root\AppData\Local\Programs\h2oGPT\..\iterators
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\..\iterators
C:\Users\root\AppData\Local\Programs\h2oGPT\..\gradio_utils
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\..\gradio_utils
C:\Users\root\AppData\Local\Programs\h2oGPT\..\metrics
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\..\metrics
C:\Users\root\AppData\Local\Programs\h2oGPT\..\models
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\..\models
C:\Users\root\AppData\Local\Programs\h2oGPT...
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs...
Auto set langchain_mode=LLM.  Could use MyData instead.
To allow UserData to pull files from disk, set user_path or langchain_mode_paths, and ensure allow_upload_to_user_data=True
Prep: persist_directory=db_dir_UserData exists, using
Prep: persist_directory= does not exist, regenerating
Did not generate db since no sources
Prep: persist_directory= does not exist, regenerating
Did not generate db since no sources
favicon_path1=h2o-logo.svg not found
favicon_path2: h2o-logo.svg not found in C:\Users\root\AppData\Local\Programs\h2oGPT\src
Running on local URL:  http://0.0.0.0:7860
```

```
To create a public link, set share=True in launch().
Starting get_model: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin
C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\utils\hub.py:575: UserWarning: Using from_pretrained with the url of a file (here https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin) is deprecated and won't be possible anymore in v5 of Transformers. You should host your file on the Hub (hf.co) instead and use the repository ID. Note that this is not compatible with the caching system (your file will be downloaded at each execution) or multiple processes (each process will download the file in a different temporary file).
  warnings.warn(
Downloading (…)chat.ggmlv3.q8_0.bin: 100%|█████████████████████████████████████████| 7.16G/7.16G [01:07<00:00, 107MB/s]
Traceback (most recent call last):
  File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\configuration_utils.py", line 659, in _get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\configuration_utils.py", line 750, in _dict_from_json_file
    text = reader.read()
  File "codecs.py", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 28: invalid start byte
```

During handling of the above exception, another exception occurred:

```
Traceback (most recent call last):
  File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\gradio\blocks.py", line 1109, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\gradio\utils.py", line 706, in wrapper
    response = f(*args, **kwargs)
  File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\..\src\gradio_runner.py", line 3279, in load_model
    model1, tokenizer1, device1 = get_model(reward_type=False,
  File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\..\src\gen.py", line 1288, in get_model
    config, _, max_seq_len = get_config(base_model, **config_kwargs, raise_exception=False)
  File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\..\src\gen.py", line 1007, in get_config
    config = AutoConfig.from_pretrained(base_model, use_auth_token=use_auth_token,
  File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\models\auto\configuration_auto.py", line 944, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\configuration_utils.py", line 574, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\root\AppData\Local\Programs\h2oGPT\pkgs\transformers\configuration_utils.py", line 662, in _get_config_dict
    raise EnvironmentError(
OSError: It looks like the config file at 'C:\Users\root\AppData\Local\Temp\tmps1rk02tn' is not a valid JSON file.
```

pseudotensor commented 1 year ago

You can't pass a GGML model to --base_model. See: https://github.com/h2oai/h2ogpt/blob/main/docs/FAQ.md#adding-models

For GGML, use 'llama' as the base_model; then in the UI more options will appear. Then put the model's URL in the llama model path field.
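In other words, the loader branches on the base model name. A hypothetical sketch of that dispatch (not h2oGPT's actual code; the function and field names are made up for illustration):

```python
def load_model(base_model, model_path_llama=None):
    """Hypothetical dispatch illustrating the rule from this thread:
    GGML files go through llama.cpp, not through transformers."""
    if base_model == "llama":
        # GGML route: the weights file or URL belongs in the separate
        # llama model path field, never in base_model itself.
        return ("llama.cpp", model_path_llama)
    # Hugging Face route: base_model must be a repo ID whose config.json
    # AutoConfig can parse; a raw .bin URL here fails as in the trace above.
    return ("transformers", base_model)


print(load_model("llama", "llama-2-7b-chat.ggmlv3.q8_0.bin"))
# → ('llama.cpp', 'llama-2-7b-chat.ggmlv3.q8_0.bin')
```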

pseudotensor commented 1 year ago

For general offline, see updates here: https://github.com/h2oai/h2ogpt/blob/main/docs/README_offline.md from https://github.com/h2oai/h2ogpt/pull/877

chengjia604 commented 1 year ago

> Dear community, I have set the repo up and running on my Windows machine. However, I do not understand how I can add a model to run it locally on my computer. I downloaded the model 'llama-2-7b-chat.ggmlv3.q8_0.bin' and placed it in the h2oGPT root folder, then selected it in the UI as shown.
>
> But if I enter anything in the prompt, I get this error on the console: `AssertionError: Please choose a base model with --base_model (CLI) or load in Models Tab (gradio). Then start New Conversation`. Can anyone point me in the right direction on how to do this correctly? Thanks a lot!

Hello, has your problem been solved? I also encountered the same problem.

pseudotensor commented 1 year ago

The "base_model" is llama for that model. Once you choose "llama" another view will pop-up to enter the llama model path or url. Then you click on the "Download/Load Models" button at top. We'll try to improve the UX.

Blue-newai commented 11 months ago

Hi, I am using an Ubuntu laptop. However many times I have tried to install h2oGPT locally through the terminal, I keep facing a lot of issues, and I can't work out what is going wrong with my system. Can you help me?

pseudotensor commented 11 months ago

@Blue-newai If you have a problem, you should post the specific issue, error output, etc. I've tried to make it easier and easier to install and use.