h2oai / h2ogpt

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/

No answer received to a simple question #956

Closed AlfanNasution closed 9 months ago

AlfanNasution commented 11 months ago

Hello Team,

I'm running the program on RHEL 8.x, and my GPU is an A100 with 20GB of memory. I followed all the installation steps from the documentation. Then I launch with this command: python generate.py --base_model=h2oai/h2ogpt-4096-llama2-7b-chat --score_model=None --langchain_mode='UserData' --user_path=user_path --share=True

The program doesn't crash, but when I ask a simple question such as 'Explain yourself', it won't generate an answer. This is the log from when it happened:

Using Model h2oai/h2ogpt-4096-llama2-7b-chat
Starting get_model: h2oai/h2ogpt-4096-llama2-7b-chat
/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:1020: FutureWarning: The use_auth_token argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:655: FutureWarning: The use_auth_token argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
device_map: {'': 0}
/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:472: FutureWarning: The use_auth_token argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
Loading checkpoint shards: 100%|██████████| 2/2 [01:27<00:00, 43.70s/it]
/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/utils/hub.py:374: FutureWarning: The use_auth_token argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
Model {'base_model': 'h2oai/h2ogpt-4096-llama2-7b-chat', 'tokenizer_base_model': '', 'lora_weights': '', 'inference_server': '', 'prompt_type': 'llama2', 'prompt_dict': {'promptA': '', 'promptB': '', 'PreInstruct': '[INST] ', 'PreInput': None, 'PreResponse': '[/INST]', 'terminate_response': ['[INST]', ''], 'chat_sep': ' ', 'chat_turn_sep': ' ', 'humanstr': '[INST]', 'botstr': '[/INST]', 'generates_leading_space': False, 'system_prompt': ''}, 'visible_models': None, 'h2ogpt_key': None, 'load_8bit': False, 'load_4bit': False, 'low_bit_mode': 1, 'load_half': True, 'load_gptq': '', 'load_awq': '', 'load_exllama': False, 'use_safetensors': False, 'revision': None, 'use_gpu_id': True, 'gpu_id': 0, 'compile_model': True, 'use_cache': None, 'llamacpp_dict': {'n_gpu_layers': 100, 'use_mlock': True, 'n_batch': 1024, 'n_gqa': 0, 'model_path_llama': 'https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin', 'model_name_gptj': 'ggml-gpt4all-j-v1.3-groovy.bin', 'model_name_gpt4all_llama': 'ggml-wizardLM-7B.q4_2.bin', 'model_name_exllama_if_no_config': 'TheBloke/Nous-Hermes-Llama2-GPTQ'}, 'model_path_llama': 'https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin', 'model_name_gptj': 'ggml-gpt4all-j-v1.3-groovy.bin', 'model_name_gpt4all_llama': 'ggml-wizardLM-7B.q4_2.bin', 'model_name_exllama_if_no_config': 'TheBloke/Nous-Hermes-Llama2-GPTQ', 'rope_scaling': {}, 'max_seq_len': None, 'exllama_dict': {}}
Running on local URL: http://0.0.0.0:7860
2023/10/13 16:24:50 [W] [service.go:132] login to server failed: DialTcpByHttpProxy error, StatusCode [403]

Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.
Started Gradio Server and/or GUI: server_name: 0.0.0.0 port: None
/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/gradio/helpers.py:818: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. return gr.Textbox(...) instead of return gr.update(...)
  warnings.warn(
/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/gradio/components/radio.py:134: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. return gr.Radio(...) instead of return gr.Radio.update(...).
  warnings.warn(
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:2 for open-end generation.

Can you please give me some help?

pseudotensor commented 11 months ago

The only odd thing I see is this:

2023/10/13 16:24:50 [W] [service.go:132] login to server failed: DialTcpByHttpProxy error, StatusCode [403]

As for the 7B llama-2, it's not great with documents. I recommend Mistral 7B instead if you're going to use a 7B model.

But your command is fine, e.g.:

(h2ogpt) jon@pseudotensor:~/h2ogpt$ python generate.py --base_model=h2oai/h2ogpt-4096-llama2-7b-chat --score_model=None --langchain_mode='UserData' --user_path=user_path --share=True

(screenshot)

But as soon as I add some docs, 7b llama-2 gets quickly confused.

AlfanNasution commented 11 months ago

Do you have any idea what this message is about? 2023/10/13 16:24:50 [W] [service.go:132] login to server failed: DialTcpByHttpProxy error, StatusCode [403].

pseudotensor commented 11 months ago

Hi, I've never seen that before. Could you try setting --gradio_offline_level=2? Maybe it's trying to fetch fonts.
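For example, adding the flag to your original launch command:

python generate.py --base_model=h2oai/h2ogpt-4096-llama2-7b-chat --score_model=None --langchain_mode='UserData' --user_path=user_path --share=True --gradio_offline_level=2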

But [W] should just mean it's a warning.

pseudotensor commented 11 months ago

Perhaps your problem is more related to the doc handling of 7b? Did you try just not using any docs first?

If you already have the db, you'll need to tell it not to use it by selecting Resources->Collections->LLM.

AlfanNasution commented 11 months ago

Hello, thanks for the response, but I haven't uploaded any documents to h2ogpt yet, and I haven't created any db either. Do you think something might be wrong with the connection?

AlfanNasution commented 11 months ago

Some additional information: I just want to start a simple conversation with the AI, but the answer keeps loading and eventually I get this error message in the UI: (screenshot)

pseudotensor commented 11 months ago

Ya, that's pretty messed up. We run h2oGPT on A100s all the time with no issues, so something else is going on. I googled that warning but nothing obvious came up.

AlfanNasution commented 11 months ago

Like you said before, I should use Mistral 7B. If I want to do that, can I just change the command-line parameters to something like this? python generate.py --base_model=h2oai/7-mistral --score_model=None --langchain_mode='UserData' --user_path=user_path --share=True --gradio_offline_level=2

pseudotensor commented 11 months ago
python generate.py --base_model=mistralai/Mistral-7B-Instruct-v0.1 --score_model=None --langchain_mode='UserData' --user_path=user_path --share=True --gradio_offline_level=2

(screenshot)

Sat Oct 14 01:15:51 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090 Ti      On | 00000000:01:00.0 Off |                  Off |
|  0%   58C    P2              125W / 480W|  18549MiB / 24564MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 2080         On | 00000000:03:00.0 Off |                  N/A |
| 31%   44C    P8                7W / 215W|     10MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1615      G   /usr/lib/xorg/Xorg                          156MiB |
|    0   N/A  N/A      2258      G   /usr/lib/xorg/Xorg                         1515MiB |
|    0   N/A  N/A      2391      G   /usr/bin/gnome-shell                        162MiB |
|    0   N/A  N/A      5455      G   /usr/bin/nvidia-settings                      0MiB |
|    0   N/A  N/A      5940      G   ...2605455,14348786443269456662,262144      220MiB |
|    0   N/A  N/A      6976      G   gnome-control-center                          4MiB |
|    0   N/A  N/A      8291      G   ...ures=SpareRendererForSitePerProcess       35MiB |
|    0   N/A  N/A   1498418      C   python                                    16426MiB |
|    1   N/A  N/A      1615      G   /usr/lib/xorg/Xorg                            4MiB |
|    1   N/A  N/A      2258      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+
AlfanNasution commented 11 months ago

Hello, after I ran the command, I immediately got this error.

python generate.py --base_model=mistralai/Mistral-7B-Instruct-v0.1 --score_model=None --langchain_mode='UserData' --user_path=user_path --share=True --gradio_offline_level=2

Using Model mistralai/mistral-7b-instruct-v0.1
Starting get_model: mistralai/Mistral-7B-Instruct-v0.1
/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:1020: FutureWarning: The use_auth_token argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
Downloading (…)lve/main/config.json: 100%|██████████| 571/571 [00:00<00:00, 1.83MB/
/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:655: FutureWarning: The use_auth_token argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
Downloading (…)okenizer_config.json: 100%|██████████| 1.47k/1.47k [00:00<00:00, 7.36MB/
Downloading tokenizer.model: 100%|██████████| 493k/493k [00:00<00:00, 3.46MB/
Downloading (…)/main/tokenizer.json: 100%|██████████| 1.80M/1.80M [00:00<00:00, 2.53MB/
Downloading (…)cial_tokens_map.json: 100%|██████████| 72.0/72.0 [00:00<00:00, 397kB/
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/data/h2ogpt/generate.py", line 16, in <module>
    entrypoint_main()
  File "/data/h2ogpt/generate.py", line 12, in entrypoint_main
    H2O_Fire(main)
  File "/data/h2ogpt/src/utils.py", line 64, in H2O_Fire
    fire.Fire(component=component, command=args)
  File "/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/data/h2ogpt/src/gen.py", line 1258, in main
    model0, tokenizer0, device = get_model(reward_type=False,
  File "/data/h2ogpt/src/gen.py", line 1780, in get_model
    return get_hf_model(load_8bit=load_8bit,
  File "/data/h2ogpt/src/gen.py", line 1945, in get_hf_model
    config, model, max_seq_len = get_config(base_model,
  File "/data/h2ogpt/src/gen.py", line 1384, in get_config
    model = AutoModel.from_config(
  File "/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 441, in from_config
    return model_class._from_config(config, **kwargs)
  File "/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1190, in _from_config
    model = cls(config, **kwargs)
  File "/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 775, in __init__
    self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
  File "/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 142, in __init__
    self.weight = Parameter(torch.empty((num_embeddings, embedding_dim), **factory_kwargs),
  File "/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/_device.py", line 62, in __torch_function__
    return func(*args, **kwargs)
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Do you think Mistral needs some additional requirements to be installed first? I don't see anything related to the Mistral model in the Linux section of the docs.

AlfanNasution commented 11 months ago

Hello, I'm just curious about the CUDA version. Is it possible to run h2ogpt with CUDA 11.2? I notice your specs show CUDA Version 12.1.

pseudotensor commented 11 months ago

CUDA 11.7 is normal for the h2ogpt install, but various binaries use different CUDA versions. The CUDA version of your system toolkit won't matter if you use the conda way of installing the env. I'm unsure whether 11.2 will work or not.
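As a quick sanity check (generic PyTorch, nothing h2ogpt-specific), you can print which CUDA version your PyTorch build was compiled against and whether the installed driver can actually run it:

import torch
print(torch.__version__)          # e.g. 2.0.x+cu117
print(torch.version.cuda)         # CUDA version this PyTorch build was compiled against
print(torch.cuda.is_available())  # True only if the driver can run that build
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # which GPU PyTorch will use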

Shasvinth commented 11 months ago

It seems like there's a CUDA-related error, indicating an issue with GPU availability or compatibility. The error message suggests that a CUDA-capable device is either busy or unavailable.

Here are a few steps you can take to troubleshoot this issue:

Check GPU Availability: Verify that your GPU is working correctly and that CUDA is installed.

Check CUDA Installation: Make sure you have installed the correct version of CUDA that matches your GPU and PyTorch version.

Check PyTorch Installation: Ensure that your PyTorch installation is compatible with your CUDA version.

GPU Memory: Check if your GPU has enough free memory to load the Mistral model.

Use CPU: If you don't have a compatible GPU or encounter issues, you can modify the script to use the CPU instead of the GPU, for example:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # Use CPU
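One caveat (standard CUDA behavior, not specific to h2ogpt): CUDA_VISIBLE_DEVICES only takes effect if it is set before PyTorch initializes CUDA, so set it at the very top of the script or export it in the shell before launching. A minimal sketch:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # hide all GPUs; must run before any CUDA call

import torch
print(torch.cuda.is_available())  # False, so code paths fall back to CPU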

AlfanNasution commented 11 months ago

I've tried to install the correct version of CUDA that matches my GPU and PyTorch version. It seems PyTorch cannot detect my GPU because the CUDA versions didn't match. The latest CUDA version my GPU supports is 11.2, so I installed PyTorch 1.10.1, which supports CUDA 11.1, since no PyTorch version supports CUDA 11.2 according to https://pytorch.org/get-started/previous-versions/. But when I run this code, it returns False:

import torch
print(torch.cuda.is_available())

(screenshot)

pseudotensor commented 11 months ago

For Windows, in some cases people had to run:

pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117

separately, and then the GPU was seen, while just:

pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu117

would not make the GPU visible, but I've not seen that myself.

pseudotensor commented 11 months ago

@AlfanNasution Please try with --max_seq_len=2048 or 4096, because by default Mistral will try to run with a 32k context (the max from its config.json). You are probably just running out of GPU memory.
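For example, appending the flag to the earlier launch command:

python generate.py --base_model=mistralai/Mistral-7B-Instruct-v0.1 --score_model=None --langchain_mode='UserData' --user_path=user_path --share=True --gradio_offline_level=2 --max_seq_len=4096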