AlfanNasution closed this issue 9 months ago.
The only odd thing I see is this:
2023/10/13 16:24:50 [W] [service.go:132] login to server failed: DialTcpByHttpProxy error, StatusCode [403]
As for the 7B Llama-2 model, it's not great with documents. If you want a 7B model, I recommend Mistral 7B instead.
But your command is fine, e.g.:
(h2ogpt) jon@pseudotensor:~/h2ogpt$ python generate.py --base_model=h2oai/h2ogpt-4096-llama2-7b-chat --score_model=None --langchain_mode='UserData' --user_path=user_path --share=True
But as soon as I add some docs, the 7B Llama-2 model quickly gets confused.
Do you have any idea what this message is about? 2023/10/13 16:24:50 [W] [service.go:132] login to server failed: DialTcpByHttpProxy error, StatusCode [403].
Hi, I've never seen that before. Could you try setting --gradio_offline_level=2? Maybe it's trying to fetch fonts.
But [W] should just mean it's a warning.
Perhaps your problem is more related to how the 7B model handles docs? Did you first try without any docs?
If you already have the db, you'll need to tell it not to use it by selecting Resources->Collections->LLM.
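In the full log, that 403 warning is printed right before Gradio's "Could not create share link" message, and the name DialTcpByHttpProxy suggests the share tunnel is being dialed through an HTTP proxy that refuses it. That reading is an assumption, not something confirmed in this thread. A minimal sketch for listing the proxy settings a process inherits, assuming the conventional HTTP_PROXY/HTTPS_PROXY/ALL_PROXY/NO_PROXY variable names (the helper name is hypothetical):

```python
import os

# Conventional proxy variable names (assumption: tools differ on whether
# they honor the upper- or lower-case spelling, so both are checked).
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY")


def proxy_settings(environ=None):
    """Return every proxy-related variable that is set, in either case."""
    env = os.environ if environ is None else environ
    found = {}
    for name in PROXY_VARS:
        for key in (name, name.lower()):
            if env.get(key):
                found[key] = env[key]
    return found


if __name__ == "__main__":
    settings = proxy_settings()
    if settings:
        for key, value in settings.items():
            print(f"{key}={value}")
    else:
        print("No proxy environment variables set.")
```

An empty result does not rule out a proxy configured elsewhere, for example system-wide or at the network level.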
Hello, thanks for the response, but I haven't uploaded any documents to h2oGPT yet, and I haven't created any database either. Do you think something is wrong with the connection?
For additional context: I just want to start a simple conversation with the AI, but the answer keeps loading and eventually the UI shows an error message.
Yeah, that's pretty messed up. We run h2oGPT on A100s all the time with no issues, so something else is going on. I googled that warning but found nothing obvious.
As you said before, I should use Mistral 7B. If I want to do that, can I just change the command-line parameters to something like this?
python generate.py --base_model=h2oai/7-mistral --score_model=None --langchain_mode='UserData' --user_path=user_path --share=True --gradio_offline_level=2
python generate.py --base_model=mistralai/Mistral-7B-Instruct-v0.1 --score_model=None --langchain_mode='UserData' --user_path=user_path --share=True --gradio_offline_level=2
Sat Oct 14 01:15:51 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090 Ti      On | 00000000:01:00.0 Off |                  Off |
|  0%   58C    P2             125W / 480W |  18549MiB / 24564MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 2080         On | 00000000:03:00.0 Off |                  N/A |
| 31%   44C    P8               7W / 215W |     10MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1615      G   /usr/lib/xorg/Xorg                          156MiB |
|    0   N/A  N/A      2258      G   /usr/lib/xorg/Xorg                         1515MiB |
|    0   N/A  N/A      2391      G   /usr/bin/gnome-shell                        162MiB |
|    0   N/A  N/A      5455      G   /usr/bin/nvidia-settings                      0MiB |
|    0   N/A  N/A      5940      G   ...2605455,14348786443269456662,262144      220MiB |
|    0   N/A  N/A      6976      G   gnome-control-center                          4MiB |
|    0   N/A  N/A      8291      G   ...ures=SpareRendererForSitePerProcess       35MiB |
|    0   N/A  N/A   1498418      C   python                                    16426MiB |
|    1   N/A  N/A      1615      G   /usr/lib/xorg/Xorg                            4MiB |
|    1   N/A  N/A      2258      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+
Hello, after I ran the command, I immediately got this error:
python generate.py --base_model=mistralai/Mistral-7B-Instruct-v0.1 --score_model=None --langchain_mode='UserData' --user_path=user_path --share=True --gradio_offline_level=2
Using Model mistralai/mistral-7b-instruct-v0.1
Starting get_model: mistralai/Mistral-7B-Instruct-v0.1
/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:1020: FutureWarning: The use_auth_token argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
Downloading (…)lve/main/config.json: 100%|████████████████████████████████████████████████████████| 571/571 [00:00<00:00, 1.83MB/
/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:655: FutureWarning: The use_auth_token argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
Downloading (…)okenizer_config.json: 100%|████████████████████████████████████████████████████| 1.47k/1.47k [00:00<00:00, 7.36MB/
Downloading tokenizer.model: 100%|██████████████████████████████████████████████████████████████| 493k/493k [00:00<00:00, 3.46MB/
Downloading (…)/main/tokenizer.json: 100%|████████████████████████████████████████████████████| 1.80M/1.80M [00:00<00:00, 2.53MB/
Downloading (…)cial_tokens_map.json: 100%|███████████████████████████████████████████████████████| 72.0/72.0 [00:00<00:00, 397kB/
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/data/h2ogpt/generate.py", line 16, in <module>
    entrypoint_main()
  File "/data/h2ogpt/generate.py", line 12, in entrypoint_main
    H2O_Fire(main)
  File "/data/h2ogpt/src/utils.py", line 64, in H2O_Fire
    fire.Fire(component=component, command=args)
  File "/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/data/h2ogpt/src/gen.py", line 1258, in main
    model0, tokenizer0, device = get_model(reward_type=False,
  File "/data/h2ogpt/src/gen.py", line 1780, in get_model
    return get_hf_model(load_8bit=load_8bit,
  File "/data/h2ogpt/src/gen.py", line 1945, in get_hf_model
    config, model, max_seq_len = get_config(base_model,
  File "/data/h2ogpt/src/gen.py", line 1384, in get_config
    model = AutoModel.from_config(
  File "/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 441, in from_config
    return model_class._from_config(config, **kwargs)
  File "/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1190, in _from_config
    model = cls(config, **kwargs)
  File "/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 775, in __init__
    self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
  File "/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 142, in __init__
    self.weight = Parameter(torch.empty((num_embeddings, embedding_dim), **factory_kwargs),
  File "/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/_device.py", line 62, in __torch_function__
    return func(*args, **kwargs)
RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Do you think Mistral requires some additional library to be installed first? I don't see anything related to the Mistral model in the Linux section.
Hello, I'm just curious about the CUDA version. Is it possible to use CUDA 11.2 to run h2oGPT? I noticed that your setup shows CUDA Version 12.1.
CUDA 11.7 is the normal version for an h2oGPT install, but various binaries use different CUDA versions. The version of your CUDA toolkit won't matter if you install the environment the conda way. I'm unsure whether 11.2 will work.
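The "CUDA Version" in the nvidia-smi header is the newest CUDA version the installed driver supports, not the toolkit that happens to be installed. A conservative check is that the CUDA version a PyTorch build was compiled against is not newer than that. The sketch below parses the header line (format taken from the nvidia-smi output above) and does that comparison; the helper names are hypothetical, and CUDA's minor-version compatibility may let some newer 11.x builds run anyway, so treat a False as "possibly incompatible" rather than definitive.

```python
import re
import shutil
import subprocess

# Matches e.g. "CUDA Version: 12.1" in the nvidia-smi header.
_CUDA_RE = re.compile(r"CUDA Version:\s*(\d+)\.(\d+)")


def driver_cuda_version(nvidia_smi_text):
    """Return (major, minor) of the newest CUDA the driver supports, or None."""
    match = _CUDA_RE.search(nvidia_smi_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def build_not_newer_than_driver(build, driver):
    """Conservative check: the build's CUDA (major, minor) must not exceed the driver's."""
    return build <= driver  # tuple comparison: major first, then minor


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
        print("driver supports CUDA up to:", driver_cuda_version(out))
    else:
        print("nvidia-smi not found on PATH")
```

With the 11.2 driver from the question, a CUDA 11.7 build fails this strict check while an 11.1 build passes it.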
It seems like there's a CUDA-related error, indicating an issue with GPU availability or compatibility. The error message suggests that a CUDA-capable device is either busy or unavailable.
Here are a few steps you can take to troubleshoot this issue:
Check GPU Availability: Verify that your GPU is working correctly and that CUDA is installed.
Check CUDA Installation: Make sure you have installed the correct version of CUDA that matches your GPU and PyTorch version.
Check PyTorch Installation: Ensure that your PyTorch installation is compatible with your CUDA version.
GPU Memory: Check if your GPU has enough free memory to load the Mistral model.
Use the CPU: If you don't have a compatible GPU or run into issues, you can modify the script to use the CPU instead of the GPU:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # hide all GPUs so PyTorch falls back to CPU; must run before torch is imported
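The GPU availability and GPU memory checks from the list above can be sketched in one small script. This is a minimal sketch, assuming PyTorch may or may not be installed and a release recent enough to have torch.cuda.mem_get_info; the helper names are hypothetical, and the weight estimate counts only fp16 weights (2 bytes per parameter), not the KV cache or activations.

```python
# Minimal sketch: report visible GPUs and their free memory via PyTorch.
try:
    import torch
except ImportError:  # no PyTorch in this environment
    torch = None


def fp16_weight_bytes(n_params):
    """Rough fp16/bf16 weight size: 2 bytes per parameter (no KV cache, no activations)."""
    return 2 * n_params


def gpu_report():
    """List (index, name, free_bytes, total_bytes) for each GPU PyTorch can see."""
    if torch is None or not torch.cuda.is_available():
        return []
    rows = []
    for i in range(torch.cuda.device_count()):
        try:
            name = torch.cuda.get_device_name(i)
            free, total = torch.cuda.mem_get_info(i)  # bytes, as reported by the driver
        except RuntimeError:  # e.g. "CUDA-capable device(s) is/are busy or unavailable"
            name, free, total = "unknown", None, None
        rows.append((i, name, free, total))
    return rows


if __name__ == "__main__":
    gib = 1024 ** 3
    print(f"fp16 weights for ~7B params: {fp16_weight_bytes(7_000_000_000) / gib:.1f} GiB")
    rows = gpu_report()
    if not rows:
        print("No CUDA device visible to PyTorch.")
    for i, name, free, total in rows:
        if free is None:
            print(f"GPU {i} ({name}): could not query memory")
        else:
            print(f"GPU {i} ({name}): {free / gib:.1f} GiB free of {total / gib:.1f} GiB")
```

If a device is listed but its memory query fails with the same "busy or unavailable" error, the problem lies below h2oGPT, in the driver or the device's compute mode.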
I've tried to install the correct version of CUDA that matches my GPU and PyTorch version. It seems that PyTorch cannot detect my GPU because the CUDA version doesn't match: the latest CUDA version for my GPU is 11.2, so I installed PyTorch 1.10.1, which supports CUDA 11.1, since according to https://pytorch.org/get-started/previous-versions/ no PyTorch version supports CUDA 11.2. But when I run this code, it returns False:
import torch
print(torch.cuda.is_available())
On Windows, in some cases people had to run:
pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
separately, after which the GPU was visible, whereas just:
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu117
would not make the GPU visible. I haven't seen that myself, though.
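To check which build actually landed after either install, PyTorch's own version strings are enough. A minimal sketch, assuming a pip wheel whose version carries a local tag such as +cu117 or +cpu (as in the command above); conda builds and some PyPI wheels have no tag, so None from the parser only means "unknown". The helper name is hypothetical.

```python
import re

try:
    import torch
except ImportError:  # no PyTorch in this environment
    torch = None

# "+cu117" -> major "11", minor "7"; the last digit is taken as the minor version.
_TAG_RE = re.compile(r"\+cu(\d+)(\d)$")


def wheel_cuda_version(torch_version):
    """Return (major, minor) from a '+cuXYZ' version suffix, or None if absent."""
    match = _TAG_RE.search(torch_version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    if torch is None:
        print("PyTorch is not installed")
    else:
        print("torch.__version__:", torch.__version__)
        print("wheel CUDA tag:", wheel_cuda_version(torch.__version__))
        print("built with CUDA:", torch.version.cuda)  # None for CPU-only builds
        print("GPU visible:", torch.cuda.is_available())
```

A "+cpu" tag, or torch.version.cuda being None, means a CPU-only build was installed, and no driver or toolkit change will make the GPU visible to it.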
@AlfanNasution Please try with --max_seq_len=2048
or use 4096, because by default Mistral will try to run with a 32k context (the maximum from config.json). You are probably just running out of GPU memory.
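A back-of-the-envelope estimate shows why the context length matters. This sketch computes the key/value cache for one sequence, assuming the published Mistral-7B-v0.1 config values (32 layers, 8 key/value heads, head dimension 128), fp16 storage (2 bytes per element) and no sliding-window eviction. It ignores weights and activations and is not how h2oGPT itself sizes memory.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128,
                   bytes_per_elem=2, batch_size=1):
    """Estimate KV-cache size: one K and one V vector per layer, per KV head, per token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len * batch_size


if __name__ == "__main__":
    mib = 1024 ** 2
    for seq_len in (2048, 4096, 32768):
        print(f"max_seq_len={seq_len:>5}: {kv_cache_bytes(seq_len) / mib:,.0f} MiB")
```

Under these assumptions each token costs 128 KiB of cache, so a full 32k context needs 4 GiB per sequence versus 512 MiB at 4096, on top of roughly 14 GB of fp16 weights (7B parameters at 2 bytes each).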
Hello Team,
I'm running the program on RHEL 8.x, and my GPU is an A100 with 20GB of memory. I followed all the installation steps in the documentation. Then I ran this command to launch it:
python generate.py --base_model=h2oai/h2ogpt-4096-llama2-7b-chat --score_model=None --langchain_mode='UserData' --user_path=user_path --share=True
The program doesn't stop, but when I ask a simple question such as 'Explain yourself', it won't generate an answer. This is the log when it happened:
Using Model h2oai/h2ogpt-4096-llama2-7b-chat
Starting get_model: h2oai/h2ogpt-4096-llama2-7b-chat
/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:1020: FutureWarning: The use_auth_token argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:655: FutureWarning: The use_auth_token argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
device_map: {'': 0}
/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:472: FutureWarning: The use_auth_token argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:27<00:00, 43.70s/it]
/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/utils/hub.py:374: FutureWarning: The use_auth_token argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
Model {'base_model': 'h2oai/h2ogpt-4096-llama2-7b-chat', 'tokenizer_base_model': '', 'lora_weights': '', 'inference_server': '', 'prompt_type': 'llama2', 'prompt_dict': {'promptA': '', 'promptB': '', 'PreInstruct': '[INST] ', 'PreInput': None, 'PreResponse': '[/INST]', 'terminate_response': ['[INST]', ''], 'chat_sep': ' ', 'chat_turn_sep': ' ', 'humanstr': '[INST]', 'botstr': '[/INST]', 'generates_leading_space': False, 'system_prompt': ''}, 'visible_models': None, 'h2ogpt_key': None, 'load_8bit': False, 'load_4bit': False, 'low_bit_mode': 1, 'load_half': True, 'load_gptq': '', 'load_awq': '', 'load_exllama': False, 'use_safetensors': False, 'revision': None, 'use_gpu_id': True, 'gpu_id': 0, 'compile_model': True, 'use_cache': None, 'llamacpp_dict': {'n_gpu_layers': 100, 'use_mlock': True, 'n_batch': 1024, 'n_gqa': 0, 'model_path_llama': 'https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin', 'model_name_gptj': 'ggml-gpt4all-j-v1.3-groovy.bin', 'model_name_gpt4all_llama': 'ggml-wizardLM-7B.q4_2.bin', 'model_name_exllama_if_no_config': 'TheBloke/Nous-Hermes-Llama2-GPTQ'}, 'model_path_llama': 'https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/llama-2-7b-chat.ggmlv3.q8_0.bin', 'model_name_gptj': 'ggml-gpt4all-j-v1.3-groovy.bin', 'model_name_gpt4all_llama': 'ggml-wizardLM-7B.q4_2.bin', 'model_name_exllama_if_no_config': 'TheBloke/Nous-Hermes-Llama2-GPTQ', 'rope_scaling': {}, 'max_seq_len': None, 'exllama_dict': {}}
Running on local URL: http://0.0.0.0:7860
2023/10/13 16:24:50 [W] [service.go:132] login to server failed: DialTcpByHttpProxy error, StatusCode [403]
Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.
Started Gradio Server and/or GUI: server_name: 0.0.0.0 port: None
/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/gradio/helpers.py:818: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. return gr.Textbox(...) instead of return gr.update(...)
  warnings.warn(
/data/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/gradio/components/radio.py:134: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. return gr.Radio(...) instead of return gr.Radio.update(...).
  warnings.warn(
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:2 for open-end generation.
Can you please give me some help?