h2oai / h2ogpt

Private chat with local GPT with document, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
http://h2o.ai
Apache License 2.0

random assertion errors due to evaluate_nochat #1600

Open · Blacksuan19 opened this issue 4 months ago

Blacksuan19 commented 4 months ago

When using the Docker image, I randomly get assertion errors when making a request from the Gradio UI; sometimes it works and sometimes it does not. Here is the raised error.

This occurs with the latest two Docker images, tagged 4059a2c9 and 7297519c.

Full error ```python thread exception: Traceback (most recent call last): File "/workspace/src/utils.py", line 502, in run self._return = self._target(*self._args, **self._kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/_api/deprecation.py", line 145, in warning_emitting_wrapper return wrapped(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 378, in __call__ return self.invoke( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 163, in invoke raise e File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 153, in invoke self._call(inputs, run_manager=run_manager) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 137, in _call output, extra_return_dict = self.combine_docs( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 244, in combine_docs return self.llm_chain.predict(callbacks=callbacks, **inputs), {} File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 293, in predict return self(kwargs, callbacks=callbacks)[self.output_key] File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/_api/deprecation.py", line 145, in warning_emitting_wrapper return wrapped(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 378, in __call__ return self.invoke( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 163, in invoke raise e File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 153, in invoke self._call(inputs, run_manager=run_manager) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 103, in _call response = self.generate([inputs], run_manager=run_manager) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 115, in generate return self.llm.generate_prompt( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 633, in generate_prompt return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 803, in generate output = self._generate_helper( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 670, in _generate_helper raise e File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 657, in _generate_helper self._generate( File "/workspace/src/gpt_langchain.py", line 2339, in _generate rets = super()._generate(prompts, stop=stop, run_manager=run_manager, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_community/llms/huggingface_pipeline.py", line 267, in _generate responses = self.pipeline( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 240, in __call__ return super().__call__(text_inputs, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1223, in __call__ outputs = list(final_iterator) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in 
__next__ item = next(self.iterator) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__ processed = self.infer(item, **self.params) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1149, in forward model_outputs = self._forward(model_inputs, **forward_params) File "/workspace/src/h2oai_pipeline.py", line 271, in _forward return self.__forward(model_inputs, **generate_kwargs) File "/workspace/src/h2oai_pipeline.py", line 309, in __forward generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/utils.py", line 1576, in generate result = self._greedy_search( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/utils.py", line 2494, in _greedy_search outputs = self( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1158, in forward outputs = self.model( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/model.py", line 127, in forward h, _, _ = layer( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/block.py", line 123, in forward attn_output, _, past_key_value = self.attn.forward( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 235, in forward xq, xk = self.rope.forward(xq, xk, self.start_pos, seqlen) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 60, in forward freqs_cis = self.reshape_for_broadcast(freqs_cis, xq_).to(xq_.device) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 48, in reshape_for_broadcast assert freqs_cis.shape == (x.shape[1], x.shape[-1]) AssertionError make stop: Traceback (most recent call last): File "/workspace/src/utils.py", line 502, in run self._return = self._target(*self._args, **self._kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/_api/deprecation.py", line 145, in warning_emitting_wrapper return wrapped(*args, **kwargs) File 
"/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 378, in __call__ return self.invoke( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 163, in invoke raise e File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 153, in invoke self._call(inputs, run_manager=run_manager) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 137, in _call output, extra_return_dict = self.combine_docs( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 244, in combine_docs return self.llm_chain.predict(callbacks=callbacks, **inputs), {} File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 293, in predict return self(kwargs, callbacks=callbacks)[self.output_key] File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/_api/deprecation.py", line 145, in warning_emitting_wrapper return wrapped(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 378, in __call__ return self.invoke( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 163, in invoke raise e File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 153, in invoke self._call(inputs, run_manager=run_manager) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 103, in _call response = self.generate([inputs], run_manager=run_manager) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 115, in generate return self.llm.generate_prompt( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 633, in generate_prompt return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 803, in generate output = self._generate_helper( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 670, in _generate_helper raise e File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 657, in _generate_helper self._generate( File "/workspace/src/gpt_langchain.py", line 2339, in _generate rets = super()._generate(prompts, stop=stop, run_manager=run_manager, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/langchain_community/llms/huggingface_pipeline.py", line 267, in _generate responses = self.pipeline( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 240, in __call__ return super().__call__(text_inputs, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1223, in __call__ outputs = list(final_iterator) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__ item = next(self.iterator) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__ processed = self.infer(item, **self.params) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1149, in forward model_outputs = 
self._forward(model_inputs, **forward_params) File "/workspace/src/h2oai_pipeline.py", line 271, in _forward return self.__forward(model_inputs, **generate_kwargs) File "/workspace/src/h2oai_pipeline.py", line 309, in __forward generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/utils.py", line 1576, in generate result = self._greedy_search( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/utils.py", line 2494, in _greedy_search outputs = self( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1158, in forward outputs = self.model( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/model.py", line 127, in forward h, _, _ = layer( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl return self._call_impl(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl return forward_call(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/block.py", line 123, in forward attn_output, _, past_key_value = self.attn.forward( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 235, in forward xq, xk = self.rope.forward(xq, xk, self.start_pos, seqlen) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 60, in forward freqs_cis = self.reshape_for_broadcast(freqs_cis, xq_).to(xq_.device) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 48, in reshape_for_broadcast assert freqs_cis.shape == (x.shape[1], x.shape[-1]) AssertionError hit stop evaluate_nochat exception: : ('', '', '', True, 'open_chat', "{ 'PreInput': None,\n 'PreInstruct': 'GPT4 User: ',\n 'PreResponse': 'GPT4 Assistant:',\n 'botstr': 'GPT4 Assistant:',\n 'can_handle_system_prompt': False,\n 'chat_sep': '<|end_of_turn|>',\n 'chat_turn_sep': '<|end_of_turn|>',\n 'generates_leading_space': False,\n 'humanstr': 'GPT4 User: ',\n 'promptA': '',\n 'promptB': '',\n 'system_prompt': '',\n 'terminate_response ': ['GPT4 Assistant:', '<|end_of_turn|>']}", 0, 1, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0.0, True, '', '', 'UserData', True, 'Query', [], 10, True, 512, 'Relevant', ['/workspace/user_path/9b999f43-2ade-4148-97cf-d2448125168c/r 
es/e6a9ce98_user_upload_protocols.pdf'], [], [], [], [], 'Pay attention and remember the information below, which will help to answer the question or imperative after the context ends.', 'According to only the information in the docume nt sources provided within the context above, write an insightful and well-structured response to: ', 'In order to write a concise single-paragraph or bulleted list summary, pay attention to the following text.', 'Using only the inform ation in the document sources above, write a condensed and concise summary of key results (preferably as about 10 bullet points).', 'Answer this question with vibrant details in order for some NLP embedding model to use that answer as better query than original question: ', 'Who are you and what do you do?', 'Ensure your entire response is outputted as a single piece of strict valid JSON text.', 'Ensure your response is strictly valid JSON text.', 'Ensure your entir e response is outputted as strict valid JSON text inside a Markdown code block with the json language identifier. Ensure all JSON keys are less than 64 characters, and ensure JSON key names are made of only alphanumerics, underscores , or hyphens.', 'Ensure you follow this JSON schema:\n```json\n{properties_schema}\n```', 'auto', ['OCR', 'DocTR', 'Caption', 'ASR'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', [], [], '', False, '[]', '[]', 'best_near_prompt', 51 2, -1.0, -1.0, 'split_or_merge', '\n\n', 0, 'auto', False, False, '[]', 'None', None, [], 1.0, None, None, 'text', '', '', '', '', {'model': 'model', 'tokenizer': 'tokenizer', 'device': 'cuda', 'base_model': 'TheBloke/openchat_3.5-16k- AWQ', 'tokenizer_base_model': '', 'lora_weights': '[]', 'inference_server': '[]', 'prompt_type': 'open_chat', 'prompt_dict': {'promptA': '', 'promptB': '', 'PreInstruct': 'GPT4 User: ', 'PreInput': None, 'PreResponse': 'GPT4 Assistant: ', 'terminate_response': ['GPT4 Assistant:', '<|end_of_turn|>'], 'chat_sep': '<|end_of_turn|>', 'chat_turn_sep': '<|end_of_turn|>', 'humanstr': 'GPT4 User: ', 'botstr': 'GPT4 Assistant:', 'generates_leading_space': False, 'system_promp t': '', 'can_handle_system_prompt': False}, 'visible_models': 0, 'h2ogpt_key': None}, {'MyData': [None, '90711427-650d-458e-ac69-bc1629b452be', 'test']}, {'langchain_modes': ['Disabled', 'LLM', 'UserData'], 'langchain_mode_paths': {'Us erData': '/workspace/user_path/'}, 'langchain_mode_types': {'UserData': 'shared', 'github h2oGPT': 'shared', 'DriverlessAI docs': 'shared', 'wiki': 'shared', 'wiki_full': '', 'LLM': 'personal', 'Disabled': 'personal'}}, {'headers': '', 'host': '0.0.0.0:7850', 'username': 'test', 'connection': 'Upgrade', 'pragma': 'no-cache', 'cache-control': 'no-cache', 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/ 537.36 Edg/124.0.0.0', 'upgrade': 'websocket', 'origin': 'http://0.0.0.0:7850', 'sec-websocket-version': '13', 'accept-encoding': 'gzip, deflate', 'accept-language': 'en-US,en;q=0.9,ar;q=0.8', 'cookie': 'access-token-unsecure-hhN8p y5JLVRfL-0OTPND8TGcb3qhs2GvSJQ8qV1LI50=vrLRNuXKqoKCZDSCqo1OHg; access-token-unsecure-s-dRx26Pws-xf2TfvaYIjqwWsGjiH9960S06PrlT6tg=AnrezJi1hR1NjfFx29n_bg; access-token-unsecure-SF0CZ7POfi6Imk0jDfN44qO9W9VB0hu3nUcGevVPMYw=SU1SQYZL79hpAN43 hEDgIQ; access-token-unsecure-9LIDZewsE4If1yY7ixHa-yOZJO20M-PQVSDjJtfYQYA=o8YMAhHGtoLQDjMVZVITsQ; access-token-unsecure-qS0zsQdPdQYJsrMX4RXh3HQwEDeknaNz0RppngdPvGY=AGmuVQm8_KVKkMg8HdQtqg; access-token-unsecure--qfFGcbj-JQc0O0MamjIfNGlf 
gUrb6t7xyB3hRUL1I8=NVbKjP5O7Q3xJxHYvaiUfw; access-token-unsecure-YeY4iDfE2-hlA1izGtL7vBNbLbCosRLpSAJFo-j6_e0=xkWJTIiCTZGbhG1H60OTBg; access-token-unsecure-BwVTmtTwIzOYqtTpvsZkHQvnjr8N60WJaX_V6njwUAw=8uPW51j557W7S8ZO_e5iSQ', 'sec-websoc ket-key': 'CS4lXFJi7AM2jwkdWyhKyQ==', 'sec-websocket-extensions': 'permessage-deflate; client_max_window_bits', 'host2': '14.1.206.49', 'picture': 'None'}, {}, [['summarize the given document', '']]) ```
Docker command used to run h2ogpt:

```bash
export CONTEXT_LENGTH=16384
export IMAGE_TAG=7297519c
docker run \
    --init \
    --gpus all \
    --runtime=nvidia \
    --shm-size=2g \
    -p 7850:7860 \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    -u $(id -u):$(id -g) \
    gcr.io/vorvan/h2oai/h2ogpt-runtime:$IMAGE_TAG /workspace/generate.py \
    --page_title="GenNet AI" \
    --favicon_path="/workspace/assets/gennet_logo.svg" \
    --height=700 \
    --gradio_size="medium" \
    --enable_heap_analytics=False \
    --document_choice_in_sidebar=True \
    --actions_in_sidebar=True \
    --openai_server=False \
    --use_gpu_id=False \
    --score_model=None \
    --prompt_type=open_chat \
    --base_model=TheBloke/openchat_3.5-16k-AWQ \
    --compile_model=True \
    --use_cache=True \
    --use_flash_attention_2=True \
    --attention_sinks=True \
    --sink_dict="{'num_sink_tokens': 4, 'window_length': $CONTEXT_LENGTH }" \
    --save_dir='/workspace/save/' \
    --user_path='/workspace/user_path/' \
    --langchain_mode="UserData" \
    --langchain_modes="['UserData', 'LLM']" \
    --visible_langchain_actions="['Query']" \
    --visible_langchain_agents="[]" \
    --use_llm_if_no_docs=True \
    --max_seq_len=$CONTEXT_LENGTH \
    --enable_ocr=True \
    --enable_tts=False \
    --enable_stt=False
```
pseudotensor commented 4 months ago

Hi, I see the issue is from awq:

File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 48, in reshape_for_broadcast
    assert freqs_cis.shape == (x.shape[1], x.shape[-1])

It's likely a bug in AWQ, perhaps when combined with attention sinks, flash attention, or model compilation. While we expose those options from transformers, I cannot be sure arbitrary combinations work.
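For illustration only (a toy sketch, not the awq source and not necessarily the real root cause), this is how that kind of shape check can trip on some requests and not others, e.g. when the effective prompt length exceeds a precomputed rotary table:

```python
import torch

def reshape_for_broadcast(freqs_cis, x):
    # Mirrors the failing check: the rotary slice must match
    # (sequence_length, rotary_dim) of the current query tensor.
    assert freqs_cis.shape == (x.shape[1], x.shape[-1])
    return freqs_cis.view(1, x.shape[1], 1, x.shape[-1])

max_cached, rotary_dim = 2048, 64            # hypothetical size of the precomputed rotary table
freqs_cis = torch.randn(max_cached, rotary_dim)

seqlen = 4096                                # e.g. a long RAG prompt stuffed with document chunks
xq = torch.randn(1, seqlen, 32, rotary_dim)  # (batch, seq, heads, rotary_dim)

# Slicing past the cached range returns fewer rows than seqlen, so the assertion
# only trips for sufficiently long inputs, which looks "random" from the UI.
reshape_for_broadcast(freqs_cis[:seqlen], xq)  # AssertionError when seqlen > max_cached
```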

If I run:

python generate.py \
--enable_heap_analytics=False \
--document_choice_in_sidebar=True \
--actions_in_sidebar=True \
--openai_server=False \
--use_gpu_id=False \
--score_model=None \
--prompt_type=open_chat \
--base_model=TheBloke/openchat_3.5-16k-AWQ \
--compile_model=True \
--use_cache=True \
--use_flash_attention_2=True \
--attention_sinks=True \
--sink_dict="{'num_sink_tokens': 4, 'window_length': 16384 }" \
--use_llm_if_no_docs=True \
--max_seq_len=16384 \
--enable_ocr=True

I don't hit the issue just running like this; I removed options that shouldn't be relevant to the AWQ problem.

(screenshot: the model responds without error)

However, when I upload some text and then ask a question, I get the same issue:

/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/_api/deprecation.py:117: LangChainDeprecationWarning: The function `__call__` was deprecated in LangChain 0.1.0 and will be removed in 0.2.0. Use invoke instead.
  warn_deprecated(
thread exception: Traceback (most recent call last):
  File "/home/jon/h2ogpt/src/utils.py", line 502, in run
    self._return = self._target(*self._args, **self._kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/_api/deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 378, in __call__
    return self.invoke(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 163, in invoke
    raise e
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 153, in invoke
    self._call(inputs, run_manager=run_manager)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 137, in _call
    output, extra_return_dict = self.combine_docs(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 244, in combine_docs
    return self.llm_chain.predict(callbacks=callbacks, **inputs), {}
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 293, in predict
    return self(kwargs, callbacks=callbacks)[self.output_key]
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/_api/deprecation.py", line 145, in warning_emitting_wrapper
    return wrapped(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 378, in __call__
    return self.invoke(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 163, in invoke
    raise e
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/base.py", line 153, in invoke
    self._call(inputs, run_manager=run_manager)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 103, in _call
    response = self.generate([inputs], run_manager=run_manager)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain/chains/llm.py", line 115, in generate
    return self.llm.generate_prompt(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 633, in generate_prompt
    return self.generate(prompt_strings, stop=stop, callbacks=callbacks, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 803, in generate
    output = self._generate_helper(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 670, in _generate_helper
    raise e
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_core/language_models/llms.py", line 657, in _generate_helper
    self._generate(
  File "/home/jon/h2ogpt/src/gpt_langchain.py", line 2339, in _generate
    rets = super()._generate(prompts, stop=stop, run_manager=run_manager, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/langchain_community/llms/huggingface_pipeline.py", line 267, in _generate
    responses = self.pipeline(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 240, in __call__
    return super().__call__(text_inputs, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1223, in __call__
    outputs = list(final_iterator)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 125, in __next__
    processed = self.infer(item, **self.params)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1149, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/jon/h2ogpt/src/h2oai_pipeline.py", line 271, in _forward
    return self.__forward(model_inputs, **generate_kwargs)
  File "/home/jon/h2ogpt/src/h2oai_pipeline.py", line 309, in __forward
    generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/utils.py", line 1576, in generate
    result = self._greedy_search(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/utils.py", line 2494, in _greedy_search
    outputs = self(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1158, in forward
    outputs = self.model(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/model.py", line 119, in forward
    h, _, past_key_value = layer(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/block.py", line 113, in forward
    attn_output, _, past_key_value = self.attn.forward(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 210, in forward
    xq, xk = self.rope.forward(xq, xk, self.start_pos, seqlen)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 62, in forward
    freqs_cis = self.reshape_for_broadcast(freqs_cis, xq_).to(xq_.device)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/awq/modules/fused/attn.py", line 50, in reshape_for_broadcast
    assert freqs_cis.shape == (x.shape[1], x.shape[-1])
AssertionError
pseudotensor commented 4 months ago

This does the same thing:

python generate.py \
--enable_heap_analytics=False \
--document_choice_in_sidebar=True \
--actions_in_sidebar=True \
--openai_server=False \
--use_gpu_id=False \
--score_model=None \
--prompt_type=open_chat \
--base_model=TheBloke/openchat_3.5-16k-AWQ \
--attention_sinks=True \
--sink_dict="{'num_sink_tokens': 4, 'window_length': 16384 }" \
--use_llm_if_no_docs=True \
--max_seq_len=16384 \
--enable_ocr=True
pseudotensor commented 4 months ago

As does this:

python generate.py \
--enable_heap_analytics=False \
--document_choice_in_sidebar=True \
--actions_in_sidebar=True \
--openai_server=False \
--use_gpu_id=False \
--score_model=None \
--prompt_type=open_chat \
--base_model=TheBloke/openchat_3.5-16k-AWQ \
--use_llm_if_no_docs=True \
--max_seq_len=16384 \
--enable_ocr=True

So it seems to be a pure awq issue.

The latest 0.2.5 does the same thing, and reducing max_seq_len to (say) 15000 does the same thing.

pseudotensor commented 4 months ago

A small script does the same thing, so it's not related to h2oGPT itself.
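Something along these lines reproduces it (a sketch assuming the standard AutoAWQ fused-layers path; not the exact script used):

```python
# Repro sketch (assumes AutoAWQ is installed and a CUDA GPU is available).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "TheBloke/openchat_3.5-16k-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# fuse_layers=True routes attention through awq/modules/fused/attn.py,
# where reshape_for_broadcast raises the AssertionError above.
model = AutoAWQForCausalLM.from_quantized(model_id, fuse_layers=True)

prompt = "GPT4 User: Summarize the following document ...<|end_of_turn|>GPT4 Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
# With a long enough prompt (as in the RAG case), generation hits the assertion.
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```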

pseudotensor commented 4 months ago

https://github.com/casper-hansen/AutoAWQ/issues/472

Blacksuan19 commented 4 months ago

I'm getting a similar error with the Llama-3 GGUF as well (the same model mentioned in the FAQ); the log only includes the evaluate_nochat exception from above.

error running LLAMA-3 ```python evaluate_nochat exception: : ('', '', '', True, 'unknown', "{ 'PreInput': None,\n 'PreInstruct': None,\n 'PreResponse': None,\n 'botstr': None,\n 'can_handle_system_prompt': False,\n 'chat_sep': '\\n',\n 'chat_turn_ sep': '\\n',\n 'generates_leading_space': False,\n 'humanstr': None,\n 'promptA': None,\n 'promptB': None,\n 'system_prompt': '',\n 'terminate_response': []}", 0, 1, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, Tr ue, '', '', 'LLM', True, 'Query', None, 10, True, 512, 'Relevant', ['All'], None, None, None, None, 'Pay attention and remember the information below, which will help to answer the question or imperative after the context ends.', 'Acco rding to only the information in the document sources provided within the context above, write an insightful and well-structured response to: ', 'In order to write a concise single-paragraph or bulleted list summary, pay attention to t he following text.', 'Using only the information in the document sources above, write a condensed and concise summary of key results (preferably as about 10 bullet points).', 'Answer this question with vibrant details in order for some NLP embedding model to use that answer as better query than original question: ', 'Who are you and what do you do?', 'Ensure your entire response is outputted as a single piece of strict valid JSON text.', 'Ensure your response is str ictly valid JSON text.', 'Ensure your entire response is outputted as strict valid JSON text inside a Markdown code block with the json language identifier. Ensure all JSON keys are less than 64 characters, and ensure JSON key names are made of only alphanumerics, underscores, or hyphens.', 'Ensure you follow this JSON schema:\n```json\n{properties_schema}\n```', 'auto', ['DocTR', 'Caption', 'ASR'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', [], None, '', False, '[]', '[]', 'best_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, 'auto', False, False, '[]', 'None', None, None, 1, None, None, 'text', '', '', '', '', {'model': 'model', 'tokenizer': 'tokenizer', 'device': 'cpu', 'base_model': 'llama', 'tokenizer_base_model': '', 'lora_weights': '[]', 'inference_server': '[]', 'prompt_type': 'unknown', 'prompt_dict': {'promptA': None, 'promptB': None, 'PreInstruct': None, 'PreInput': None, 'PreResponse': None, 'terminate_response': [], 'chat_sep': '\n', 'chat_turn_sep': '\n', 'humanstr': None, 'botstr': None, 'generates_leading_space': False, 'system_prompt': '', 'can_handle_system_prompt': False}, 'visible_models': 0, 'h2ogpt_key': None}, {'MyData': [None, '6164574a-865b-4659-8fda-d35faa6a1d09', 'test']}, {'langchain_modes': ['Disabled', 'LLM', 'MyData', 'UserData'], 'langchain_mode_paths': {'UserData': None}, 'langchain_mode_types': {'UserData': 'shared', 'github h2oGPT': 'shared', 'DriverlessAI docs': 'shared', 'wiki': 'shared', 'wiki_full': '', 'MyData': 'personal', 'LLM': 'personal', 'Disabled': 'personal'}}, {'headers': '', 'host': '0.0.0.0:7850', 'username': 'test', 'connection': 'keep-alive', 'content-length': '117', 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36 Edg/124.0.0.0', 'dnt': '1', 'content-type': 'application/json', 'accept': '*/*', 'origin': 'http://0.0.0.0:7850', 'referer': 'http://0.0.0.0:7850/', 'accept-encoding': 'gzip, deflate', 'accept-language': 'en-US,en;q=0.9,ar;q=0.8', 'cookie': 'access-token-unsecure-hhN8py5JLVRfL-0OTPND8TGcb3qhs2GvSJQ8qV1LI50=vrLRNuXKqoKCZDSCqo1OHg; 
access-token-unsecure-s-dRx26Pws-xf2TfvaYIjqwWsGjiH9960S06PrlT6tg=AnrezJi1hR1NjfFx29n_bg; access-token-unsecure-SF0CZ7POfi6Imk0jDfN44qO9W9VB0hu3nUcGevVPMYw=SU1SQYZL79hpAN43hEDgIQ; access-token-unsecure-9LIDZewsE4If1yY7ixHa-yOZJO20M-PQVSDjJtfYQYA=o8YMAhHGtoLQDjMVZVITsQ; access-token-unsecure-qS0zsQdPdQYJsrMX4RXh3HQwEDeknaNz0RppngdPvGY=AGmuVQm8_KVKkMg8HdQtqg; access-token-unsecure--qfFGcbj-JQc0O0MamjIfNGlfgUrb6t7xyB3hRUL1I8=NVbKjP5O7Q3xJxHYvaiUfw; access-token-unsecure-YeY4iDfE2-hlA1izGtL7vBNbLbCosRLpSAJFo-j6_e0=xkWJTIiCTZGbhG1H60OTBg; access-token-unsecure-BwVTmtTwIzOYqtTpvsZkHQvnjr8N60WJaX_V6njwUAw=8uPW51j557W7S8ZO_e5iSQ; access-token-unsecure-JSzZdmZ5Fn4S9ekIB_5lXnXTrnwvQu1X7IyivtmRjuk=mm2CzLGIw9b3H9xFfS1KpQ; access-token-unsecure-IxTC1FBXOKLvW0SXsNRzMYxrHxvTPTIwwB4y69dHG9A=fYRfZinU_x99RK1k11fvIA; access-token-unsecure-eJjGwBq3ju0P30aflFl-P8uUU2QqEgAIsxgw-FsdJgU=fDM1Je-16fNS5ndkdNRL6g; access-token-unsecure-uiZ2ybGZZJgrCTxTV22rFbgZVNssg73T8oL2xiUqp1I=VrTDb6L_l-gQy27srKSq0Q; access-token-unsecure-U2bTOnaNSeVnj1LFMlrf1Mtm6lWYkZALH_MqD44PYNU=lt481IiB7f9YGqWkkOinIA; access-token-unsecure-RxkNbfFX2paq-I07CliUNsj55vZP1qWOWwp2u-TDNCc=mmNdzgtQmfvlDlQe6i4PvA; access-token-unsecure-zfJ9zz3Jn9sTfeKgGIdQFP7gIY4kjYQ8rW5jEkOaylQ=IsBrbZibgh9mmFvTDmEOAg; access-token-unsecure-L9Xt7VY0kFHSJqZAK8-I90p4rp6XNVtxCYJp3nAmxfs=LZu9e7Ggvl2vYBE1AuZbLQ', 'host2': '14.1.246.124', 'picture': 'None'}, {}, [['hello', '']]) Traceback (most recent call last): File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/queueing.py", line 566, in process_events response = await route_utils.call_process_api( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/route_utils.py", line 261, in call_process_api output = await app.get_blocks().process_api( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/blocks.py", line 1788, in process_api result = await self.call_function( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/blocks.py", line 1352, in call_function prediction = await utils.async_iteration(iterator) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/utils.py", line 595, in async_iteration return await iterator.__anext__() File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/utils.py", line 588, in __anext__ return await anyio.to_thread.run_sync( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync return await get_async_backend().run_sync_in_worker_thread( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread return await future File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run result = context.run(func, *args) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/utils.py", line 571, in run_sync_iterator_async return next(iterator) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/utils.py", line 754, in gen_wrapper response = next(iterator) File "/workspace/src/gradio_runner.py", line 5053, in bot for res in get_response(fun1, history, chatbot_role1, speaker1, tts_language1, roles_state1, File "/workspace/src/gradio_runner.py", line 4948, in get_response for output_fun in fun1(): File "/workspace/src/gen.py", line 4402, in evaluate prompt_basic = prompter.generate_prompt(data_point, context_from_history=False) File "/workspace/src/prompter.py", line 1729, in generate_prompt assert self.use_chat_template 
AssertionError ```
Run command:

```bash
export IMAGE_TAG=4059a2c9
export HF_TOKEN=hf_xxx
docker run \
    --init \
    --gpus all \
    --runtime=nvidia \
    --shm-size=2g \
    -p 7850:7860 \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    -u $(id -u):$(id -g) \
    gcr.io/vorvan/h2oai/h2ogpt-runtime:$IMAGE_TAG /workspace/generate.py \
    --openai_server=False \
    --auth="/workspace/auth/users.json" \
    --h2ogpt_api_keys="/workspace/auth/api_keys.json" \
    --use_gpu_id=False \
    --score_model=None \
    --base_model=llama \
    --model_path_llama=https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf?download=true --tokenizer_base_model=meta-llama/Meta-Llama-3-8B-Instruct \
    --save_dir='/workspace/save/' \
    --user_path='/workspace/user_path/' \
    --langchain_mode="UserData" \
    --langchain_modes="['UserData', 'LLM']" \
    --visible_langchain_actions="['Query']" \
    --visible_langchain_agents="[]" \
    --use_llm_if_no_docs=True \
    --enable_ocr=True \
    --enable_tts=False \
    --enable_stt=False
```
pseudotensor commented 4 months ago

It should fail and require you to pass:

--max_seq_len=8192

as well.

e.g.

python generate.py \
--openai_server=False \
--score_model=None \
--base_model=llama \
--model_path_llama=https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf?download=true \
--tokenizer_base_model=meta-llama/Meta-Llama-3-8B-Instruct \
--use_llm_if_no_docs=True \
--max_seq_len=8192

gives:

(screenshot: the model responds without error)

I don't see the error you see. When I debug the code with the above command on the latest h2oGPT, I see that chat_template is True.

Perhaps you are using an older Docker image or an older h2oGPT?

Blacksuan19 commented 4 months ago

There was a missing backslash in the command after --model_path_llama, which is fixed now. However, I'm still unable to run Llama-3 with the Docker image because it is a gated model: I tried passing --use_auth_token=hf_xxx and setting the environment variables HUGGING_FACE_HUB_TOKEN, HF_TOKEN, and HUGGINGFACE_TOKEN, but I still cannot access it.
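For reference, the corrected fragment of the run command above (assuming the fix is the trailing line-continuation backslash after the GGUF URL, so that --tokenizer_base_model is parsed as its own argument):

```bash
    --model_path_llama=https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf?download=true \
    --tokenizer_base_model=meta-llama/Meta-Llama-3-8B-Instruct \
```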

This occurs on both the latest and previous Docker images, with tags c25144e9 and 4059a2c9.

pseudotensor commented 4 months ago

Can you share a stack trace of where it's failing?

Blacksuan19 commented 4 months ago

Here is the full stack trace. I'm authenticated with huggingface-cli, have exported HUGGING_FACE_HUB_TOKEN in the environment, and am passing --use_auth_token to docker run.

The error is not raised while starting the container; it is only raised after sending a query.

```python evaluate_nochat exception: You are trying to access a gated repo. Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct. 401 Client Error. (Request ID: Root=1-6639f802-164d6a07579d391a189244e2;12303a0c-d620-412a-b17f-58372bd127d5) Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/generation_config.json. Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it.: ('', '', '', True, 'unknown', "{ 'PreInput': None,\n 'PreInstruct': None,\n 'PreResponse': None,\n 'botstr': None,\n 'can_handle_system_prompt': False,\n 'chat_sep': '\\n',\n 'chat_turn_sep': '\\n',\n 'generates_leading_space': False,\n 'humanstr': None,\n 'promptA': None,\n 'promptB': None,\n 'system_prompt': '',\n 'termi nate_response': []}", 0, 1, 1, 0, 1, 1024, 0, False, 600, 1.07, 1, False, 0, True, '', '', 'UserData', True, 'Query', None, 10, True, 512, 'Relevant', ['/workspace/user_path/9b999f43-2ade-4148-97cf-d2448125168c/res/b1a173b2_user_upload _Abubakar-Yusif-March-2024-Progress-Report.pdf'], None, None, None, None, 'Pay attention and remember the information below, which will help to answer the question or imperative after the context ends.', 'According to only the informat ion in the document sources provided within the context above, write an insightful and well-structured response to: ', 'In order to write a concise single-paragraph or bulleted list summary, pay attention to the following text.', 'Usin g only the information in the document sources above, write a condensed and concise summary of key results (preferably as about 10 bullet points).', 'Answer this question with vibrant details in order for some NLP embedding model to us e that answer as better query than original question: ', 'Who are you and what do you do?', 'Ensure your entire response is outputted as a single piece of strict valid JSON text.', 'Ensure your response is strictly valid JSON text.', ' Ensure your entire response is outputted as strict valid JSON text inside a Markdown code block with the json language identifier. 
Ensure all JSON keys are less than 64 characters, and ensure JSON key names are made of only alphanume rics, underscores, or hyphens.', 'Ensure you follow this JSON schema:\n```json\n{properties_schema}\n```', 'auto', ['OCR', 'DocTR', 'Caption', 'ASR'], ['PyPDF'], ['Unstructured'], '.[]', 10, 'auto', [], None, '', False, '[]', '[]', 'be st_near_prompt', 512, -1, -1, 'split_or_merge', '\n\n', 0, 'auto', False, False, '[]', 'None', None, None, 1, None, None, 'text', '', '', '', '', {'model': 'model', 'tokenizer': 'tokenizer', 'device': 'cpu', 'base_model': 'llama', 'tok enizer_base_model': 'meta-llama/Meta-Llama-3-8B-Instruct', 'lora_weights': '[]', 'inference_server': '[]', 'prompt_type': 'unknown', 'prompt_dict': {'promptA': None, 'promptB': None, 'PreInstruct': None, 'PreInput': None, 'PreResponse' : None, 'terminate_response': [], 'chat_sep': '\n', 'chat_turn_sep': '\n', 'humanstr': None, 'botstr': None, 'generates_leading_space': False, 'system_prompt': '', 'can_handle_system_prompt': False}, 'visible_models': 0, 'h2ogpt_key': None}, {'MyData': [None, '2db742f5-7880-4708-bfcc-061f995f6c51', 'test']}, {'langchain_modes': ['Disabled', 'LLM', 'UserData'], 'langchain_mode_paths': {'UserData': '/workspace/user_path/'}, 'langchain_mode_types': {'UserData': 'shared ', 'github h2oGPT': 'shared', 'DriverlessAI docs': 'shared', 'wiki': 'shared', 'wiki_full': '', 'LLM': 'personal', 'Disabled': 'personal'}}, {'headers': '', 'host': '0.0.0.0:7850', 'username': 'test', 'connection': 'keep-alive', 'c ontent-length': '155', 'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36 Edg/124.0.0.0', 'dnt': '1', 'content-type': 'application/json', 'accept': '*/*', 'origin': 'htt p://0.0.0.0:7850', 'referer': 'http://0.0.0.0:7850/', 'accept-encoding': 'gzip, deflate', 'accept-language': 'en-US,en;q=0.9,ar;q=0.8', 'cookie': 'access-token-unsecure-hhN8py5JLVRfL-0OTPND8TGcb3qhs2GvSJQ8qV1LI50=vrLRNuXKqoKCZD SCqo1OHg; access-token-unsecure-s-dRx26Pws-xf2TfvaYIjqwWsGjiH9960S06PrlT6tg=AnrezJi1hR1NjfFx29n_bg; access-token-unsecure-SF0CZ7POfi6Imk0jDfN44qO9W9VB0hu3nUcGevVPMYw=SU1SQYZL79hpAN43hEDgIQ; access-token-unsecure-9LIDZewsE4If1yY7ixHa-yO ZJO20M-PQVSDjJtfYQYA=o8YMAhHGtoLQDjMVZVITsQ; access-token-unsecure-qS0zsQdPdQYJsrMX4RXh3HQwEDeknaNz0RppngdPvGY=AGmuVQm8_KVKkMg8HdQtqg; access-token-unsecure--qfFGcbj-JQc0O0MamjIfNGlfgUrb6t7xyB3hRUL1I8=NVbKjP5O7Q3xJxHYvaiUfw; access-tok en-unsecure-YeY4iDfE2-hlA1izGtL7vBNbLbCosRLpSAJFo-j6_e0=xkWJTIiCTZGbhG1H60OTBg; access-token-unsecure-BwVTmtTwIzOYqtTpvsZkHQvnjr8N60WJaX_V6njwUAw=8uPW51j557W7S8ZO_e5iSQ; access-token-unsecure-JSzZdmZ5Fn4S9ekIB_5lXnXTrnwvQu1X7IyivtmRjuk =mm2CzLGIw9b3H9xFfS1KpQ; access-token-unsecure-IxTC1FBXOKLvW0SXsNRzMYxrHxvTPTIwwB4y69dHG9A=fYRfZinU_x99RK1k11fvIA; access-token-unsecure-eJjGwBq3ju0P30aflFl-P8uUU2QqEgAIsxgw-FsdJgU=fDM1Je-16fNS5ndkdNRL6g; access-token-unsecure-uiZ2ybGZ ZJgrCTxTV22rFbgZVNssg73T8oL2xiUqp1I=VrTDb6L_l-gQy27srKSq0Q; access-token-unsecure-U2bTOnaNSeVnj1LFMlrf1Mtm6lWYkZALH_MqD44PYNU=lt481IiB7f9YGqWkkOinIA; access-token-unsecure-RxkNbfFX2paq-I07CliUNsj55vZP1qWOWwp2u-TDNCc=mmNdzgtQmfvlDlQe6i4 PvA; access-token-unsecure-zfJ9zz3Jn9sTfeKgGIdQFP7gIY4kjYQ8rW5jEkOaylQ=IsBrbZibgh9mmFvTDmEOAg; access-token-unsecure-L9Xt7VY0kFHSJqZAK8-I90p4rp6XNVtxCYJp3nAmxfs=LZu9e7Ggvl2vYBE1AuZbLQ; access-token-unsecure-Q1siuNCSLCiNRxIYSoa5j3shaosW MURmotg6HLCC7_U=GnFkZ58wi1tNt8jvtK_UXw; access-token-unsecure-18dc1X-LAJmB7lwYCfNUZLG0jDJ14hzU62wssf0wc68=sl8WTC1xPw4lEirkcWQTwg; 
access-token-unsecure-EjGzYB6qR9D1LPBN82KeJcwOVChluLaElinAd1OgeSk=jzCYJhyTI3Aka1vIsd-b8g; access-token-un secure-X7gaKAyAK8MUU18su8lDmyVdbuDVILxO9xngYdFnvgo=1JT2Rzr5QHd5rLGbEgqASg; access-token-unsecure-SBEZ_P_Ugyx9zNJcGUYmCbOTE7jSx0BgJFTIyH3OkSg=_X_Xbu1W23MfAgZgdLlWZg; access-token-unsecure-wxxoptBvH1YNkOHaad2CST68Cg5SrC23tRQMsGy2TOc=Vl3F HsOce3Q6zMWHJq79tA; access-token-unsecure-6WqVOYaEXoF6RktDUq-aghEiDdfSuCY2tkPQHCqGjck=N4QVkzJs0_AgRFQRdx5Y-Q; access-token-unsecure-PX6qZPvMN8mpgv3RwObslqNOspATAkytwLj-5mrO12Q=CxUwwnlESuufP4AeTnOh3Q; access-token-unsecure-a8SnDigFzjafo M84LMb3tCUsrk44E9VoXdpmv36kpNo=DjNp22jvVqPfWo9k6fWxUg; access-token-unsecure-xy6fU_AFL6Tp1nllr3w_QAOlrvJRFgRn0bTPJnCrTAo=Nqt1rhRGgIiZD_m8zaERrg; access-token-unsecure-9JU3qRVH6hK4ZC54rM0J035t62b8-m26tT3T5I-wEWA=ESa5PlhZ1RUHvb1Klzxujg', 'host2': '14.1.246.126', 'picture': 'None'}, {}, [['summarize the given data', '']]) Traceback (most recent call last): File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status response.raise_for_status() File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status raise HTTPError(http_error_msg, response=self) requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/generation_config.json The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/utils/hub.py", line 398, in cached_file resolved_file = hf_hub_download( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn return fn(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download return _hf_hub_download_to_cache_dir( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir _raise_on_head_call_error(head_call_error, force_download, local_files_only) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1823, in _raise_on_head_call_error raise head_call_error File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn return fn(*args, **kwargs) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata r = _request_wrapper( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper response = _request_wrapper( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 396, in _request_wrapper hf_raise_for_status(response) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 321, in hf_raise_for_status raise GatedRepoError(message, response) from e huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. 
(Request ID: Root=1-6639f802-164d6a07579d391a189244e2;12303a0c-d620-412a-b17f-58372bd127d5) Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/generation_config.json. Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it. The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/queueing.py", line 566, in process_events response = await route_utils.call_process_api( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/route_utils.py", line 261, in call_process_api output = await app.get_blocks().process_api( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/blocks.py", line 1788, in process_api result = await self.call_function( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/blocks.py", line 1352, in call_function prediction = await utils.async_iteration(iterator) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/utils.py", line 595, in async_iteration return await iterator.__anext__() File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/utils.py", line 588, in __anext__ return await anyio.to_thread.run_sync( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync return await get_async_backend().run_sync_in_worker_thread( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread return await future File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run result = context.run(func, *args) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/utils.py", line 571, in run_sync_iterator_async return next(iterator) File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/gradio/utils.py", line 754, in gen_wrapper response = next(iterator) File "/workspace/src/gradio_runner.py", line 5053, in bot for res in get_response(fun1, history, chatbot_role1, speaker1, tts_language1, roles_state1, File "/workspace/src/gradio_runner.py", line 4948, in get_response for output_fun in fun1(): File "/workspace/src/gen.py", line 4278, in evaluate prompter = Prompter(prompt_type, prompt_dict, debug=debug, stream_output=stream_output, File "/workspace/src/prompter.py", line 1706, in __init__ self.terminate_response = update_terminate_responses(self.terminate_response, File "/workspace/src/stopping.py", line 25, in update_terminate_responses generate_eos_token_id = GenerationConfig.from_pretrained(tokenizer.name_or_path).eos_token_id File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/generation/configuration_utils.py", line 843, in from_pretrained resolved_config_file = cached_file( File "/h2ogpt_conda/envs/h2ogpt/lib/python3.10/site-packages/transformers/utils/hub.py", line 416, in cached_file raise EnvironmentError( OSError: You are trying to access a gated repo. Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct. 401 Client Error. (Request ID: Root=1-6639f802-164d6a07579d391a189244e2;12303a0c-d620-412a-b17f-58372bd127d5) Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/generation_config.json. Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it. ```
pseudotensor commented 4 months ago

I see. But if you pass the HUGGING_FACE_HUB_TOKEN environment variable through to the container, it should still work here.

e.g. the docker line would add:

-e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
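i.e., a minimal sketch of the full invocation with the token passed through (assumes the token is exported on the host; the other flags from the earlier command are omitted for brevity):

```bash
export HUGGING_FACE_HUB_TOKEN=hf_xxx   # host-side token that has access to the gated repo
docker run \
    --init \
    --gpus all \
    --shm-size=2g \
    -p 7850:7860 \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    gcr.io/vorvan/h2oai/h2ogpt-runtime:$IMAGE_TAG /workspace/generate.py \
    --base_model=llama \
    --model_path_llama=https://huggingface.co/QuantFactory/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct.Q5_K_M.gguf?download=true \
    --tokenizer_base_model=meta-llama/Meta-Llama-3-8B-Instruct \
    --max_seq_len=8192
```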
pseudotensor commented 4 months ago

But I made some changes to that particular piece of code.
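For context, the failing call in the trace above is GenerationConfig.from_pretrained(tokenizer.name_or_path) inside src/stopping.py. The sketch below only illustrates how that kind of lookup can honor the Hub token and fail gracefully; it is a hypothetical helper, not necessarily the actual change:

```python
# Illustrative only; hypothetical helper, not the actual h2oGPT change.
import os
from transformers import GenerationConfig

def get_generation_eos_token_id(tokenizer):
    """Fetch eos_token_id from the Hub config, honoring the HF token and tolerating failures."""
    token = os.getenv("HUGGING_FACE_HUB_TOKEN") or os.getenv("HF_TOKEN")
    try:
        # token= authenticates from_pretrained against gated/private repos
        return GenerationConfig.from_pretrained(tokenizer.name_or_path, token=token).eos_token_id
    except Exception:
        # Fall back to the tokenizer's own eos token if the Hub lookup fails (gated repo, offline, etc.)
        return getattr(tokenizer, "eos_token_id", None)
```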

Blacksuan19 commented 4 months ago

I can confirm that passing the environment variable to the Docker container with -e, or passing --use_auth_token, works on the latest image.