h2oai / h2ogpt

Private chat with local GPT with documents, images, video, etc. 100% private, Apache 2.0. Supports Ollama, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://gpt-docs.h2o.ai/
http://h2o.ai
Apache License 2.0

Size of Tensor A must match size of Tensor B #1540

Open rohitnanda1443 opened 6 months ago

rohitnanda1443 commented 6 months ago

Hi,

I am trying to do RAG query on a large PDF file and get the below error:

Error: The size of tensor a (3351) must match the size of tensor b (4096) at non-singleton dimension 3.

The run script: python generate.py --base_model=mistralai/Mixtral-8x7B-Instruct-v0.1 --pre_load_embedding_model=True --score_model=None --enable_tts=False --enable_stt=False --enable_transcriptions=False --auth=auth.json --system_prompt="My name is H2O-GPT and I am an intelligent AI" --attention_sinks=True --max_new_tokens=100000 --max_max_new_tokens=100000 --top_k_docs=-1 --use_gpu_id=False --max_seq_len=4096 --sink_dict="{'num_sink_tokens': 4, 'window_length': 4096}"
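For context, this class of error comes from PyTorch's broadcasting check: two attention-related tensors with mismatched sequence lengths (here 3351 vs 4096) are combined, and neither size is 1, so broadcasting fails. A minimal sketch that reproduces the same error message (the shapes are illustrative, not the actual model internals):

```python
import torch

# Two tensors whose last dimensions disagree (3351 vs 4096), mimicking an
# attention-score tensor and an attention mask that fell out of sync.
scores = torch.zeros(1, 8, 1, 3351)
mask = torch.zeros(1, 1, 1, 4096)

try:
    scores + mask  # broadcasting fails: neither 3351 nor 4096 is 1
except RuntimeError as e:
    print(e)
```

With attention sinks and a fixed window, such mismatches typically mean the mask or position ids were built for a different cache length than the key/value cache actually holds.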

pseudotensor commented 6 months ago

Can you provide more of the stack trace? My guess is that attention sinks support in transformers is not bug-free.

Separately, I recommend running Mixtral through vLLM in general. It will likely be hard to make Mixtral run for long sequences otherwise, and it already supports 32k total input+output.
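As a sketch of the vLLM route: vLLM ships an OpenAI-compatible completions server (started with something like `python -m vllm.entrypoints.openai.api_server --model mistralai/Mixtral-8x7B-Instruct-v0.1`), which can then be queried over HTTP. The host/port below are assumptions, not anything from this thread:

```python
import json
import urllib.request

# Assumed endpoint of a locally running vLLM OpenAI-compatible server.
VLLM_URL = "http://127.0.0.1:5000/v1/completions"

def build_request(prompt, model="mistralai/Mixtral-8x7B-Instruct-v0.1", max_tokens=256):
    """Build an OpenAI-style completions payload for the vLLM server."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}

def query(prompt):
    """POST a prompt to the vLLM server and return the generated text."""
    data = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        VLLM_URL, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

h2oGPT itself would then be pointed at the server via its `--inference_server` flag rather than queried directly like this.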

pseudotensor commented 6 months ago

I tried the same with Mistral and didn't find any issues.

(screenshot)

But the output and input aren't very long. The input isn't very long because you set --max_seq_len=4096, so it tries to take 29 docs but gets reduced.

rohitnanda1443 commented 6 months ago

Noted.

I will try again with removing --max_seq_len=4096

Also, where are the error log files saved in the H2O-GPT folder (so that I can send you the stack trace)? Does one get the CLI output dumped to a file using "> /home/user/dump" after the CLI startup script?
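On the logging question: since the script prints to the console, redirecting both stdout and stderr captures the trace (tracebacks go to stderr, so plain `>` alone misses them). A sketch using Python's subprocess to tee output to a file; the command here is a stand-in, not the real generate.py invocation:

```python
import subprocess
import sys

# Stand-in command that writes to both streams; replace with the actual
# generate.py invocation to capture its logs.
cmd = [sys.executable, "-c",
       "import sys; print('stdout line'); print('stderr line', file=sys.stderr)"]

# Send stdout to the log file and fold stderr into the same stream.
with open("h2ogpt.log", "w") as log:
    subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=True)

print(open("h2ogpt.log").read())
```

The shell equivalent is `python generate.py ... > /home/user/dump 2>&1` — the `2>&1` is what makes the stack trace land in the file.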

rohitnanda1443 commented 6 months ago

I tried using Mixtral with vLLM and did the following:

1) Did a local install of the inference server using this guide: https://github.com/h2oai/h2ogpt/blob/main/docs/README_InferenceServers.md

2) Ran inference server: NCCL_SHM_DISABLE=1 CUDA_VISIBLE_DEVICES=0 text-generation-launcher --model-id h2oai/h2ogpt-oig-oasst1-512-6_9b --port 8080 --sharded false --trust-remote-code --max-stop-sequences=6

3) Ran Model: python generate.py --base_model=mistralai/Mixtral-8x7B-Instruct-v0.1 --prompt_type=zephyr --max_seq_len=4096 --pre_load_embedding_model=True --score_model=None --enable_tts=False --enable_stt=False --enable_transcriptions=False --max_seq_len=4096 --auth=auth.json --inference_server="http://127.0.0.1:8080" &

(screenshot)

rohitnanda1443 commented 6 months ago

Issue:

Unable to connect to the inference server: after starting the inference server, if I do a curl test against its port, I get a connection-refused error.
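A quick stdlib-only way to check whether anything is actually listening on the inference port before curling it (host and port match the ones used in this thread; adjust as needed):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False

print(port_open("127.0.0.1", 8080))
```

If this prints False, the server process either never bound the port (check its own startup log) or is listening on a different interface/port than expected.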

The Gradio Dump:

Using Model mistralai/mixtral-8x7b-instruct-v0.1
load INSTRUCTOR_Transformer
max_seq_length 512
Starting get_model: mistralai/Mixtral-8x7B-Instruct-v0.1 http:://127.0.0.1:8080
GR Client Begin: http://http: mistralai/Mixtral-8x7B-Instruct-v0.1
GR Client Failed http://http: mistralai/Mixtral-8x7B-Instruct-v0.1: HTTPConnectionPool(host='http', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f7357c07bb0>: Failed to resolve 'http' ([Errno -2] Name or service not known)"))
HF Client Begin: http://http: mistralai/Mixtral-8x7B-Instruct-v0.1
HF Client Failed http://http: mistralai/Mixtral-8x7B-Instruct-v0.1:
Traceback (most recent call last):
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/socket.py", line 955, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connectionpool.py", line 793, in urlopen
    response = self._make_request(
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connectionpool.py", line 496, in _make_request
    conn.request(
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connection.py", line 400, in request
    self.endheaders()
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connection.py", line 238, in connect
    self.sock = self._new_conn()
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connection.py", line 205, in _new_conn
    raise NameResolutionError(self.host, self, e) from e
urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPConnection object at 0x7f735dbfd000>: Failed to resolve 'http' ([Errno -2] Name or service not known)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/connectionpool.py", line 847, in urlopen
    retries = retries.increment(
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='http', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f735dbfd000>: Failed to resolve 'http' ([Errno -2] Name or service not known)"))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/h2ogpt/src/gen.py", line 2498, in get_client_from_inference_server
    res = hf_client.generate('What?', max_new_tokens=1)
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/text_generation/client.py", line 275, in generate
    resp = requests.post(
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/root/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='http', port=80): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPConnection object at 0x7f735dbfd000>: Failed to resolve 'http' ([Errno -2] Name or service not known)"))

HF Client End: http://http: mistralai/Mixtral-8x7B-Instruct-v0.1 : None
Begin auto-detect HF cache text generation models
End auto-detect HF cache text generation models
Begin auto-detect llama.cpp models
End auto-detect llama.cpp models
Running on local URL: http://0.0.0.0:7863

To create a public link, set share=True in launch().
Started Gradio Server and/or GUI: server_name: localhost port: None
Use local URL: http://localhost:7863/

pseudotensor commented 6 months ago

If you look at the trace, you have an odd "Begin: http:://127.0.0.1:8080" with an extra ':'. As in the docs, with vLLM one would pass something like vllm:127.0.0.1:8080, or with the HF client http://127.0.0.1:8080, but with no extra ':'s.
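The effect of the doubled colon can be seen with the standard library's URL parser: the malformed string loses its host entirely, which matches the log where the client ends up trying to resolve the literal name 'http':

```python
from urllib.parse import urlparse

good = urlparse("http://127.0.0.1:8080")
bad = urlparse("http:://127.0.0.1:8080")  # extra ':' as in the log

# The good URL yields a proper host and port.
print(good.hostname, good.port)
# The bad URL has no netloc at all: '://127.0.0.1:8080' is treated as a path.
print(bad.hostname, bad.port)
```

With no parseable host, downstream URL-building can end up with a hostname like 'http', producing exactly the NameResolutionError seen in the Gradio dump.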