oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Failed to load embedding model: all-mpnet-base-v2 While Running Textgen in Colab Notebook #2435

Closed. Curiosity007 closed this issue 1 year ago.

Curiosity007 commented 1 year ago

Describe the bug

I used this command in my ipynb instead of the old CUDA setup:

!git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa

Then I started the server with the following command:

!python server.py --extensions openai --model guanaco-7B-GPTQ --model_type LLaMa --api --public-api --share --wbits 4 --groupsize 128

I am getting the error below:

WARNING:The gradio "share link" feature uses a proprietary executable to create a reverse tunnel. Use it with care.
2023-05-30 11:21:05.243240: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
INFO:Loading guanaco-7B-GPTQ...
INFO:Found the following quantized model: models/guanaco-7B-GPTQ/Guanaco-7B-GPTQ-4bit-128g.no-act-order.safetensors
INFO:Loaded the model in 14.96 seconds.

INFO:Loading the extension "openai"...

Failed to load embedding model: all-mpnet-base-v2

Is there an existing issue for this?

Reproduction

Run Colab.

Use this notebook: Colab

The OpenAI extension is not working as intended.

Screenshot

No response

Logs

WARNING:The gradio "share link" feature uses a proprietary executable to create a reverse tunnel. Use it with care.
2023-05-30 11:21:05.243240: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
INFO:Loading guanaco-7B-GPTQ...
INFO:Found the following quantized model: models/guanaco-7B-GPTQ/Guanaco-7B-GPTQ-4bit-128g.no-act-order.safetensors
INFO:Loaded the model in 14.96 seconds.

INFO:Loading the extension "openai"...

Failed to load embedding model: all-mpnet-base-v2

System Info

Google Colab notebook with a T4 GPU
Curiosity007 commented 1 year ago

I believe I know what went wrong. I installed the sentence-transformers package, and that error is now resolved. But I still cannot hit the OpenAI API.
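For reference, the fix described above as a notebook cell (assuming pip is available in the Colab environment, which it is by default):

```shell
# Install the package the openai extension needs to load its
# embedding model (all-mpnet-base-v2 is a sentence-transformers model).
pip install sentence-transformers
```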

Edit: I have now added this command, taken from https://github.com/oobabooga/text-generation-webui/issues/1524:

!pip install git+https://github.com/mnt4/flask-cloudflared

This is the error stack trace now:

WARNING:The gradio "share link" feature uses a proprietary executable to create a reverse tunnel. Use it with care.
2023-05-30 12:20:26.338655: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
INFO:Loading guanaco-7B-GPTQ...
INFO:Found the following quantized model: models/guanaco-7B-GPTQ/Guanaco-7B-GPTQ-4bit-128g.no-act-order.safetensors
INFO:Loaded the model in 18.88 seconds.

INFO:Loading the extension "openai"...
Running on local URL:  http://127.0.0.1:7860/

Loaded embedding model: all-mpnet-base-v2, max sequence length: 384
Running on public URL: https://6d95be3cd607be0555.gradio.live/

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
Starting OpenAI compatible api at
OPENAI_API_BASE=https://relative-flex-nose-useful.trycloudflare.com/v1
127.0.0.1 - - [30/May/2023 12:20:55] code 404, message Not Found
127.0.0.1 - - [30/May/2023 12:20:55] "GET /metrics HTTP/1.1" 404 -
Starting streaming server at public url wss://current-supported-lonely-walt.trycloudflare.com/api/v1/stream
127.0.0.1 - - [30/May/2023 12:20:58] code 404, message Not Found
127.0.0.1 - - [30/May/2023 12:20:58] "GET /metrics HTTP/1.1" 404 -
127.0.0.1 - - [30/May/2023 12:21:01] code 404, message Not Found
127.0.0.1 - - [30/May/2023 12:21:01] "GET /metrics HTTP/1.1" 404 -
127.0.0.1 - - [30/May/2023 12:21:04] code 404, message Not Found
127.0.0.1 - - [30/May/2023 12:21:04] "GET /metrics HTTP/1.1" 404 -
127.0.0.1 - - [30/May/2023 12:21:07] code 404, message Not Found
127.0.0.1 - - [30/May/2023 12:21:07] "GET /metrics HTTP/1.1" 404 -
127.0.0.1 - - [30/May/2023 12:21:10] code 404, message Not Found
127.0.0.1 - - [30/May/2023 12:21:10] "GET /metrics HTTP/1.1" 404 -
127.0.0.1 - - [30/May/2023 12:21:13] code 404, message Not Found
127.0.0.1 - - [30/May/2023 12:21:13] "GET /metrics HTTP/1.1" 404 -
127.0.0.1 - - [30/May/2023 12:21:16] code 404, message Not Found
127.0.0.1 - - [30/May/2023 12:21:16] "GET /metrics HTTP/1.1" 404 -
127.0.0.1 - - [30/May/2023 12:21:19] code 404, message Not Found
127.0.0.1 - - [30/May/2023 12:21:19] "GET /metrics HTTP/1.1" 404 -
Closing server running on port: 7860
INFO:Loading the extension "openai"...
Running on local URL:  http://127.0.0.1:7860/
Running on public URL: https://0f095eaedadd0d8f1e.gradio.live/

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
127.0.0.1 - - [30/May/2023 12:21:55] code 404, message Not Found
127.0.0.1 - - [30/May/2023 12:21:55] "GET /v1 HTTP/1.1" 404 -
127.0.0.1 - - [30/May/2023 12:22:07] "POST /v1/completions HTTP/1.1" 200 -
----------------------------------------
Exception occurred during processing of request from ('127.0.0.1', 60196)
Traceback (most recent call last):
  File "/usr/lib/python3.10/socketserver.py", line 683, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.10/socketserver.py", line 747, in __init__
    self.handle()
  File "/usr/lib/python3.10/http/server.py", line 433, in handle
    self.handle_one_request()
  File "/usr/lib/python3.10/http/server.py", line 421, in handle_one_request
    method()
  File "/content/text-generation-webui/extensions/openai/script.py", line 404, in do_POST
    for a in generator:
  File "/content/text-generation-webui/modules/text_generation.py", line 24, in generate_reply
    for result in _generate_reply(*args, **kwargs):
  File "/content/text-generation-webui/modules/text_generation.py", line 191, in _generate_reply
    for reply in generate_func(question, original_question, seed, state, eos_token, stopping_strings, is_chat=is_chat):
  File "/content/text-generation-webui/modules/text_generation.py", line 198, in generate_reply_HF
    generate_params[k] = state[k]
KeyError: 'tfs'
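To illustrate what this KeyError means: the webui copies sampler settings from the request state into its generation parameters, and the lookup fails when a key is absent. The sketch below is hypothetical (not the actual webui code); `build_generate_params`, the key list, and the `1.0` default are assumptions for illustration only.

```python
# Hypothetical sketch of the failing pattern: copying sampler settings
# from a request state dict into generation parameters.
def build_generate_params(state, keys):
    generate_params = {}
    for k in keys:
        generate_params[k] = state[k]  # raises KeyError if k is absent
    return generate_params

# The openai extension's request state lacked the newly required
# 'tfs' (tail-free sampling) key, so this lookup failed:
state = {"temperature": 0, "top_p": 1}
try:
    build_generate_params(state, ["temperature", "top_p", "tfs"])
except KeyError as e:
    print("missing sampler key:", e)

# A defensive workaround is to supply a neutral default first
# (tfs=1.0 is assumed here to mean "tail-free sampling disabled"):
state.setdefault("tfs", 1.0)
params = build_generate_params(state, ["temperature", "top_p", "tfs"])
```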

I am using the code below to hit the API:

import os
import openai

OPENAI_API_KEY = "dummy"
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
OPENAI_API_BASE = 'https://relative-flex-nose-useful.trycloudflare.com/v1'
os.environ['OPENAI_API_BASE'] = OPENAI_API_BASE

openai.api_key = os.getenv("OPENAI_API_KEY")
openai.api_base = os.getenv("OPENAI_API_BASE")
response = openai.Completion.create(
  model="dummy",
  prompt="I am a highly intelligent question answering bot. If you ask me a question that is rooted in truth, I will give you the answer. If you ask me a question that is nonsense, trickery, or has no clear answer, I will respond with \"Unknown\".\n\nQ: What is human life expectancy in the United States?\nA: Human life expectancy in the United States is 78 years.\n\nQ: Who was president of the United States in 1955?\nA: Dwight D. Eisenhower was president of the United States in 1955.\n\nQ: Which party did he belong to?\nA: He belonged to the Republican Party.\n\nQ: What is the square root of banana?\nA: Unknown\n\nQ: How does a telescope work?\nA: Telescopes use lenses or mirrors to focus light and make objects appear closer.\n\nQ: Where were the 1992 Olympics held?\nA: The 1992 Olympics were held in Barcelona, Spain.\n\nQ: How many squigs are in a bonk?\nA: Unknown\n\nQ: Tell me something about vcovid\nA:",
  temperature=0,
  max_tokens=100,
  top_p=1,
  frequency_penalty=0.0,
  presence_penalty=0.0,
  stop=["\n"]
)
print(response)
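When the request succeeds, the generated text sits under `choices[0].text` in the OpenAI Completion response schema. A minimal sketch using a mocked response dict (the real `openai.Completion.create` return value supports the same dict-style access):

```python
# Mocked OpenAI-style Completion response, for illustration only;
# the field layout matches the legacy /v1/completions schema.
mock_response = {
    "model": "dummy",
    "choices": [{"index": 0, "text": " Unknown", "finish_reason": "stop"}],
}

# Pull out and clean the completion text.
answer = mock_response["choices"][0]["text"].strip()
print(answer)  # → Unknown
```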
matatonic commented 1 year ago

I'm fixing the tfs error now; it's a new required parameter.