no response running python -m sglang.launch_server --model-path NousResearch/Llama-2-7b-chat-hf --port 30000

JimyMa commented 9 months ago

When I try to run sglang locally, following README.md:

python -m sglang.launch_server --model-path NousResearch/Llama-2-7b-chat-hf --port 30000

(I use NousResearch/Llama-2-7b-chat-hf because my access to meta-llama is still pending.) However, the server gives no response and prints no logs. When I then run this Python script:

from sglang import function, system, user, assistant, gen, set_default_backend, RuntimeEndpoint

@function
def multi_turn_question(s, question_1, question_2):
    s += system("You are a helpful assistant.")
    s += user(question_1)
    s += assistant(gen("answer_1", max_tokens=256))
    s += user(question_2)
    s += assistant(gen("answer_2", max_tokens=256))

set_default_backend(RuntimeEndpoint("http://localhost:30000"))

state = multi_turn_question.run(
    question_1="What is the capital of the United States?",
    question_2="List two local attractions.",
)

for m in state.messages():
    print(m["role"], ":", m["content"])

print(state["answer_1"])

I get the following error:

Traceback (most recent call last):
  File "/root/miniconda3/envs/sglang/lib/python3.10/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/root/miniconda3/envs/sglang/lib/python3.10/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/root/miniconda3/envs/sglang/lib/python3.10/http/client.py", line 1329, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/root/miniconda3/envs/sglang/lib/python3.10/http/client.py", line 1278, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/root/miniconda3/envs/sglang/lib/python3.10/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/root/miniconda3/envs/sglang/lib/python3.10/http/client.py", line 976, in send
    self.connect()
  File "/root/miniconda3/envs/sglang/lib/python3.10/http/client.py", line 942, in connect
    self.sock = self._create_connection(
  File "/root/miniconda3/envs/sglang/lib/python3.10/socket.py", line 845, in create_connection
    raise err
  File "/root/miniconda3/envs/sglang/lib/python3.10/socket.py", line 833, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/workspace/LongTail/src/sgl-project/sglang/try_sgl.py", line 11, in <module>
    set_default_backend(RuntimeEndpoint("http://localhost:30000"))
  File "/mnt/workspace/LongTail/src/sgl-project/sglang/python/sglang/backend/runtime_endpoint.py", line 21, in __init__
    res = http_request(self.base_url + "/get_model_info")
  File "/mnt/workspace/LongTail/src/sgl-project/sglang/python/sglang/utils.py", line 102, in http_request
    resp = urllib.request.urlopen(req, data=data)
  File "/root/miniconda3/envs/sglang/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/root/miniconda3/envs/sglang/lib/python3.10/urllib/request.py", line 519, in open
    response = self._open(req, data)
  File "/root/miniconda3/envs/sglang/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/root/miniconda3/envs/sglang/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/root/miniconda3/envs/sglang/lib/python3.10/urllib/request.py", line 1377, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/root/miniconda3/envs/sglang/lib/python3.10/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 111] Connection refused>

>_< I would really appreciate it if someone could help me solve this!

Rezonansce commented 9 months ago

@JimyMa Hi, could you share the server logs? They should look something like this (ignore AWQ):

One issue could be that you didn't give the server enough time to start up. You will not get any reply until the uvicorn server is actually running.

Another potential solution is to specify the host with the --host flag, e.g. --port 30000 --host 0.0.0.0 (0.0.0.0 listens on every available IP address, whether that is localhost or an external address). That is how I run sglang on Google Cloud Compute (an instance with an L4 serving an unquantized 7B model).

Please confirm whether you waited long enough for the server to start, and please share your server logs if neither of my guesses turns out to be right.
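
One way to rule out the startup-time explanation is to wait until the port actually accepts TCP connections before creating the RuntimeEndpoint. The following is only a minimal sketch using the Python standard library; wait_for_port is a hypothetical helper, not part of sglang, and the timeout values are arbitrary assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to (host, port) succeeds, False on timeout.

    Hypothetical helper for illustration; it only checks that something is
    listening, not that the model has finished loading.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Calling wait_for_port("localhost", 30000) before set_default_backend(RuntimeEndpoint("http://localhost:30000")) would turn the "Connection refused" traceback into a plain True/False answer. If it keeps returning False long after launch, the server process itself is probably stuck or has crashed, and its logs are the next thing to check.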

JimyMa commented 9 months ago

@Rezonansce Thank you so much. Something was wrong with my transformers library, which made proc_router crash and left the program stuck when executing

# Wait for the model to finish loading
router_init_state = pipe_router_reader.recv()
detoken_init_state = pipe_detoken_reader.recv()

After I fixed my environment, I was able to run the examples successfully.
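
The silent hang described here is what a blocking recv() on a multiprocessing pipe does when the process at the other end dies without sending anything and the parent still holds a copy of the write end. As a rough sketch of how such a wait can be made to fail fast (an illustration only, not sglang's actual code; recv_init_state and its parameters are hypothetical), one can poll with a timeout and check whether the worker is still alive:

```python
import time
from multiprocessing.connection import Connection


def recv_init_state(reader: Connection, is_alive, timeout: float = 60.0,
                    poll_interval: float = 0.5):
    """Receive one init message from `reader`, failing fast if the worker is gone.

    reader:   receiving end of a multiprocessing Pipe
    is_alive: zero-argument callable, e.g. the worker's `proc.is_alive`
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # poll() waits at most poll_interval seconds instead of blocking forever.
        if reader.poll(poll_interval):
            return reader.recv()
        if not is_alive():
            # The worker may have sent its message just before exiting.
            if reader.poll(0):
                return reader.recv()
            raise RuntimeError("worker exited before sending its init state")
    raise TimeoutError(f"no init state received within {timeout} seconds")
```

With a worker started as proc = multiprocessing.Process(...), a call such as recv_init_state(pipe_router_reader, proc.is_alive) would raise an error shortly after the router process crashes, instead of leaving the launcher waiting with no output.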

sgl-project/sglang issue #100