JimyMa closed this issue 9 months ago.
@JimyMa Hi, could you share the server logs? They should look something like this (ignore the AWQ part):
One possible cause is that you didn't give the server enough time to start up: you will not get any reply until the uvicorn server is actually running.
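A client can avoid racing the startup by polling the port before sending its first request. The helper below is a minimal standard-library sketch that assumes the server listens on port 30000. `wait_for_port` is a hypothetical helper, not part of sglang. It also treats "the port accepts TCP connections" as a proxy for "the uvicorn server is running".

```python
import socket
import time


def wait_for_port(host, port, timeout_s=300.0, interval_s=1.0):
    """Hypothetical helper (not sglang API): return True once a TCP
    connection to (host, port) succeeds, or False if timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # Succeeds only once something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet.
            time.sleep(interval_s)
    return False
```

For example, call `wait_for_port("localhost", 30000)` before running the client script. The generous default timeout is deliberate, because loading a 7B model can take minutes.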
Another possible fix is to specify the host with the --host flag, e.g. --port 30000 --host 0.0.0.0. With 0.0.0.0 the server listens on every available IP address, whether that is localhost or the machine's network address. That's how I run sglang on Google Cloud Compute (an instance with an L4 GPU serving an unquantized 7B model).
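The sketch below illustrates what the wildcard address means using plain sockets. It is not sglang code, and `accepts_loopback` is a hypothetical helper. A listener bound to 0.0.0.0 accepts connections that arrive on any local IPv4 interface. A listener bound to 127.0.0.1 accepts only loopback traffic. The check below can only exercise loopback locally. Reaching the server from another machine also depends on firewall rules.

```python
import socket


def accepts_loopback(bind_host: str) -> bool:
    """Hypothetical helper: bind a listener on bind_host, then try to
    reach it through the loopback address 127.0.0.1."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((bind_host, 0))  # port 0: let the OS choose a free port
        srv.listen()
        port = srv.getsockname()[1]
        try:
            with socket.create_connection(("127.0.0.1", port), timeout=2):
                return True
        except OSError:
            return False


# 0.0.0.0 is the IPv4 wildcard: loopback and the machine's network
# address both reach the listener. 127.0.0.1 is reachable only locally.
```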
Please confirm whether you waited long enough for the server to start, and share your server logs if neither of these guesses applies.
@Rezonansce
Thank you so much. Something was wrong with my transformers library, which made proc_router crash and left the program stuck while executing:
# Wait for the model to finish loading
router_init_state = pipe_router_reader.recv()
detoken_init_state = pipe_detoken_reader.recv()
After I fixed my environment, the examples ran successfully.
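The hang happens because `Connection.recv()` blocks until data arrives. Suppose the child process crashes before sending its init state, and the parent still holds its own copy of the pipe's writing end, which is common when the parent creates the pipe. In that case `recv()` never returns and never raises. The sketch below is hypothetical and not sglang's code. It shows one way a parent could wait on such a pipe without blocking forever: poll with a timeout and check whether the child is still alive. `recv_init_state` is an illustrative name, and `proc` is assumed to behave like a `multiprocessing.Process`.

```python
import time
from multiprocessing import Pipe


def recv_init_state(reader, proc=None, timeout_s=60.0, interval_s=0.5):
    """Hypothetical helper: receive one message from `reader`, failing
    fast if `proc` dies and giving up after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"no init state within {timeout_s} s")
        # poll() waits up to the given time for data to become readable.
        if reader.poll(min(interval_s, remaining)):
            return reader.recv()
        # A dead child will never send; re-check the pipe once to avoid
        # discarding a message sent just before it exited.
        if proc is not None and not proc.is_alive() and not reader.poll(0):
            raise RuntimeError(
                f"child exited with code {proc.exitcode} before reporting its state"
            )


# Pipe(duplex=False) returns (receive-only end, send-only end).
reader, writer = Pipe(duplex=False)
writer.send("init ok")
state = recv_init_state(reader, timeout_s=1.0)
```

With this pattern, a crash like the transformers problem above surfaces as an error instead of a silent hang.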
When I try to use sglang locally according to README.md (I use NousResearch/Llama-2-7b-chat-hf because my access request for meta-llama is still pending), I receive no response and nothing is printed to the log. When I run the Python script:
I encounter the following error:
>_< I would really appreciate it if someone could help me solve this!!