haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
20.19k stars 2.23k forks

Error when launching SGLang worker with llava-v1.6-34b #1289

Open zhaohm14 opened 8 months ago

zhaohm14 commented 8 months ago

Thank you for your wonderful work! I have been following the demo instructions and successfully launched the controller and the Gradio web server. However, I encountered an issue when trying to launch an SGLang worker with the local llava-v1.6-34b model.

Here's the command I used:

$ CUDA_VISIBLE_DEVICES=1,2,3,4 python -m sglang.launch_server --model-path models/llava-v1.6-34b --tokenizer-path models/llava-v1.6-34b-tokenizer --port 30000 --tp 4

And here's the terminal output:

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
server started on [0.0.0.0]:10011
server started on [0.0.0.0]:10010
server started on [0.0.0.0]:10012
server started on [0.0.0.0]:10013
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
accepted ('127.0.0.1', 54758) with fd 41
welcome ('127.0.0.1', 54758)
accepted ('127.0.0.1', 50916) with fd 32
welcome ('127.0.0.1', 50916)
accepted ('127.0.0.1', 34778) with fd 35
welcome ('127.0.0.1', 34778)
accepted ('127.0.0.1', 46910) with fd 33
welcome ('127.0.0.1', 46910)
Rank 0: load weight begin.
Rank 1: load weight begin.
Rank 2: load weight begin.
Rank 3: load weight begin.
/home/zhaohm14/anaconda3/envs/llava/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/zhaohm14/anaconda3/envs/llava/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/zhaohm14/anaconda3/envs/llava/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/zhaohm14/anaconda3/envs/llava/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Rank 0: load weight end.
Rank 1: load weight end.
Rank 3: load weight end.
Rank 2: load weight end.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Rank 0: max_total_num_token=12849, max_prefill_num_token=4096, context_len=4096, 
disable_radix_cache=False, enable_flashinfer=False, disable_regex_jump_forward=False, disable_disk_cache=False, attention_reduce_in_fp32=False
Rank 3: max_total_num_token=12849, max_prefill_num_token=4096, context_len=4096, 
disable_radix_cache=False, enable_flashinfer=False, disable_regex_jump_forward=False, disable_disk_cache=False, attention_reduce_in_fp32=False
Rank 1: max_total_num_token=12849, max_prefill_num_token=4096, context_len=4096, 
disable_radix_cache=False, enable_flashinfer=False, disable_regex_jump_forward=False, disable_disk_cache=False, attention_reduce_in_fp32=False
Rank 2: max_total_num_token=12849, max_prefill_num_token=4096, context_len=4096, 
disable_radix_cache=False, enable_flashinfer=False, disable_regex_jump_forward=False, disable_disk_cache=False, attention_reduce_in_fp32=False
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO:     Started server process [83740]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:30000 (Press CTRL+C to quit)
INFO:     127.0.0.1:41876 - "GET /get_model_info HTTP/1.1" 200 OK
new fill batch. #seq: 1. #cached_token: 0. #new_token: 8. #remaining_req: 0. #running_req: 0. tree_cache_hit_rate: 0.00%.
python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector<unsigned int>, llvm::SmallVector<unsigned int> > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
Process Process-1:
Traceback (most recent call last):
  File "/home/zhaohm14/anaconda3/envs/llava/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/zhaohm14/anaconda3/envs/llava/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/zhaohm14/anaconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/manager.py", line 79, in start_router_process
    loop.run_until_complete(router.loop_for_forward())
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/home/zhaohm14/anaconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/manager.py", line 38, in loop_for_forward
    out_pyobjs = await self.model_client.step(next_step_input)
  File "/home/zhaohm14/anaconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 648, in _func
    await asyncio.gather(*[asyncio.to_thread(t.wait) for t in tasks])
  File "/home/zhaohm14/anaconda3/envs/llava/lib/python3.10/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
  File "/home/zhaohm14/anaconda3/envs/llava/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/zhaohm14/anaconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/async_.py", line 51, in wait
    self._conn.serve(self._ttl, waiting=self._waiting)
  File "/home/zhaohm14/anaconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/protocol.py", line 464, in serve
    data = self._channel.poll(timeout) and self._channel.recv()
  File "/home/zhaohm14/anaconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/channel.py", line 55, in recv
    header = self.stream.read(self.FRAME_HEADER.size)
  File "/home/zhaohm14/anaconda3/envs/llava/lib/python3.10/site-packages/rpyc/core/stream.py", line 280, in read
    raise EOFError("connection closed by peer")
EOFError: connection closed by peer
HTTPConnectionPool(host='127.0.0.1', port=30000): Read timed out. (read timeout=60)

Could you please help me understand what might be causing this issue? I am eager to get the worker up and running and would greatly appreciate any assistance you can provide.
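For anyone debugging a similar crash: a minimal health probe (a sketch only, using the `http://127.0.0.1:30000` address and `/get_model_info` path that appear in the access log above; adjust for your setup) can tell you whether the worker process is still responding after the Triton assertion fires, or whether it has died and the web server is timing out against a dead backend:

```python
# Minimal health probe for the SGLang worker. The URL and the /get_model_info
# path are taken from the access log above; adjust host/port for your setup.
import json
import urllib.request


def probe(url="http://127.0.0.1:30000/get_model_info", timeout=5):
    """Return the model info as a dict, or {'error': ...} if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read())
    except OSError as exc:  # urllib.error.URLError subclasses OSError
        return {"error": str(exc)}


if __name__ == "__main__":
    print(probe())
```

If the probe returns an error dict right after the assertion messages, the worker is gone and needs to be relaunched rather than waited on.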

jiezhangGt commented 7 months ago

Hello, I also hit an error at this step, though mine occurs earlier than yours. My error is:

[2024-04-11 10:31:54,306] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-11 10:32:22,116] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-11 10:32:22,117] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-11 10:32:42,473] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-11 10:32:42,473] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
router init state: Traceback (most recent call last):
  File "/ssd11/exec/zhangjie07/HOME/miniconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/manager.py", line 68, in start_router_process
    model_client = ModelRpcClient(server_args, port_args)
  File "/ssd11/exec/zhangjie07/HOME/miniconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 633, in __init__
    self.model_servers = [x[0] for x in rets]
  File "/ssd11/exec/zhangjie07/HOME/miniconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 633, in <listcomp>
    self.model_servers = [x[0] for x in rets]
  File "/ssd11/exec/zhangjie07/HOME/miniconda3/envs/llava/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/ssd11/exec/zhangjie07/HOME/miniconda3/envs/llava/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/ssd11/exec/zhangjie07/HOME/miniconda3/envs/llava/lib/python3.10/concurrent/futures/_base.py", line 458, in result
    return self.__get_result()
  File "/ssd11/exec/zhangjie07/HOME/miniconda3/envs/llava/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/ssd11/exec/zhangjie07/HOME/miniconda3/envs/llava/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/ssd11/exec/zhangjie07/HOME/miniconda3/envs/llava/lib/python3.10/site-packages/sglang/srt/managers/router/model_rpc.py", line 683, in start_model_process
    raise RuntimeError("init rpc env error!")
RuntimeError: init rpc env error!

detoken init state: init ok

Have you ever encountered this error?
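Not an answer from the maintainers, just a sketch: the traceback above shows `RuntimeError("init rpc env error!")` raised while starting the per-rank model RPC servers, and one plausible local cause (an assumption, not confirmed by the report) is that a needed TCP port is already held by a stale process. A quick standard-library check, with `port_is_free` being a hypothetical helper and not part of sglang:

```python
# Hedged sketch: check whether a TCP port is actually free before launching.
# port_is_free is a hypothetical helper for local debugging, not sglang API.
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        # connect_ex returns 0 on success (port busy), an errno otherwise.
        return s.connect_ex((host, port)) != 0
```

If a port the worker needs reports busy, killing leftover worker processes from a previous failed launch before retrying may help.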