(openchat) user@vsrv-chatgpt:~$ python -m ochat.serving.openai_api_server --model openchat/openchat_3.5 --engine-use-ray --worker-use-ray --tensor-parallel-size 2
FlashAttention not found. Install it if you need to train models.
FlashAttention not found. Install it if you need to train models.
2023-11-13 22:40:36,947 INFO worker.py:1673 -- Started a local Ray instance.
(pid=1681) FlashAttention not found. Install it if you need to train models.
(pid=1681) FlashAttention not found. Install it if you need to train models.
(AsyncTokenizer pid=1681) Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
2023-11-13 22:40:39,875 INFO worker.py:1507 -- Calling ray.init() again after it has already been called.
(_AsyncLLMEngine pid=1710) INFO 11-13 22:40:42 llm_engine.py:72] Initializing an LLM engine with config: model='openchat/openchat_3.5', tokenizer='openchat/openchat_3.5', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=2, quantization=None, seed=0)
(_AsyncLLMEngine pid=1710) WARNING 11-13 22:40:42 config.py:226] Possibly too large swap space. 8.00 GiB out of the 11.68 GiB total CPU memory is allocated for the swap space.
(_AsyncLLMEngine pid=1710) Using blocking ray.get inside async actor. This blocks the event loop. Please use await on object ref with asyncio.gather if you want to yield execution to the event loop instead.
It freezes after startup. I'm using 2 RTX 3070 GPUs passed through via ESXi; OS: Ubuntu 22.04 Server.
This seems to be a vLLM warning, not an error message. Since model loading typically takes several minutes (depending on disk read speed), could you wait a bit longer?
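One way to tell a slow load apart from a real hang is to watch GPU memory grow in `nvidia-smi` while polling the API port until it accepts connections. A minimal sketch (the host, port, and timeout below are placeholders, not values taken from the logs above — substitute whatever your server actually listens on):

```python
import socket
import time

def wait_for_port(host: str, port: int,
                  timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Poll until a TCP port accepts connections, i.e. the server
    has finished loading the model and started listening.
    Returns False if the deadline passes first."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return True
        except OSError:
            time.sleep(interval_s)
    return False

# Example: give the server up to 10 minutes to come up on a placeholder port.
# if wait_for_port("127.0.0.1", 18888):
#     print("server is up")
```

If the port never opens and GPU memory usage in `nvidia-smi` stays flat for many minutes, it is likely a genuine hang rather than slow loading.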