Open lambda7xx opened 1 month ago
If I use vllm==0.4.3, the error is:
/home/llll/anaconda3/envs/llama_index/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
INFO 06-04 03:11:04 config.py:1130] Casting torch.float32 to torch.float16.
INFO 06-04 03:11:04 config.py:1151] Downcasting torch.float32 to torch.float16.
INFO 06-04 03:11:04 llm_engine.py:161] Initializing an LLM engine (v0.4.3) with config: model='microsoft/Orca-2-7b', speculative_config=None, tokenizer='microsoft/Orca-2-7b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=microsoft/Orca-2-7b)
INFO 06-04 03:11:09 weight_utils.py:207] Using model weights format ['*.bin']
INFO 06-04 03:11:20 model_runner.py:146] Loading model weights took 12.5532 GB
INFO 06-04 03:11:20 gpu_executor.py:83] # GPU blocks: 3355, # CPU blocks: 128
INFO 06-04 03:11:21 model_runner.py:854] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 06-04 03:11:21 model_runner.py:858] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 06-04 03:11:27 model_runner.py:924] Graph capturing finished in 6 secs.
type(inpits):<class 'dict'>
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, Generation Speed: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 5.95it/s, Generation Speed: 136.91 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 5.95it/s, Generation Speed: 136.91 toks/s]
Exception ignored in: <function Vllm.__del__ at 0x7f05d599f9a0>
Traceback (most recent call last):
File "/home/llll/anaconda3/envs/llama_index/lib/python3.10/site-packages/llama_index/llms/vllm/base.py", line 217, in __del__
ImportError: sys.meta_path is None, Python is likely shutting down
@lambda7xx that second log with 0.4.3 is not an error, it ran correctly, vllm just does some wacky stuff when shutting down the process. If you actually printed the output of the llm.complete call, it would print just fine.
Seems like in newer versions, they added some different expected input type. The integration would need to be updated to handle the newer version.
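For reference, a minimal check would look roughly like this. This is only a sketch, not the exact script from the issue; the model name and trust_remote_code are taken from the engine config in the logs above, and the prompt is just an illustration:

```python
from llama_index.llms.vllm import Vllm

# Build the LLM roughly the same way as in the failing script (model name
# taken from the engine config printed in the logs above).
llm = Vllm(model="microsoft/Orca-2-7b", trust_remote_code=True)

# Print the completion explicitly instead of relying on the log output.
# With vllm==0.4.3 this should show the generated text, even though the
# "Exception ignored in Vllm.__del__" message may still appear at shutdown.
response = llm.complete("Explain the concept of a black hole.")
print(response.text)
```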
Thanks. I added a print, but it still errors and doesn't print. @logan-markewich
Business Insider's official newsletter gives you a detailed explanation of the concept of a black hole.
Exception ignored in: <function Vllm.__del__ at 0x7f27334739a0>
Traceback (most recent call last):
File "/home/lllll/anaconda3/envs/llama_index/lib/python3.10/site-packages/llama_index/llms/vllm/base.py", line 217, in __del__
ImportError: sys.meta_path is None, Python is likely shutting down
@lambda7xx it's not a real error though -- you can see the response got printed fine, and the execution of your script was not interrupted.
This is an error raised during shutdown, and it is benign.
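If the shutdown message is bothersome, one optional mitigation is to drop the Vllm reference before the interpreter starts tearing itself down, so __del__ runs while imports still work. A sketch, under the same assumptions as the snippet above; the message is harmless either way:

```python
import gc

from llama_index.llms.vllm import Vllm

llm = Vllm(model="microsoft/Orca-2-7b", trust_remote_code=True)
print(llm.complete("Explain the concept of a black hole.").text)

# Release the engine while the interpreter is still fully initialized, so
# Vllm.__del__ does not run during shutdown (when sys.meta_path is already
# None). This only silences the cosmetic warning; it is not required.
del llm
gc.collect()
```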
I added the print; it prints nothing.
INFO 06-04 17:06:04 llm_engine.py:161] Initializing an LLM engine (v0.4.3) with config: model='microsoft/Orca-2-7b', speculative_config=None, tokenizer='microsoft/Orca-2-7b', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=0, served_model_name=microsoft/Orca-2-7b)
INFO 06-04 17:06:09 weight_utils.py:207] Using model weights format ['*.bin']
INFO 06-04 17:06:21 model_runner.py:146] Loading model weights took 12.5532 GB
INFO 06-04 17:06:21 gpu_executor.py:83] # GPU blocks: 3355, # CPU blocks: 128
INFO 06-04 17:06:22 model_runner.py:854] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 06-04 17:06:22 model_runner.py:858] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 06-04 17:06:29 model_runner.py:924] Graph capturing finished in 6 secs.
type(inpits):<class 'dict'>
Processed prompts: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5.91it/s, Generation Speed: 136.11 toks/s]
Business Insider's official newsletter gives you a detailed explanation of the concept of a black hole.
Exception ignored in: <function Vllm.__del__ at 0x7efd6d0739a0>
Traceback (most recent call last):
@logan-markewich Sorry to bother you. Could you help me resolve this? Thanks
@lambda7xx but it did print
Business Insider's official newsletter gives you a detailed explanation of the concept of a black hole.
Then, that's the end of the script. It runs fine 😅
Oh my god, sorry, I missed it.
Lol no worries, lots of text there
Question Validation
Question
I installed llama-index with the command pip install llama-index and installed vllm with pip install vllm. The version of vllm is 0.4.2, the version of transformers is 4.40.0, and the llama-index version is 0.10.43. I ran the following code from the documentation.
The error log is:
It seems that llama-index's use of vllm has some problem. Maybe I should install the correct version?