Model= /opt/Qwen1.5-14B-Chat RATE= 0.7 N= 16...
Namespace(backend='vllm', dataset=None, input_len=1024, output_len=512, model='/opt/Qwen1.5-14B-Chat', tokenizer='/opt/Qwen1.5-14B-Chat', quantization=None, tensor_parallel_size=2, n=1, use_beam_search=False, num_prompts=100, seed=0, hf_max_batch_size=None, trust_remote_code=True, max_model_len=2048, dtype='float16', gpu_memory_utilization=0.7, enforce_eager=True, kv_cache_dtype='auto', device='xpu', enable_prefix_caching=False, load_in_low_bit='sym_int4', max_num_batched_tokens=4096, max_num_seqs=16)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/root/miniforge3/envs/vtune-vllm/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
2024-07-31 21:20:24,618 - INFO - intel_extension_for_pytorch auto imported
WARNING 07-31 21:20:31 config.py:710] Casting torch.bfloat16 to torch.float16.
INFO 07-31 21:20:31 config.py:523] Custom all-reduce kernels are temporarily disabled due to stability issues. We will re-enable them once the issues are resolved.
2024-07-31 21:20:42,830 INFO worker.py:1788 -- Started a local Ray instance.
INFO 07-31 21:20:50 llm_engine.py:68] Initializing an LLM engine (v0.3.3) with config: model='/opt/Qwen1.5-14B-Chat', tokenizer='/opt/Qwen1.5-14B-Chat', tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=2, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=xpu, seed=0, max_num_batched_tokens=4096, max_num_seqs=16, max_model_len=2048)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
(RayWorkerVllm pid=80957) /root/miniforge3/envs/vtune-vllm/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
(RayWorkerVllm pid=80957) warn(
A770, Ubuntu system.
The output above is everything that gets printed; the process gets stuck at the last line.
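For reference, here is a minimal offline script that sets up the same engine configuration as the benchmark arguments printed above, in case it helps reproduce the hang outside the benchmark script. This is only a sketch: it assumes ipex-llm exposes its vLLM wrapper as `IPEXLLMClass` under `ipex_llm.vllm.xpu.engine`, and the exact import path and supported keyword arguments may differ between ipex-llm/vLLM versions.

```python
# Minimal offline reproduction sketch (assumption: ipex-llm exposes its vLLM
# wrapper as IPEXLLMClass under ipex_llm.vllm.xpu.engine; the import path and
# supported keywords may differ between versions).
from vllm import SamplingParams
from ipex_llm.vllm.xpu.engine import IPEXLLMClass as LLM

# Engine configuration mirroring the Namespace printed in the log above.
llm = LLM(
    model="/opt/Qwen1.5-14B-Chat",
    tokenizer="/opt/Qwen1.5-14B-Chat",
    device="xpu",
    dtype="float16",
    load_in_low_bit="sym_int4",
    tensor_parallel_size=2,
    max_model_len=2048,
    max_num_batched_tokens=4096,
    max_num_seqs=16,
    gpu_memory_utilization=0.7,
    enforce_eager=True,
    trust_remote_code=True,
)

# A single short prompt is enough to check whether engine initialization
# completes or hangs before generation even starts.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)
outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```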