Closed. 20184490 closed this issue 5 months ago.
Hi, it seems that you need to upgrade your CUDA to 12.x. Hope this helps!
Thanks! But my CUDA tops out at 11.8, so I may not be able to update to 12.x. Does vllm==0.4.0 work?
As I remember, the Qwen series uses the same QwenForCausalLM class, so vLLM 0.4.0 should work. No harm in trying!
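If you do try that, a minimal install sketch for a CUDA 11.8 environment could look like the following. The wheel filename pattern and the extra index URL follow vLLM's published CUDA 11.8 install instructions, but the exact filename for your Python version is an assumption; verify it on the v0.4.0 release page before installing.

```shell
# Sketch: install a CUDA 11.8 build of vLLM 0.4.0 in a CUDA 11.8 environment.
# The wheel name below is an assumption -- check the assets on
# https://github.com/vllm-project/vllm/releases/tag/v0.4.0 for your Python tag.
export VLLM_VERSION=0.4.0
export PYTHON_VERSION=310
pip install "https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux1_x86_64.whl" \
    --extra-index-url https://download.pytorch.org/whl/cu118
```

The `--extra-index-url` matters: without it, pip pulls the default cu12x build of PyTorch, which reproduces the driver-mismatch error below.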
```
(magpie) [hadoop-hmart-peisongpa@set-zw04-kubernetes-pc137 magpie-main]$ pip list
Package Version
absl-py 2.1.0 accelerate 0.31.0 aiohttp 3.9.5 aiosignal 1.3.1 annotated-types 0.7.0 anthropic 0.28.1 anyio 4.4.0 asttokens 2.4.1 async-timeout 4.0.3 attrs 23.2.0 autoawq 0.2.5 autoawq_kernels 0.0.6 bitsandbytes 0.42.0 boto3 1.34.129 botocore 1.34.129 cachetools 5.3.3 certifi 2024.6.2 charset-normalizer 3.3.2 click 8.1.7 cloudpickle 3.0.0 cmake 3.29.5.1 comm 0.2.2 contextlib2 21.6.0 contourpy 1.2.1 cycler 0.12.1 datasets 2.20.0 debugpy 1.8.1 decorator 5.1.1 dill 0.3.8 diskcache 5.6.3 distro 1.9.0 dnspython 2.6.1 docker-pycreds 0.4.0 docstring_parser 0.16 email_validator 2.1.2 exceptiongroup 1.2.1 executing 2.0.1 faiss-gpu 1.7.2 fastapi 0.111.0 fastapi-cli 0.0.4 fastchat 0.1.0 filelock 3.15.1 fonttools 4.53.0 frozenlist 1.4.1 fsspec 2024.5.0 gitdb 4.0.11 GitPython 3.1.43 google-ai-generativelanguage 0.6.5 google-api-core 2.19.0 google-api-python-client 2.133.0 google-auth 2.30.0 google-auth-httplib2 0.2.0 google-generativeai 0.7.0 googleapis-common-protos 1.63.1 grpcio 1.64.1 grpcio-status 1.62.2 h11 0.14.0 httpcore 1.0.5 httplib2 0.22.0 httptools 0.6.1 httpx 0.27.0 huggingface-hub 0.23.4 idna 3.7 interegular 0.3.3 ipykernel 6.29.4 ipython 8.25.0 ipywidgets 8.1.3 jedi 0.19.1 Jinja2 3.1.4 jiter 0.4.2 jmespath 1.0.1 joblib 1.4.2 jsonschema 4.22.0 jsonschema-specifications 2023.12.1 jupyter_client 8.6.2 jupyter_core 5.7.2 jupyterlab_widgets 3.0.11 kiwisolver 1.4.5 lark 1.1.9 lit 18.1.7 llvmlite 0.43.0 lm-format-enforcer 0.9.8 markdown-it-py 3.0.0 MarkupSafe 2.1.5 matplotlib 3.9.0 matplotlib-inline 0.1.7 mdurl 0.1.2 ml_collections 0.1.1 mpmath 1.3.0 msgpack 1.0.8 multidict 6.0.5 multiprocess 0.70.16 nest-asyncio 1.6.0 networkx 3.3 ninja 1.11.1.1 numba 0.60.0 numpy 1.26.4
nvidia-cublas-cu11 11.10.3.66 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu11 11.7.101 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu11 11.7.99 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu11 11.7.99 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu11 8.5.0.96 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu11 10.9.0.58 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu11 10.2.10.91 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu11 11.4.0.1 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu11 11.7.4.91 nvidia-cusparse-cu12 12.1.0.106 nvidia-ml-py 12.555.43 nvidia-nccl-cu11 2.14.3 nvidia-nccl-cu12 2.20.5 nvidia-nvjitlink-cu12 12.5.40 nvidia-nvtx-cu11 11.7.91 nvidia-nvtx-cu12 12.1.105
openai 1.34.0 orjson 3.10.5 outlines 0.0.34 packaging 24.1 pandas 2.2.2 parso 0.8.4 peft 0.11.1 pexpect 4.9.0 pillow 10.3.0 pip 24.0 platformdirs 4.2.2 prometheus_client 0.20.0 prometheus-fastapi-instrumentator 7.0.0 prompt_toolkit 3.0.47 proto-plus 1.23.0 protobuf 4.25.3 psutil 6.0.0 ptyprocess 0.7.0 pure-eval 0.2.2 py-cpuinfo 9.0.0 pyairports 2.1.1 pyarrow 16.1.0 pyarrow-hotfix 0.6 pyasn1 0.6.0 pyasn1_modules 0.4.0 pycountry 24.6.1 pydantic 2.7.4 pydantic_core 2.18.4 Pygments 2.18.0 pynvml 11.5.0 pyparsing 3.1.2 python-dateutil 2.9.0.post0 python-dotenv 1.0.1 python-multipart 0.0.9 pytz 2024.1 PyYAML 6.0.1 pyzmq 26.0.3 ray 2.9.0 referencing 0.35.1 regex 2024.5.15 requests 2.32.3 rich 13.7.1 rpds-py 0.18.1 rsa 4.9 s3transfer 0.10.1 safetensors 0.4.3 scikit-learn 1.5.0 scipy 1.13.1 sentence-transformers 3.0.1 sentencepiece 0.2.0 sentry-sdk 2.5.1 setproctitle 1.3.3 setuptools 70.0.0 shellingham 1.5.4 shtab 1.7.1 six 1.16.0 smmap 5.0.1 sniffio 1.3.1 stack-data 0.6.3 starlette 0.37.2 sympy 1.12.1 tenacity 8.4.1 threadpoolctl 3.5.0 tiktoken 0.6.0 tokenizers 0.19.1 torch 2.3.0 torchaudio 2.3.0 torchvision 0.18.0 tornado 6.4.1 tqdm 4.66.4 traitlets 5.14.3 transformers 4.41.2 triton 2.3.0 trl 0.9.4 typer 0.12.3 typing_extensions 4.12.2 tyro 0.8.4 tzdata 2024.1 ujson 5.10.0 uritemplate 4.1.1 urllib3 2.2.2 uvicorn 0.30.1 uvloop 0.19.0 vllm 0.4.2 vllm-flash-attn 2.5.9 vllm_nccl_cu12 2.18.1.0.4.0 wandb 0.17.2 watchfiles 0.22.0 wcwidth 0.2.13 websockets 12.0 wheel 0.43.0 widgetsnbextension 4.0.11 xformers 0.0.26.post1 xxhash 3.4.1 yarl 1.9.4 zstandard 0.22.0
```
```
(magpie) [hadoop-hmart-peisongpa@set-zw04-kubernetes-pc137 scripts]$ bash magpie-qwen2-7b.sh
[magpie.sh] Model Name: /mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/lijiguo/data/models/Qwen2-7B-Instruct
[magpie.sh] Pretty name: Qwen2-7B-Instruct_topp1_temp1_1718955059
[magpie.sh] Total Prompts: 1000
[magpie.sh] Instruction Generation Config: temp=1, top_p=1
[magpie.sh] Response Generation Config: temp=0, top_p=1, rep=1
[magpie.sh] System Config: device=0, n=200, batch_size=200, tensor_parallel=1
[magpie.sh] Timestamp: 1718955059
[magpie.sh] Job Name: Qwen2-7B-Instruct_topp1_temp1_1718955059
[magpie.sh] Start Generating Instructions...
Instruction Generation Manager. Arguments: Namespace(model_path='/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/lijiguo/data/models/Qwen2-7B-Instruct', temperature=1.0, top_p=1.0, n=200, repeat=None, total_prompts=1000, max_tokens=2048, max_model_len=4096, early_stopping=True, have_system_prompt=False, shuffle=True, skip_special_tokens=True, checkpoint_every=100, device='0', dtype='bfloat16', tensor_parallel_size=1, gpu_memory_utilization=0.95, swap_space=2.0, output_folder='../data', job_name='Qwen2-7B-Instruct_topp1_temp1_1718955059', timestamp=1718955059, verbose=False, seed=None)
INFO 06-21 15:31:05 llm_engine.py:100] Initializing an LLM engine (v0.4.2) with config: model='/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/lijiguo/data/models/Qwen2-7B-Instruct', speculative_config=None, tokenizer='/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/lijiguo/data/models/Qwen2-7B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), seed=1718955059, served_model_name=/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/lijiguo/data/models/Qwen2-7B-Instruct)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
INFO 06-21 15:31:06 utils.py:660] Found nccl from library /home/hadoop-hmart-peisongpa/.config/vllm/nccl/cu12/libnccl.so.2.18.1
Traceback (most recent call last):
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/hanxintong/banma_llm_base_model/storage/magpie-main/scripts/../exp/gen_ins.py", line 84, in <module>
    llm = LLM(model=args.model_path,
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/tanyunfei/conda/envs/magpie/lib/python3.10/site-packages/vllm/entrypoints/llm.py", line 123, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/tanyunfei/conda/envs/magpie/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 292, in from_engine_args
    engine = cls(
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/tanyunfei/conda/envs/magpie/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 160, in __init__
    self.model_executor = executor_class(
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/tanyunfei/conda/envs/magpie/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 41, in __init__
    self._init_executor()
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/tanyunfei/conda/envs/magpie/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 23, in _init_executor
    self._init_non_spec_worker()
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/tanyunfei/conda/envs/magpie/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 67, in _init_non_spec_worker
    self.driver_worker = self._create_worker()
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/tanyunfei/conda/envs/magpie/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 59, in _create_worker
    wrapper.init_worker(*self._get_worker_kwargs(local_rank, rank,
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/tanyunfei/conda/envs/magpie/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 131, in init_worker
    self.worker = worker_class(*args, **kwargs)
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/tanyunfei/conda/envs/magpie/lib/python3.10/site-packages/vllm/worker/worker.py", line 73, in __init__
    self.model_runner = ModelRunner(
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/tanyunfei/conda/envs/magpie/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 145, in __init__
    self.attn_backend = get_attn_backend(
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/tanyunfei/conda/envs/magpie/lib/python3.10/site-packages/vllm/attention/selector.py", line 25, in get_attn_backend
    backend = _which_attn_to_use(dtype)
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/tanyunfei/conda/envs/magpie/lib/python3.10/site-packages/vllm/attention/selector.py", line 67, in _which_attn_to_use
    if torch.cuda.get_device_capability()[0] < 8:
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/tanyunfei/conda/envs/magpie/lib/python3.10/site-packages/torch/cuda/__init__.py", line 430, in get_device_capability
    prop = get_device_properties(device)
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/tanyunfei/conda/envs/magpie/lib/python3.10/site-packages/torch/cuda/__init__.py", line 444, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/tanyunfei/conda/envs/magpie/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
    torch._C._cuda_init()
RuntimeError: The NVIDIA driver on your system is too old (found version 11080). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.
```
```
[magpie.sh] Finish Generating Instructions!
[magpie.sh] Start Generating Responses...
Response Generation Manager. Arguments: Namespace(model_path='/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/lijiguo/data/models/Qwen2-7B-Instruct', input_file='../data/Qwen2-7B-Instruct_topp1_temp1_1718955059/Magpie_Qwen2-7B-Instruct_1000_1718955059_ins.json', batch_size=200, checkpoint_every=20, api=False, api_url='https://api.together.xyz/v1/chat/completions', api_key=None, device='0', dtype='bfloat16', tensor_parallel_size=1, gpu_memory_utilization=0.95, max_tokens=4096, max_model_len=4096, temperature=0.0, top_p=1.0, repetition_penalty=1.0)
Traceback (most recent call last):
  File "/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/hanxintong/banma_llm_base_model/storage/magpie-main/scripts/../exp/gen_res.py", line 59, in <module>
    model_config = model_configs[args.model_path]
KeyError: '/mnt/dolphinfs/hdd_pool/docker/user/hadoop-hmart-peisongpa/lijiguo/data/models/Qwen2-7B-Instruct'
```
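This second failure is separate from the CUDA issue: gen_res.py looks the model up in a model_configs dict keyed by model name, and the local checkpoint path is not one of its keys. A hedged sketch of a fallback lookup (the `lookup_config` helper and the config contents are hypothetical illustrations, not Magpie's actual code):

```python
import os

# Hypothetical stand-in for the model_configs table in gen_res.py.
model_configs = {
    "Qwen2-7B-Instruct": {"stop_tokens": ["<|im_end|>"]},
}

def lookup_config(configs: dict, model_path: str) -> dict:
    """Try the path verbatim first, then fall back to its basename so a
    local checkpoint directory resolves like a hub-style model name."""
    if model_path in configs:
        return configs[model_path]
    key = os.path.basename(os.path.normpath(model_path))
    if key in configs:
        return configs[key]
    raise KeyError(f"No config for {model_path!r}; add an entry keyed {key!r}")

# A local path whose last component matches a known model name resolves:
cfg = lookup_config(model_configs, "/data/models/Qwen2-7B-Instruct")
print(cfg)  # {'stop_tokens': ['<|im_end|>']}
```

Alternatively, adding the full local path as a key to the script's model_configs dict avoids touching the lookup code at all.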