vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Bug]: Issue with using min_tokens from recent build #3760

Open Sriharsha-hatwar opened 5 months ago

Sriharsha-hatwar commented 5 months ago

Your current environment

My current environment setup is:

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.1.2
[pip3] triton==2.1.0
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] torch                     2.1.2                    pypi_0    pypi
[conda] triton                    2.1.0                    pypi_0    pypi
ROCM Version: Could not collect
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
GPU0    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      0,2     0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

I am trying to use vLLM for a summarization task, but I hit an error because min_tokens is not present in the SamplingParams API of my installed release. I therefore tried to install from source, following the installation-from-source instructions, and the build fails with the following:

Obtaining file:///work/pi_dhruveshpate_umass_edu/compressed-llm/experiments/shatwar/vllm
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... error
  error: subprocess-exited-with-error

  × Getting requirements to build editable did not run successfully.
  │ exit code: 1
  ╰─> [21 lines of output]
      /tmp/pip-build-env-9q0l22qz/overlay/lib/python3.9/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
        device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
      Traceback (most recent call last):
        File "/work/pi_dhruveshpate_umass_edu/compressed-llm/envs/vllm_cuda_12.2/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/work/pi_dhruveshpate_umass_edu/compressed-llm/envs/vllm_cuda_12.2/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/work/pi_dhruveshpate_umass_edu/compressed-llm/envs/vllm_cuda_12.2/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 132, in get_requires_for_build_editable
          return hook(config_settings)
        File "/tmp/pip-build-env-9q0l22qz/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 448, in get_requires_for_build_editable
          return self.get_requires_for_build_wheel(config_settings)
        File "/tmp/pip-build-env-9q0l22qz/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/tmp/pip-build-env-9q0l22qz/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-9q0l22qz/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 351, in <module>
        File "<string>", line 283, in get_vllm_version
        File "<string>", line 254, in get_nvcc_cuda_version
      TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
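For context, the final TypeError is raised inside get_vllm_version -> get_nvcc_cuda_version in setup.py, where a CUDA directory path is concatenated with "/bin/nvcc". A NoneType + str failure there is consistent with CUDA_HOME resolving to None in pip's isolated build environment, i.e. the CUDA toolkit not being discoverable on the build machine. A quick diagnostic, assuming the setup script locates CUDA through torch's cpp_extension helpers:

# If this prints None, the cuda_dir + "/bin/nvcc" concatenation in
# setup.py fails with exactly the TypeError shown in the trace above.
from torch.utils.cpp_extension import CUDA_HOME

print(CUDA_HOME)  # expected: a toolkit path such as /usr/local/cuda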

I saw several related issues in the tracker, but could not identify a concrete fix. Could anyone point me to a fix, or let me know from which release the updated SamplingParams is available?
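For reference, this is the call pattern I need; a minimal sketch (the model name, prompt, and parameter values are illustrative):

from vllm import LLM, SamplingParams

# min_tokens forces at least this many tokens to be generated before
# EOS/stop sequences take effect; on my installed release,
# SamplingParams rejects the keyword.
params = SamplingParams(min_tokens=32, max_tokens=256, temperature=0.7)
llm = LLM(model="facebook/opt-125m")  # illustrative model
outputs = llm.generate(["Summarize the following article: ..."], params)
print(outputs[0].outputs[0].text)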

njhill commented 5 months ago

@Sriharsha-hatwar please try with the latest 0.4.0 release.
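A quick way to verify after upgrading (assuming the 0.4.0 wheel is installed):

import vllm
from vllm import SamplingParams

# On 0.4.0 this constructs without raising, since min_tokens is
# included in that release.
print(vllm.__version__)
print(SamplingParams(min_tokens=16, max_tokens=128).min_tokens)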