vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0
27.68k stars 4.08k forks

Incorrect completions with tensor parallel size of 8 on MI300X GPUs #2817

Closed: seungduk-yanolja closed this issue 3 weeks ago

seungduk-yanolja commented 7 months ago

I'm encountering an issue where vLLM fails to generate complete or sensible responses when the tensor parallel size is set to 8 on MI300X GPUs. Completions work as expected with tensor parallel sizes of 1 and 4.
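For reference, a launch command along these lines should reproduce the setup; the model name and port are placeholders, not taken from the report:

```shell
# Hypothetical reproduction: serve a model across all 8 MI300X GPUs.
# The failing case is --tensor-parallel-size 8; sizes 1 and 4 reportedly work.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-70b-hf \
    --tensor-parallel-size 8 \
    --port 8000
```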

Expected behavior:

vLLM should generate a correct and meaningful completion for the given prompt, similar to its behavior with tensor parallel sizes of 1 and 4.

Actual behavior:

vLLM provides an incomplete or nonsensical response, often similar to the following:

    {
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": " <"
                },
                "finish_reason": "stop"
            }
        ],
        "usage": {
            "prompt_tokens": 96,
            "total_tokens": 99,
            "completion_tokens": 3
        }
    }

System information:

apt show rocm-libs -a
Package: rocm-libs
Version: 6.0.0.60000-91~20.04
Status: install ok installed
Priority: optional
Section: devel
Maintainer: ROCm Dev Support <rocm-dev.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 2.0.0.60000-91~20.04), hipblaslt (= 0.6.0.60000-91~20.04), hipfft (= 1.0.12.60000-91~20.04), hipsolver (= 2.0.0.60000-91~20.04), hipsparse (= 3.0.0.60000-91~20.04), hiptensor (= 1.1.0.60000-91~20.04), miopen-hip (= 3.00.0.60000-91~20.04), half (= 1.12.0.60000-91~20.04), rccl (= 2.18.3.60000-91~20.04), rocalution (= 3.0.3.60000-91~20.04), rocblas (= 4.0.0.60000-91~20.04), rocfft (= 1.0.23.60000-91~20.04), rocrand (= 2.10.17.60000-91~20.04), hiprand (= 2.10.16.60000-91~20.04), rocsolver (= 3.24.0.60000-91~20.04), rocsparse (= 3.0.2.60000-91~20.04), rocm-core (= 6.0.0.60000-91~20.04), composablekernel-dev (= 1.1.0.60000-91~20.04), hipblas-dev (= 2.0.0.60000-91~20.04), hipblaslt-dev (= 0.6.0.60000-91~20.04), hipcub-dev (= 3.0.0.60000-91~20.04), hipfft-dev (= 1.0.12.60000-91~20.04), hipsolver-dev (= 2.0.0.60000-91~20.04), hipsparse-dev (= 3.0.0.60000-91~20.04), hiptensor-dev (= 1.1.0.60000-91~20.04), miopen-hip-dev (= 3.00.0.60000-91~20.04), rccl-dev (= 2.18.3.60000-91~20.04), rocalution-dev (= 3.0.3.60000-91~20.04), rocblas-dev (= 4.0.0.60000-91~20.04), rocfft-dev (= 1.0.23.60000-91~20.04), rocprim-dev (= 3.0.0.60000-91~20.04), rocrand-dev (= 2.10.17.60000-91~20.04), hiprand-dev (= 2.10.16.60000-91~20.04), rocsolver-dev (= 3.24.0.60000-91~20.04), rocsparse-dev (= 3.0.2.60000-91~20.04), rocthrust-dev (= 3.0.0.60000-91~20.04), rocwmma-dev (= 1.3.0.60000-91~20.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: unknown
APT-Manual-Installed: yes
APT-Sources: /var/lib/dpkg/status
Description: Radeon Open Compute (ROCm) Runtime software stack

hliuca commented 7 months ago

Could you try building RCCL from a newer (or the latest) release and dynamically linking it via LD_LIBRARY_PATH?
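The suggestion above could be carried out roughly as follows; the install prefix is illustrative, and the exact build flags may differ between RCCL releases:

```shell
# Sketch: build a newer RCCL from source and put it ahead of the
# ROCm 6.0 system copy on the dynamic linker search path.
git clone https://github.com/ROCm/rccl.git
cd rccl
mkdir build && cd build
# hipcc as the compiler is typical for RCCL; adjust to your ROCm install.
CXX=/opt/rocm/bin/hipcc cmake -DCMAKE_INSTALL_PREFIX=/opt/rccl-latest ..
make -j"$(nproc)" && make install
# Make vLLM pick up the freshly built library at runtime.
export LD_LIBRARY_PATH=/opt/rccl-latest/lib:$LD_LIBRARY_PATH
```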

hongxiayang commented 2 months ago

@seungduk-yanolja With the older version (ROCm 6.0) you might need --enforce-eager for the multi-GPU case. On the current main branch (ROCm 6.1.x with other patches), this should work with the default graph mode. Please test again and update the issue if this still happens for you.
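The ROCm 6.0 workaround mentioned above would look something like this (the model name is a placeholder):

```shell
# Disable HIP graph capture with --enforce-eager when combining
# tensor parallelism with ROCm 6.0, per the comment above.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-70b-hf \
    --tensor-parallel-size 8 \
    --enforce-eager
```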

hongxiayang commented 3 weeks ago

Closing this issue. If you see any new issues with the current main branch, please open a new one.