sasha0552 / vllm-ci

CI scripts designed to build a Pascal-compatible version of vLLM.
MIT License

How to test if patch applied successfully? #6

Open cduk opened 2 months ago

cduk commented 2 months ago

I tried to apply the Triton patch like this:

pip3 install --extra-index-url https://sasha0552.github.io/vllm-ci/ --force-reinstall triton

which prints:

pip3 install --extra-index-url https://sasha0552.github.io/vllm-ci/ --force-reinstall triton
Looking in indexes: https://pypi.org/simple, https://sasha0552.github.io/vllm-ci/
Collecting triton
  Downloading triton-3.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (209.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 209.4/209.4 MB 23.4 MB/s eta 0:00:00
Collecting filelock
  Downloading filelock-3.15.4-py3-none-any.whl (16 kB)
Installing collected packages: filelock, triton
  Attempting uninstall: filelock
    Found existing installation: filelock 3.15.4
    Uninstalling filelock-3.15.4:
      Successfully uninstalled filelock-3.15.4
  Attempting uninstall: triton
    Found existing installation: triton 2.3.0
    Uninstalling triton-2.3.0:
      Successfully uninstalled triton-2.3.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torch 2.3.0 requires triton==2.3.0; platform_system == "Linux" and platform_machine == "x86_64" and python_version < "3.12", but you have triton 3.0.0 which is incompatible.
Successfully installed filelock-3.15.4 triton-3.0.0

So it managed to re-install, but it went suspiciously quickly and there's no indication whether the patches are applied or not. Is there a way to test this?

sasha0552 commented 2 months ago

It clearly wasn't applied, since pip downloaded triton==3.0.0, which is not published in this repository (but is published on PyPI). Can you try pip3 install --index-url https://sasha0552.github.io/vllm-ci/ --force-reinstall --no-deps triton? Also make sure it downloads triton==2.3.0 from github.com (if I remember correctly, pip prints the URL when installing from somewhere other than PyPI).
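As a quick sanity check on the version part of this, the installed triton version can be compared with the version torch pins. The following is only a sketch: the helper names `pinned_version`, `installed_version`, and `report` are made up for illustration, and a matching version does not prove the wheel came from this repository's index, only that PyPI's 3.0.0 was not installed.

```python
import re
from importlib import metadata


def pinned_version(requirements, package="triton"):
    """Return the version pinned with '==' for `package`, or None.

    `requirements` is a list of requirement strings such as
    'triton==2.3.0; platform_system == "Linux"'.
    """
    pattern = re.compile(rf"^{re.escape(package)}\s*==\s*([^\s;,]+)")
    for req in requirements or []:
        match = pattern.match(req.strip())
        if match:
            return match.group(1)
    return None


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report():
    """Describe whether the installed triton matches the version torch pins."""
    installed = installed_version("triton")
    try:
        pin = pinned_version(metadata.requires("torch"))
    except metadata.PackageNotFoundError:
        pin = None
    if installed is None:
        return "triton is not installed"
    if pin is None:
        return f"triton {installed} installed; no torch pin found"
    if installed == pin:
        return f"triton {installed} matches the torch pin"
    return f"triton {installed} does not match the torch pin ({pin})"
```

Calling `print(report())` in the same environment as vLLM would, under these assumptions, flag the triton 3.0.0 / torch 2.3.0 mismatch shown in the pip output above.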

As for "test if patch applied successfully" - the main indicator is that the following code does not cause a crash:

import torch

import triton
import triton.language as tl

@triton.jit
def test_max_kernel():
    t = tl.zeros([2, 2], dtype=tl.float32)
    m = tl.max(t, 1)
    tl.device_print("max:", m)

@triton.jit
def test_sum_kernel():
    t = tl.zeros([2, 2], dtype=tl.float32)
    s = tl.sum(t, 1)
    tl.device_print("sum:", s)

# Launch each kernel once on a single-program grid.
grid = lambda meta: (1, )
test_max_kernel[grid]()
test_sum_kernel[grid]()

sasha0552 commented 2 months ago

Correct output (this repo with Pascal GPU(s), or Volta+ GPU(s))

```
$ python3 test.py
pid (0, 0, 0) idx (0) max: 0.000000
pid (0, 0, 0) idx (1) max: 0.000000
pid (0, 0, 0) idx (0) max: 0.000000
pid (0, 0, 0) idx (1) max: 0.000000
... (the two "max" lines above keep repeating)
pid (0, 0, 0) idx (0) sum: 0.000000
pid (0, 0, 0) idx (1) sum: 0.000000
pid (0, 0, 0) idx (0) sum: 0.000000
pid (0, 0, 0) idx (1) sum: 0.000000
... (the two "sum" lines above keep repeating)
```

Crash

```
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32
*** SIGABRT received at time=1714270500 on cpu 5 ***
...
Fatal Python error: Aborted

Stack (most recent call first):
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 200 in llir_to_ptx
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 381 in
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/triton/compiler/compiler.py", line 543 in compile
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/triton/runtime/jit.py", line 532 in run
  File "/mnt/ml/vllm/vllm/attention/ops/prefix_prefill.py", line 708 in context_attention_fwd
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115 in decorate_context
  File "/mnt/ml/vllm/vllm/attention/ops/paged_attn.py", line 177 in forward_prefix
  File "/mnt/ml/vllm/vllm/attention/backends/xformers.py", line 237 in forward
  File "/mnt/ml/vllm/vllm/attention/layer.py", line 48 in forward
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520 in _call_impl
  File "/mnt/ml/vllm/venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511 in _wrapped_call_impl
  File "/mnt/ml/vllm/vllm/model_executor/models/llama.py", line 166 in forward
  ...
```

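Because the failure mode is a SIGABRT that takes down the whole Python process, it may be convenient to run the test script in a child process and look at how it exited. This is a minimal sketch, assuming a POSIX system and that the kernel test is saved as `test.py`; `run_check` is a hypothetical helper name, not part of vLLM or Triton.

```python
import signal
import subprocess
import sys


def run_check(cmd):
    """Run `cmd` in a child process and return (ok, message)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return True, "exited normally"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed
        # by a signal, e.g. -6 for SIGABRT.
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = f"signal {-proc.returncode}"
        return False, f"killed by {name}"
    return False, f"exit code {proc.returncode}"
```

For example, `ok, msg = run_check([sys.executable, "test.py"])` should return `ok == True` when the kernels compile and run, and report a signal when the LLVM error aborts the process.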
cduk commented 2 months ago

Thanks a lot! It looks like it worked! I will try to add the command to the vLLM Dockerfile so the fix is applied automatically.

# python3 -m pip install --index-url https://sasha0552.github.io/vllm-ci/ --force-reinstall --no-deps triton
Looking in indexes: https://sasha0552.github.io/vllm-ci/
Collecting triton
  Downloading https://github.com/sasha0552/vllm-ci/releases/download/v10/triton-2.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (168.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 168.1/168.1 MB 16.4 MB/s eta 0:00:00
Installing collected packages: triton
  Attempting uninstall: triton
    Found existing installation: triton 3.0.0
    Uninstalling triton-3.0.0:
      Successfully uninstalled triton-3.0.0
Successfully installed triton-2.3.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
root@b5774f219676:/vllm-workspace# 
# python3 tritontest 
pid (0, 0, 0) idx (0) max: 0.000000
pid (0, 0, 0) idx (1) max: 0.000000
pid (0, 0, 0) idx (0) max: 0.000000
pid (0, 0, 0) idx (1) max: 0.000000
pid (0, 0, 0) idx (0) max: 0.000000
...
pid (0, 0, 0) idx (1) sum: 0.000000
pid (0, 0, 0) idx (0) sum: 0.000000
pid (0, 0, 0) idx (1) sum: 0.000000