vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[CI][Contribution Welcomed] Conditional Testing #4569

Open · simon-mo opened this issue 4 months ago

simon-mo commented 4 months ago

Anything you want to discuss about vllm.

Currently we run the full CI test matrix on every single commit in pull requests. vLLM's CI cost has been doubling each week as we add more tests and receive many PRs from the community.

A good first step would be to run certain tests only when the relevant code has changed. For example, do not run unit/integration tests when only docs or examples change.
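To make that concrete, here is a minimal sketch (not vLLM's actual CI code) of a path-based filter: diff the PR against the target branch and skip the heavy suites when only docs or examples changed. The branch name, prefixes, and messages are illustrative assumptions.

```python
# Hypothetical sketch: decide whether a PR touches only docs/examples,
# so heavier test suites can be skipped.
import subprocess

# Paths whose changes should not trigger unit/integration tests (assumed list).
SKIP_PREFIXES = ("docs/", "examples/")

def changed_files(base: str = "origin/main") -> list[str]:
    """Return files changed relative to the merge base with the target branch."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

def docs_only(files: list[str]) -> bool:
    """True if every changed file lives under a skip-listed prefix."""
    return bool(files) and all(f.startswith(SKIP_PREFIXES) for f in files)

if __name__ == "__main__":
    files = changed_files()
    if docs_only(files):
        print("Docs/examples-only change; skipping unit/integration tests.")
    else:
        print("Source change detected; running the full test matrix.")
```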

cadedaniel commented 4 months ago

I feel the speculative decoding tests will balloon quite a bit as we add more framework features; we can get ahead of this by only triggering them when there are framework changes.

robertgshaw2-neuralmagic commented 4 months ago

We could also just require explicit triggering of the CI (and doing this would be required for merging).

Since we have so many PRs that are often WIP, this would cut a lot of useless CI runs.

youkaichao commented 4 months ago

> We could also just require explicit triggering of the CI (and doing this would be required for merging).
>
> Since we have so many PRs that are often WIP, this would cut a lot of useless CI runs.

+1 for this, only run CI when we want to. I often have a WIP PR, or an approved PR, where I'm adding small changes like additional comments. For these, running CI on every commit is not necessary.

dgoupil commented 4 months ago

Explicit / on-demand CI makes a lot of sense. We could also introduce test categories (functional tests, coverage, security, performance tests, engine, API server) so the PR owner can trigger the ones that fit their needs while the PR is in progress. As Robert said, some categories would be required to pass before a PR is ready for review and merge. Ultimately, we should think about what we want to catch before merging to master.

zhuohan123 commented 4 months ago

+1 for only starting CI when needed

simon-mo commented 4 months ago

A blocking step / explicit trigger might work, but we also need to think about how not to slow down contributions. A low-hanging fruit would be conditional builds on kernel changes. Currently the paged attention and punica kernel tests take a long time to run, but almost no PRs touch them; a sketch of such a filter follows.
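As a hedged sketch, the same diff-based filter from the earlier example could gate the kernel test steps on the directories that hold compiled kernel sources; the prefix list below is an assumption, not an exhaustive list of what affects kernel builds.

```python
# Hypothetical extension of the diff-based filter above: only trigger the
# expensive kernel test steps (paged attention, punica, etc.) when files that
# can affect compiled kernels change. The prefix list is an assumption.
KERNEL_PREFIXES = ("csrc/", "cmake/", "CMakeLists.txt", "setup.py")

def needs_kernel_tests(changed: list[str]) -> bool:
    """True if any changed file could plausibly affect compiled kernels."""
    return any(f.startswith(KERNEL_PREFIXES) for f in changed)
```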

robertgshaw2-neuralmagic commented 4 months ago

+1 on kernels

Another culprit is the cache_ops tests.

simon-mo commented 4 months ago

A great idea from @zhuohan123 is that we can run tests in stages: only when the sanity tests pass will the next stage of model tests be kicked off, and so on.
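A minimal sketch of that staging idea, assuming a dynamically generated Buildkite pipeline (Buildkite's `depends_on` lets a step wait on another step and not run if it fails); the step keys, labels, and test paths below are illustrative only.

```python
# Illustrative only: generate a staged pipeline where model tests run only
# after the sanity step succeeds. A generator step would pipe this JSON into
# `buildkite-agent pipeline upload`. Step keys and test paths are assumptions.
import json

pipeline = {
    "steps": [
        {
            "key": "sanity",
            "label": "Sanity tests",
            "command": "pytest -x tests/basic_correctness",
        },
        {
            "key": "model-tests",
            "label": "Model tests",
            "command": "pytest tests/models",
            "depends_on": "sanity",  # does not run if the sanity step fails
        },
    ]
}

print(json.dumps(pipeline, indent=2))
```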

DarkLight1337 commented 3 months ago

#5140 looks promising as a first step.

dgoupil commented 3 months ago

Thanks @DarkLight1337. The next step will be to move the diff command into a script and point Buildkite at it, but that will require some synchronization.

DarkLight1337 commented 2 months ago

Ref. #5816