vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Misc]: need "good first issue" #4437

Closed · HarryWu99 closed this 1 month ago

HarryWu99 commented 2 months ago

Anything you want to discuss about vllm.

As a beginner, I find the sheer number of issues and PRs overwhelming, and it is hard to know where to start contributing.

Could anyone please add the good first issue label to some issues, so that beginners like me can get started quickly?

Thanks!

robertgshaw2-neuralmagic commented 2 months ago

https://github.com/vllm-project/vllm/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22

robertgshaw2-neuralmagic commented 2 months ago

@HarryWu99 how big of a feature do you want to take on?

HarryWu99 commented 2 months ago

> how big of a feature do you want to take on?

@robertgshaw2-neuralmagic I don't know yet. šŸ˜‚ I'm still getting familiar with the code. For now, I am interested in "sparse KV cache" and hope to develop a related feature, if my ability allows.

Before that, I'm also happy to take on some simple bug fixes.

robertgshaw2-neuralmagic commented 2 months ago

@HarryWu99 Cool! A good one to take on eventually would be StreamingLLM in BlockManagerV2.

When I was ramping up on the codebase, I found that metrics / monitoring was a decent place to start, since you have to touch many pieces of the system.

I have a halfway-done branch that I could use some help pushing over the line.
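
For context on StreamingLLM: the idea is to pin the first few "attention sink" tokens in the KV cache permanently, keep a sliding window of the most recent tokens, and evict everything else. Below is a minimal sketch of that policy at token granularity; the class and its parameters are illustrative only, not vLLM's BlockManagerV2 API.

```python
from collections import deque

class StreamingKVWindow:
    """Hypothetical illustration of the StreamingLLM cache policy:
    pin the first `num_sinks` tokens ("attention sinks") forever and
    keep a sliding window of the most recent `window_size` tokens;
    everything else is evicted."""

    def __init__(self, num_sinks: int = 4, window_size: int = 1024):
        self.num_sinks = num_sinks
        self.sinks: list[int] = []               # token ids pinned in cache
        self.window: deque[int] = deque(maxlen=window_size)
        self.evicted: list[int] = []             # token ids whose KV slots are free

    def append(self, token_id: int) -> None:
        if len(self.sinks) < self.num_sinks:
            self.sinks.append(token_id)          # sinks are never evicted
            return
        if len(self.window) == self.window.maxlen:
            # The oldest non-sink token falls out of the window;
            # its KV cache slot becomes reusable.
            self.evicted.append(self.window[0])
        self.window.append(token_id)

    def cached_tokens(self) -> list[int]:
        return self.sinks + list(self.window)
```

A block manager would enforce the same rule at block granularity, freeing a whole block once every token in it has left the window.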

robertgshaw2-neuralmagic commented 2 months ago

In particular, https://github.com/ronensc/vllm/pull/1/files

I merged some of this PR in today, but the rest of the new metrics still need to be rebased onto main, checked for correctness, and tested.

If you want to pick it up to get the ball rolling, let me know.
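
For orientation on the metrics work: vLLM exposes its stats through prometheus_client. A minimal sketch of the pattern follows; the metric names, labels, and buckets here are made up for illustration, not copied from that PR.

```python
from prometheus_client import Counter, Histogram

request_success = Counter(
    "vllm:request_success",
    "Number of successfully finished requests.",
    labelnames=["model_name"],
)
time_to_first_token = Histogram(
    "vllm:time_to_first_token_seconds",
    "Seconds from request arrival to the first generated token.",
    labelnames=["model_name"],
    buckets=[0.01, 0.05, 0.1, 0.5, 1.0, 5.0],
)

# Wherever the engine records per-iteration stats, something like:
request_success.labels(model_name="my-model").inc()
time_to_first_token.labels(model_name="my-model").observe(0.42)
```

Once rebased, testing would mostly amount to driving requests through the engine and asserting that the exporter reports the expected values.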

HarryWu99 commented 2 months ago

@robertgshaw2-neuralmagic Yes, I'd love to give it a try.

robertgshaw2-neuralmagic commented 2 months ago

Cool. I'm going to pick this back up on Wednesday or so, so please keep me in the loop on your progress.

Kaiyang-Chen commented 2 months ago

> StreamingLLM in BlockManagerV2

@robertgshaw2-neuralmagic Hi, I am interested in implementing this feature, but until the FlashInfer change that lets the attention kernel apply rotary embeddings in place is merged, is there a decent way for us to do that?
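
Some background on why that kernel matters here: keys are cached with RoPE already applied at their original positions, so when the window slides, the effective positions of the surviving tokens shift. One fallback is to re-rotate the cached keys by each token's position delta, which is valid because 2-D rotations compose. This is a sketch of mine, not a vLLM or FlashInfer API, and it assumes the interleaved RoPE layout where dims (0,1), (2,3), ... form pairs.

```python
import torch

def rerotate_keys(keys: torch.Tensor, pos_delta: torch.Tensor,
                  base: float = 10000.0) -> torch.Tensor:
    """Re-rotate already-rotated cached keys by a per-token position
    shift. keys: [num_tokens, head_dim]; pos_delta: [num_tokens].
    Applying RoPE at position p and then rotating by (p' - p)
    equals applying RoPE at p' directly."""
    head_dim = keys.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = pos_delta[:, None].float() * inv_freq[None, :]   # [T, head_dim/2]
    cos, sin = angles.cos(), angles.sin()
    k1, k2 = keys[..., 0::2], keys[..., 1::2]                 # paired dims
    out = torch.empty_like(keys)
    out[..., 0::2] = k1 * cos - k2 * sin
    out[..., 1::2] = k1 * sin + k2 * cos
    return out
```

A kernel that applies RoPE at attention time, as FlashInfer proposes, avoids this extra pass entirely, since keys can then be cached un-rotated.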

HarryWu99 commented 2 months ago

> StreamingLLM in BlockManagerV2

I am interested in it, too. šŸ˜†

robertgshaw2-neuralmagic commented 2 months ago

@HarryWu99 We also have some ongoing work on embedding models that could use a hand.