HarryWu99 closed this issue 1 month ago
@HarryWu99 how big of a feature do you want to take on?
> how big of a feature do you want to take on?
@robertgshaw2-neuralmagic I don't know yet. I'm still getting familiar with the code. For now, I am interested in "sparse KV cache", and I hope to develop a related feature if I'm able.
Before that, I'm also happy to deal with some simple bug fixes.
@HarryWu99 cool! A good one to take on eventually would be StreamingLLM in BlockManagerv2
When I was ramping up on the codebase, I found that looking into metrics / monitoring is a decent place to start, since you have to touch many pieces
I have a halfway done branch I could use some help pushing over the line
In particular, https://github.com/ronensc/vllm/pull/1/files
Some of this PR I merged in today, but the rest of the new metrics still need to be rebased onto main and then verified for correctness / covered by tests
If you want to pick it up to get the ball rolling let me know
@robertgshaw2-neuralmagic yes, I'd love to have a try.
Cool - I'm going to pick this back up on Wednesday or so, so please keep me in the loop on your progress
> StreamingLLM in BlockManagerv2
@robertgshaw2-neuralmagic Hi, I am interested in implementing this feature, but until the FlashInfer change that enables the attention kernel to apply rotary embedding in place is merged, are there any decent ways for us to do that?
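For context, StreamingLLM retains a few initial "attention sink" tokens plus a sliding window of the most recent tokens, and positions are assigned within the retained cache rather than in the original sequence, which is why rotary embeddings have to be recomputed after eviction (ideally in place by the kernel, as discussed above). A minimal sketch of that eviction policy, with hypothetical helper names and default sizes chosen for illustration only:

```python
def streaming_llm_keep_indices(seq_len: int, num_sinks: int = 4, window: int = 1020):
    """Return the token indices a StreamingLLM-style cache would retain:
    the first `num_sinks` attention-sink tokens plus the last `window`
    tokens. Hypothetical helper, not vLLM's actual implementation."""
    if seq_len <= num_sinks + window:
        return list(range(seq_len))
    return list(range(num_sinks)) + list(range(seq_len - window, seq_len))

def cache_positions(kept_indices):
    """After eviction, positions are reassigned contiguously within the
    cache, so cached keys need their rotary embedding reapplied at the
    new positions (hence the interest in an in-place rotary kernel)."""
    return list(range(len(kept_indices)))
```

For example, with 2 sink tokens and a window of 4, a 10-token sequence keeps tokens `[0, 1, 6, 7, 8, 9]`, which are then treated as positions `0..5` inside the cache.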
> StreamingLLM in BlockManagerv2
I am interested in it, too.
@HarryWu99 we also have some ongoing work for embedding models that could use a hand
Anything you want to discuss about vllm.
As a beginner, I find it hard to start contributing with so many issues and PRs. Could anyone please add the `good first issue` label to some issues, so that beginners like me can get started quickly? Thanks!