Open sleepwalker2017 opened 6 months ago
Can't we have something like automated RoPE scaling, like in alpindale's Aphrodite Engine? @WoosukKwon
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Your current environment
When running the prefill stage, vLLM takes multiple requests and prefills them together. Is this batch limited by `max_position_embeddings`? I think it is not, because each sequence has its own start index, so we only need to ensure that each individual sequence is shorter than `max_position_embeddings`. Is that correct?

How would you like to use vllm
I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.
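
To make the `max_position_embeddings` question above concrete, here is a minimal sketch of the assumption behind it: prompts are packed back to back into one prefill batch, and position indices restart at 0 for each sequence. This is not vLLM's actual code; `packed_position_ids` and `fits_position_limit` are hypothetical helper names used only for illustration.

```python
from typing import List


def packed_position_ids(prompt_lens: List[int]) -> List[int]:
    """Position ids for prompts packed back to back; each prompt restarts at 0."""
    positions: List[int] = []
    for n in prompt_lens:
        positions.extend(range(n))
    return positions


def fits_position_limit(prompt_lens: List[int], max_position_embeddings: int) -> bool:
    """True if every individual prompt fits, regardless of the batch total.

    A prompt of length n uses positions 0..n-1, so n may be at most
    max_position_embeddings.
    """
    return all(n <= max_position_embeddings for n in prompt_lens)


prompt_lens = [3, 5, 2]  # three requests prefilled together
positions = packed_position_ids(prompt_lens)
print(positions)                            # [0, 1, 2, 0, 1, 2, 3, 4, 0, 1]
print(len(positions), max(positions))       # 10 4
print(fits_position_limit(prompt_lens, 8))  # True
```

Under this assumption the batch holds 10 tokens, more than the hypothetical limit of 8, yet the largest position index is only 4, so only the longest single sequence matters for the position limit.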