zhengy001 opened this issue 3 months ago
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Your current environment
The output of `python collect_env.py`
```text
Your output of `python collect_env.py` here
```

🐛 Describe the bug
Hi vLLM experts,
This might not be a bug; I may be missing something.
Suppose two identical new prompts arrive at the same time. No earlier request with the same prompt has been seen, so there are 0 cache hits. Because both prompts produce the same block hashes, BlockSpaceManagerV1 allocates the same physical blocks to both sequences. Where is the logic that deduplicates the computation? If there is none, will `reshape_and_cache` write to the same KV cache slots from both sequences simultaneously and cause a concurrency issue (a race on those slots)?
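To make the scenario concrete, here is a toy model in Python. It is not vLLM code: `ToyHashingAllocator`, `block_hash`, `toy_kv`, `write_kv` and `BLOCK_SIZE` are all hypothetical names. It assumes a prefix-caching allocator that maps a chained content hash of each full block to a physical block id with a refcount, and a `reshape_and_cache`-style scatter that computes `slot = block * BLOCK_SIZE + offset`. Two identical prompts get the same block table, so both writers target the same slots. In the toy, K/V is a deterministic function of token id and position, so the second write stores identical values. The writes are serialized here, whereas on a GPU they would happen within the same kernel launch. Whether real kernels produce bit-identical values, and whether vLLM relies on that, is exactly the open question and is not settled by this sketch.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per block (hypothetical value for illustration)


def block_hash(prev_hash, token_ids):
    # Chain the previous block's hash with this block's tokens, so a block's
    # identity depends on the whole prefix (prefix-caching style, assumed).
    h = hashlib.sha256()
    h.update(prev_hash.encode())
    h.update(",".join(map(str, token_ids)).encode())
    return h.hexdigest()


class ToyHashingAllocator:
    """Hypothetical allocator: content hash -> physical block id, with refcounts."""

    def __init__(self):
        self.hash_to_block = {}
        self.ref_count = {}
        self.next_block = 0

    def _new_block(self):
        blk = self.next_block
        self.next_block += 1
        self.ref_count[blk] = 1
        return blk

    def allocate(self, token_ids):
        table, prev = [], ""
        for start in range(0, len(token_ids), BLOCK_SIZE):
            chunk = token_ids[start:start + BLOCK_SIZE]
            if len(chunk) < BLOCK_SIZE:
                # Partial tail block: never shared in this toy.
                table.append(self._new_block())
                break
            h = block_hash(prev, chunk)
            if h not in self.hash_to_block:
                self.hash_to_block[h] = self._new_block()
            else:
                # Same content hash: reuse the existing physical block.
                self.ref_count[self.hash_to_block[h]] += 1
            table.append(self.hash_to_block[h])
            prev = h
        return table


def toy_kv(token_id, position):
    # Stand-in for the K/V projection: deterministic in token and position.
    return (token_id * 31 + position) % 997


def write_kv(cache, block_table, token_ids):
    # Scatter each token's K/V into its slot, like a reshape_and_cache-style write.
    for pos, tok in enumerate(token_ids):
        slot = block_table[pos // BLOCK_SIZE] * BLOCK_SIZE + pos % BLOCK_SIZE
        cache[slot] = toy_kv(tok, pos)


alloc = ToyHashingAllocator()
prompt = [11, 12, 13, 14, 15, 16, 17, 18]
table_a = alloc.allocate(prompt)  # first of the two identical prompts
table_b = alloc.allocate(prompt)  # second one, same hashes -> same blocks
print(table_a, table_b)  # [0, 1] [0, 1]

cache = {}
write_kv(cache, table_a, prompt)
snapshot = dict(cache)
write_kv(cache, table_b, prompt)  # same slots, same values in this toy
print(cache == snapshot)  # True
```

The toy only shows that the two sequences alias the same slots. It says nothing about whether vLLM avoids the duplicate computation or how its CUDA kernel behaves under such aliasing.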