Fix the clear request when it is called with an ID (it was causing a crash on the server).
Raise an error when there are too many requests (this should never happen, but it is good to handle it).
Add more prefill lengths to warmup. Warmup will take longer, but inference will be faster for shorter prompts, at least until we find a better fix for bucketing and padding not working as expected.
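A minimal Python sketch of why extra warmup lengths help shorter prompts, assuming each prompt is padded up to the smallest prefill length that was compiled during warmup. `PREFILL_BUCKETS`, `bucket_for`, and `pad_tokens` are hypothetical names, and the bucket values and left padding are illustrative assumptions, not the values or code used in this PR.

```python
import bisect
from typing import List

# Hypothetical warmed-up prefill lengths (sorted); not the PR's actual values.
PREFILL_BUCKETS: List[int] = [16, 32, 64, 128, 256, 512, 1024]


def bucket_for(prompt_len: int, buckets: List[int]) -> int:
    """Return the smallest warmed-up prefill length that fits the prompt."""
    idx = bisect.bisect_left(buckets, prompt_len)
    if idx == len(buckets):
        raise ValueError(
            f"prompt of {prompt_len} tokens exceeds largest bucket {buckets[-1]}"
        )
    return buckets[idx]


def pad_tokens(tokens: List[int], buckets: List[int], pad_id: int = 0) -> List[int]:
    """Left-pad tokens to their bucket length (left padding is an assumption)."""
    target = bucket_for(len(tokens), buckets)
    return [pad_id] * (target - len(tokens)) + tokens


if __name__ == "__main__":
    prompt = list(range(1, 21))  # a 20-token prompt
    # With only one large warmup length, the short prompt pays for 1024 positions.
    print(bucket_for(len(prompt), [1024]))
    # With finer warmup lengths, it only needs 32 positions.
    print(len(pad_tokens(prompt, PREFILL_BUCKETS)))
```

The cost is one compiled prefill graph per bucket, which is why warmup takes longer as the list grows.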