Closed bmschmidt closed 1 month ago
(1200 requests / 300 seconds) is actually 4 rps, but this is probably fine
If there were multiple things potentially making embedding requests (I don't think there currently are?), it should also be fine to burst when needed to keep things snappy: you could send an interactive request immediately and add what would have been the remaining delay to the next background flush to compensate (though I would not bother with this if it's responsive enough anyway)
oh it looks like this already does first-request-immediately
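For reference, the burst-and-compensate idea from the comment above could be sketched roughly as follows. This is only an illustration of the arithmetic, not code from this PR: the function names `burst_debt` and `next_background_delay` are hypothetical, and it assumes a single fixed window between requests.

```python
def burst_debt(window_s: float, elapsed_s: float) -> float:
    """Delay skipped by sending now instead of waiting out the window.

    window_s:  minimum spacing between requests (hypothetical parameter)
    elapsed_s: time since the previous request was sent
    """
    return max(0.0, window_s - elapsed_s)


def next_background_delay(window_s: float, elapsed_s: float) -> float:
    """Delay before the next background flush after an immediate burst.

    The skipped delay is added on top of the normal window, so the
    average request rate across the burst stays within the limit.
    """
    return window_s + burst_debt(window_s, elapsed_s)


if __name__ == "__main__":
    # Interactive request sent 0.25 s into a 0.5 s window:
    # the next background flush waits 0.5 s + 0.25 s.
    print(next_background_delay(0.5, 0.25))
```

If the window has already elapsed when the interactive request arrives, the debt is zero and the next background flush waits the normal window.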
Oh I thought the new limit was 600 requests, ok.
Nomic's embedding API is rate limited to 2 requests per second, but these can include multiple embeddings. This PR does two things.
Embedder class into 510 ms groups to ensure that users are automatically kept within the rate limit.
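A minimal sketch of the 510 ms grouping described above, written in Python purely for illustration. It is not the PR's implementation: `BatchingEmbedder`, `send`, `flush_if_due`, and the injectable `clock` are all hypothetical names, and `send` stands in for whatever performs one API request carrying several texts. The first request goes out immediately; later texts are held until the window has elapsed.

```python
import time
from typing import Callable, List, Optional


class BatchingEmbedder:
    """Hypothetical sketch: at most one request per time window."""

    def __init__(
        self,
        send: Callable[[List[str]], None],
        window_s: float = 0.51,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.send = send          # assumed callback that makes one API request
        self.window_s = window_s  # 510 ms, just over the 500 ms implied by 2 rps
        self.clock = clock        # injectable so the logic can be tested
        self.pending: List[str] = []
        self.last_sent: Optional[float] = None

    def add(self, text: str) -> None:
        """Queue a text and send right away if the window allows it."""
        self.pending.append(text)
        self.flush_if_due()

    def flush_if_due(self) -> bool:
        """Send all queued texts as one request if the window has elapsed."""
        now = self.clock()
        due = self.last_sent is None or now - self.last_sent >= self.window_s
        if due and self.pending:
            batch, self.pending = self.pending, []
            self.last_sent = now
            self.send(batch)
            return True
        return False
```

In a real client, a timer would call `flush_if_due` once the window ends so that queued texts are not left waiting for the next `add`; that scheduling is omitted here.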