nomic-ai / ts-nomic

TypeScript bindings for Atlas
MIT License

handle backoff and bundle requests for API timing #81

Closed: bmschmidt closed this 1 month ago

bmschmidt commented 1 month ago

Nomic's embedding API is rate limited to 2 requests per second, but each request can include multiple embeddings. This PR does two things:

  1. Batches together all requests in the Embedder class into 510 ms groups to ensure that users are automatically kept within the rate limit.
  2. Respects the 429s newly sent from the API server with exponential backoff of up to 8 seconds.
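As a rough illustration of the second point, a capped exponential backoff schedule could look like the sketch below. Only the 8-second cap comes from the description above; the 510 ms starting delay, the doubling factor, and the names `BASE_DELAY_MS`, `MAX_BACKOFF_MS`, and `backoffDelayMs` are assumptions made for the example, not the code in this PR.

```typescript
// Hypothetical constants: only the 8-second cap is taken from the PR description.
const BASE_DELAY_MS = 510;   // assumed starting delay, matching the flush interval
const MAX_BACKOFF_MS = 8000; // upper bound on the wait between retries

// Delay to wait after the n-th consecutive 429 response (n starts at 0).
// The delay doubles each time until it reaches the cap.
function backoffDelayMs(consecutive429s: number): number {
  const uncapped = BASE_DELAY_MS * Math.pow(2, consecutive429s);
  return Math.min(uncapped, MAX_BACKOFF_MS);
}

// Delays for the first six consecutive 429s.
const backoffSchedule: number[] = [0, 1, 2, 3, 4, 5].map(backoffDelayMs);
console.log(backoffSchedule); // [510, 1020, 2040, 4080, 8000, 8000]
```

Under these assumptions the fifth retry would already hit the cap (510 × 16 = 8160 ms, clamped to 8000 ms), so a client that keeps receiving 429s settles at one attempt every 8 seconds.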

[!IMPORTANT] Batch requests in Embedder class every 510 ms and handle 429 errors with exponential backoff to comply with API rate limits.

  • Behavior:
    • Batches requests in Embedder class every 510 ms to comply with API rate limit of 2 requests per second.
    • Implements exponential backoff up to 8 seconds for 429 errors in flushDeferredEmbeddings().
  • Constants:
    • Increases BATCH_SIZE from 32 to 400 in embedding.ts.
  • Error Handling:
    • Re-queues failed requests due to 429 errors in flushDeferredEmbeddings().
    • Throws error if embedQueue exceeds 100,000 items in embed().
  • Misc:
    • Adjusts setTimeout in periodicallyFlushCache() to 510 ms.
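The queue behavior summarized above (batches of up to 400, re-queueing after a 429, and a hard cap of 100,000 items) can be sketched roughly as follows. The constants mirror the summary, but the `EmbedQueue` class and its method names are hypothetical and are not taken from `embedding.ts`.

```typescript
// Hypothetical sketch: the values come from the summary, the class and methods are assumptions.
const BATCH_SIZE = 400;          // per the summary, raised from 32
const MAX_QUEUE_LENGTH = 100000; // per the summary, embed() throws beyond this

class EmbedQueue {
  private items: string[] = [];

  // Queue one text, refusing to grow past the cap.
  enqueue(text: string): void {
    if (this.items.length >= MAX_QUEUE_LENGTH) {
      throw new Error("embed queue exceeds " + MAX_QUEUE_LENGTH + " items");
    }
    this.items.push(text);
  }

  // Remove and return up to BATCH_SIZE items from the front of the queue.
  takeBatch(): string[] {
    return this.items.splice(0, BATCH_SIZE);
  }

  // After a 429, put the failed batch back at the front so ordering is preserved.
  requeue(failedBatch: string[]): void {
    this.items = failedBatch.concat(this.items);
  }

  size(): number {
    return this.items.length;
  }
}

// Usage: queue 1000 texts, take one batch, then simulate a 429 by re-queueing it.
const embedQueue = new EmbedQueue();
for (let i = 0; i < 1000; i++) {
  embedQueue.enqueue("text " + i);
}
const firstBatch = embedQueue.takeBatch(); // 400 items, 600 remain queued
embedQueue.requeue(firstBatch);            // back to 1000 items, original order
```

Re-queueing at the front rather than the back is a design choice in this sketch: it keeps results flowing in submission order once the server starts accepting requests again.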

This description was created by Ellipsis for 2dcf44968a272ce20353c4e429282193b96c58cd. It will automatically update as commits are pushed.

apage43 commented 1 month ago

(1200 requests / 300 seconds) is actually 4 rps, but this is probably fine
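For reference, the arithmetic behind that comparison, using only the figures quoted in this thread (1200 requests per 300 seconds, one flush every 510 ms); the variable names are made up for the example:

```typescript
// Figures quoted in this thread; the names are illustrative only.
const LIMIT_REQUESTS = 1200;
const LIMIT_WINDOW_S = 300;
const FLUSH_INTERVAL_MS = 510;

const allowedRps = LIMIT_REQUESTS / LIMIT_WINDOW_S; // 4 requests per second
const minIntervalMs = 1000 / allowedRps;            // 250 ms between requests at the limit
const actualRps = 1000 / FLUSH_INTERVAL_MS;         // about 1.96 requests per second
const requestsPerWindow = Math.floor((LIMIT_WINDOW_S * 1000) / FLUSH_INTERVAL_MS); // 588

console.log(allowedRps, minIntervalMs, actualRps.toFixed(2), requestsPerWindow);
```

So a 510 ms flush interval uses a little under half of a 1200-per-300-seconds budget.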

if there were multiple potential things doing embedding requests (I don't think there currently are?) then it should also be fine to burst if needed to keep things snappy: you could send an interactive request immediately and add what would have been the remaining delay to the next background flush to compensate (though I would not bother with this if it's responsive enough anyway)
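A minimal sketch of that compensation idea, assuming a fixed 510 ms flush interval and millisecond timestamps; `delayAfterBurstMs` is a hypothetical helper, not something in this PR:

```typescript
const FLUSH_INTERVAL_MS = 510;

// Delay before the next background flush, measured from the moment an
// interactive request is sent immediately at `nowMs`. The wait that the
// interactive request skipped is added on top of the normal interval.
function delayAfterBurstMs(lastSendMs: number, nowMs: number): number {
  const skippedWaitMs = Math.max(0, lastSendMs + FLUSH_INTERVAL_MS - nowMs);
  return FLUSH_INTERVAL_MS + skippedWaitMs;
}

console.log(delayAfterBurstMs(0, 200)); // 820: 310 ms were skipped, so the next flush waits longer
console.log(delayAfterBurstMs(0, 600)); // 510: the interval had already elapsed, nothing to repay
```

With the last send at 0 ms and a burst at 200 ms, the next flush lands at 1020 ms, which is two full intervals after the last regular send, so the average rate is unchanged.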

apage43 commented 1 month ago

oh it looks like this already does first-request-immediately

bmschmidt commented 1 month ago

Oh I thought the new limit was 600 requests, ok.
