microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License

[Issue]: sleep_on_rate_limit_recommendation is not working for Groq #820

Closed · prasantpoudel closed 3 months ago

prasantpoudel commented 3 months ago

Is there an existing issue for this?

Describe the issue

I am using the Llama 3 8B model (llama3-8b-8192) with GraphRAG through Groq. Groq's limit for this model is 30k tokens per minute. When that limit is exceeded, the code still invokes the API immediately; instead, it should sleep for some time before retrying.
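
For reference, here is a minimal standalone sketch (not GraphRAG's internal code) of the behavior being requested: sleep and retry when Groq returns a rate-limit error. It assumes the openai>=1.0 Python client pointed at Groq's OpenAI-compatible endpoint, with GROQ_API_KEY set in the environment; the function name chat_with_backoff is hypothetical.

```python
# Sketch of sleep-and-retry on rate limits; not GraphRAG's implementation.
import os
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    api_key=os.environ["GROQ_API_KEY"],
    base_url="https://api.groq.com/openai/v1",  # Groq's OpenAI-compatible endpoint
)

def chat_with_backoff(messages, max_retries=3, base_wait=10.0):
    """Call the chat API, sleeping with exponential backoff on rate limits."""
    for attempt in range(max_retries + 1):
        try:
            return client.chat.completions.create(
                model="llama3-8b-8192",
                messages=messages,
            )
        except RateLimitError:
            if attempt == max_retries:
                raise
            # Wait 10s, 20s, 40s, ... before retrying the call.
            time.sleep(base_wait * (2 ** attempt))
```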

Steps to reproduce

No response

GraphRAG Config Used

encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GROQ_API_KEY} # groq api key
  type: openai_chat # or azure_openai_chat
  model: llama3-8b-8192
  model_supports_json: true # recommended if this is available for your model.
  max_tokens: 4000
  # request_timeout: 180.0
  api_base: https://api.groq.com/openai/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  tokens_per_minute: 2000 # set a leaky bucket throttle
  requests_per_minute: 1 # set a leaky bucket throttle
  max_retries: 3
  max_retry_wait: 10000.0
  sleep_on_rate_limit_recommendation: true # whether to sleep when azure suggests wait-times
  concurrent_requests: 1 # number of parallel in-flight requests; default is 25, reduce when using Groq
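
The tokens_per_minute and requests_per_minute settings above configure leaky-bucket throttles. As an illustration only (this is not GraphRAG's actual limiter), a leaky bucket drains at a fixed rate and admits new work only while there is room, which is what caps throughput per minute:

```python
# Illustrative leaky-bucket throttle; not GraphRAG's actual limiter code.
import time

class LeakyBucket:
    def __init__(self, capacity_per_minute: float):
        self.capacity = capacity_per_minute
        self.rate = capacity_per_minute / 60.0  # units drained per second
        self.level = 0.0
        self.last = time.monotonic()

    def acquire(self, amount: float = 1.0) -> None:
        """Block until `amount` units fit, then add them.

        Assumes amount <= the per-minute capacity.
        """
        while True:
            now = time.monotonic()
            self.level = max(0.0, self.level - (now - self.last) * self.rate)
            self.last = now
            if self.level + amount <= self.capacity:
                self.level += amount
                return
            time.sleep(0.1)

# Matching the config above: requests_per_minute: 1, tokens_per_minute: 2000.
requests = LeakyBucket(1)
tokens = LeakyBucket(2000)
```

Before each API call, throttled client code would call requests.acquire() and tokens.acquire(estimated_tokens); with requests_per_minute: 1 this serializes calls to roughly one per minute, independent of the sleep_on_rate_limit_recommendation behavior the issue is about.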

Logs and screenshots

No response

Additional Information

natoverse commented 3 months ago

Routing to #657