Is there an existing issue for this?
[ ] I have checked #657 to validate if my issue is covered by community support
Describe the issue
I am using the Llama 3 8B model for GraphRAG. Groq's limit for llama3-8b-8192 is 30k tokens per minute. When that limit is exceeded, the code still invokes the API immediately; instead, it should sleep for some time before retrying (see the sketch below).
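To illustrate the behavior I expect, here is a minimal sketch of sleeping and retrying on a 429, written against Groq's OpenAI-compatible endpoint using the openai Python client. This is not GraphRAG's internal code; chat_with_backoff, base_wait, and the placeholder API key are illustrative:

```python
import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="<GROQ_API_KEY>",  # placeholder
)

def chat_with_backoff(messages, max_retries=3, base_wait=10.0):
    """Call the chat API, sleeping and retrying when Groq returns a 429."""
    for attempt in range(max_retries + 1):
        try:
            return client.chat.completions.create(
                model="llama3-8b-8192", messages=messages
            )
        except RateLimitError as e:
            if attempt == max_retries:
                raise
            # Honor the server's suggested wait time if present,
            # otherwise back off exponentially.
            retry_after = e.response.headers.get("retry-after")
            wait = float(retry_after) if retry_after else base_wait * 2 ** attempt
            time.sleep(wait)
```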
Steps to reproduce
No response
GraphRAG Config Used
encoding_model: cl100k_base
skip_workflows: []
llm:
  api_key: ${GROQ_API_KEY} # Groq API key
  type: openai_chat # or azure_openai_chat
  model: llama3-8b-8192
  model_supports_json: true # recommended if this is available for your model
  max_tokens: 4000
  # request_timeout: 180.0
  api_base: https://api.groq.com/openai/v1
  # api_version: 2024-02-15-preview
  # organization: <organization_id>
  # deployment_name: <azure_model_deployment_name>
  tokens_per_minute: 2000 # set a leaky bucket throttle
  requests_per_minute: 1 # set a leaky bucket throttle
  max_retries: 3
  max_retry_wait: 10000.0
  sleep_on_rate_limit_recommendation: true # whether to sleep when Azure suggests wait times
  concurrent_requests: 1 # number of parallel in-flight requests; default is 25, reduce when using Groq
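For context on the two throttle lines above: with tokens_per_minute: 2000 the config stays well below Groq's 30k-token-per-minute limit. The sketch below shows what a leaky bucket throttle does conceptually; LeakyBucket and acquire are illustrative names, not GraphRAG's actual implementation:

```python
import time

class LeakyBucket:
    """Conceptual leaky-bucket limiter: capacity refills at a fixed rate."""
    def __init__(self, capacity_per_minute: float):
        self.rate = capacity_per_minute / 60.0  # tokens replenished per second
        self.capacity = capacity_per_minute
        self.level = capacity_per_minute        # start full
        self.last = time.monotonic()

    def acquire(self, tokens: float) -> None:
        """Block until `tokens` can be spent without exceeding the rate."""
        while True:
            now = time.monotonic()
            self.level = min(self.capacity, self.level + (now - self.last) * self.rate)
            self.last = now
            if self.level >= tokens:
                self.level -= tokens
                return
            time.sleep((tokens - self.level) / self.rate)  # wait for refill

bucket = LeakyBucket(2000)  # mirrors tokens_per_minute: 2000
bucket.acquire(500)         # blocks until 500 tokens are available
```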
Logs and screenshots
No response
Additional Information