Open tuoyuan123 opened 1 year ago
We can certainly use a paid account to raise the API rate limits, but can the frequency of API calls be optimized? Could you also document how many API calls per second this project needs to run? Thank you for your efforts.
Another thought: we could add a sleep-and-retry technique. When we hit the ChatGPT rate limit, the process could pause and retry after 20 s.
@iam153
import time

import openai

def get_embedding(text, model="text-embedding-ada-002"):
    text = text.replace("\n", " ")
    time.sleep(20)  # crude throttle: the free tier allows 3 requests/min
    if not text:
        text = "this is blank"
    return openai.Embedding.create(
        input=[text], model=model)['data'][0]['embedding']
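Sleeping 20 s before every call is wasteful when most requests would succeed. A common alternative is to retry only on failure, with exponentially growing waits. Below is a minimal sketch; `call_with_backoff` is a hypothetical helper (not part of this repo), and the `sleep` parameter is injectable so the backoff can be exercised without actually waiting.

```python
import time

def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry fn() with exponential backoff instead of a fixed 20 s sleep.

    fn is any zero-argument callable that may raise on a rate-limit error.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the original error
            # Wait 1 s, 2 s, 4 s, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))

# Demo with a fake flaky call that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limit")
    return "ok"

waits = []
result = call_with_backoff(flaky, sleep=waits.append)
```

In the real code, `fn` would wrap the `openai.Embedding.create` call, so successful requests pay no delay at all.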
But when I run "run 2", it prints "TOKEN LIMIT EXCEEDED", and I don't know how to remove this limitation. A slightly slower speed is acceptable; I just hope to view the demo normally.
Me too. The first two prompt-template calls produced proper output, but the rest all return "TOKEN LIMIT EXCEEDED":
Enter option: run 1
GNS FUNCTION: <generate_wake_up_hour>
TOKEN LIMIT EXCEEDED
TOKEN LIMIT EXCEEDED
TOKEN LIMIT EXCEEDED
TOKEN LIMIT EXCEEDED
TOKEN LIMIT EXCEEDED
=== persona/prompt_template/v2/wake_up_hour_v1.txt
~~~ persona ---------------------------------------------------
Isabella Rodriguez
~~~ gpt_param ----------------------------------------------------
{'engine': 'text-davinci-002', 'max_tokens': 5, 'temperature': 0.8, 'top_p': 1, 'stream': False, 'frequency_penalty': 0, 'presence_penalty': 0, 'stop': ['\n']}
~~~ prompt_input ----------------------------------------------
[...]
~~~ prompt ----------------------------------------------------
...
~~~ output ----------------------------------------------------
8
=== END ==========================================================
GNS FUNCTION: <generate_first_daily_plan>
TOKEN LIMIT EXCEEDED
=== persona/prompt_template/v2/daily_planning_v6.txt
~~~ persona ---------------------------------------------------
Isabella Rodriguez
~~~ gpt_param ----------------------------------------------------
{'engine': 'text-davinci-003', 'max_tokens': 500, 'temperature': 1, 'top_p': 1, 'stream': False, 'frequency_penalty': 0, 'presence_penalty': 0, 'stop': None}
~~~ prompt_input ----------------------------------------------
[...]
~~~ prompt ----------------------------------------------------
...
~~~ output ----------------------------------------------------
['wake up and complete the morning routine at 8:00 am']
=== END ==========================================================
GNS FUNCTION: <generate_hourly_schedule>
TOKEN LIMIT EXCEEDED
=== persona/prompt_template/v2/generate_hourly_schedule_v2.txt
~~~ persona ---------------------------------------------------
Isabella Rodriguez
~~~ gpt_param ----------------------------------------------------
{'engine': 'text-davinci-003', 'max_tokens': 50, 'temperature': 0.5, 'top_p': 1, 'stream': False, 'frequency_penalty': 0, 'presence_penalty': 0, 'stop': ['\n']}
~~~ prompt_input ----------------------------------------------
[...]
~~~ prompt ----------------------------------------------------
Hourly schedule format:
...
~~~ output ----------------------------------------------------
TOKEN LIMIT EXCEEDED
=== END ==========================================================
"TOKEN LIMIT EXCEEDED" is the default text returned by the exception handler in gpt_structure.py (line 245), so some underlying error may have occurred.
@tuoyuan123 if you're able to create multiple Azure embedding instances, you could use LiteLLM's OpenAI-compatible proxy server to load-balance across them:
Step 1: Put your instances in a config.yaml
model_list:
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8001
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8002
  - model_name: zephyr-beta
    litellm_params:
      model: huggingface/HuggingFaceH4/zephyr-7b-beta
      api_base: http://0.0.0.0:8003
Step 2: Install LiteLLM
$ pip install litellm
Step 3: Start litellm proxy w/ config.yaml
$ litellm --config /path/to/config.yaml
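Once the proxy is running, clients talk to it exactly like the OpenAI API, using the `model_name` from config.yaml; the proxy picks one of the three `api_base` backends per request. Below is a sketch of the request body; the proxy address is an assumption (check the address litellm prints on startup), and the POST itself is left as a comment so the example stays self-contained.

```python
import json

# Assumed proxy address -- litellm prints the actual one on startup.
PROXY_BASE = "http://0.0.0.0:8000"

payload = {
    # The model_name from config.yaml; LiteLLM load-balances this name
    # across the three api_base entries.
    "model": "zephyr-beta",
    "messages": [{"role": "user", "content": "Hello"}],
}
body = json.dumps(payload)

# POST `body` to f"{PROXY_BASE}/chat/completions" with any HTTP client, e.g.:
# requests.post(f"{PROXY_BASE}/chat/completions", data=body,
#               headers={"Content-Type": "application/json"})
```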
I'm having the same issue, all output I get is "TOKEN LIMIT EXCEEDED", anyone found a solution for this?
That's the exact same issue I am having as well.
When running the command 'run 1', too many calls to the OpenAI API resulted in access being denied. The error is as follows:
openai.error.RateLimitError: Rate limit reached for default-text-embedding-ada-002 in organization org-3AW0AYkBXyFwNdJ9WLlqK5Mo on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method.
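The error message says the free tier allows 3 requests per minute, i.e. one request every 20 s. Rather than reacting to the error, a client-side limiter can space requests out so the limit is never hit. This is a sketch; `MinIntervalLimiter` is a hypothetical helper, and the clock/sleep functions are injectable so it can be demonstrated without real waiting.

```python
class MinIntervalLimiter:
    """Spaces calls at least `interval` seconds apart.

    A limit of 3 requests/min maps to interval=20.
    """
    def __init__(self, interval, now, sleep):
        self.interval = interval
        self.now = now      # clock function, e.g. time.monotonic
        self.sleep = sleep  # sleep function, e.g. time.sleep
        self.last = None    # timestamp of the previous call

    def wait(self):
        """Block (via self.sleep) until the next call is allowed."""
        t = self.now()
        if self.last is not None:
            remaining = self.interval - (t - self.last)
            if remaining > 0:
                self.sleep(remaining)
                t += remaining
        self.last = t

# Simulated clock: three back-to-back calls get spread 20 s apart.
clock = {"t": 0.0}
def fake_now():
    return clock["t"]
def fake_sleep(s):
    clock["t"] += s

limiter = MinIntervalLimiter(20, fake_now, fake_sleep)
times = []
for _ in range(3):
    limiter.wait()
    times.append(clock["t"])
```

In real use you would construct it with `time.monotonic` and `time.sleep`, and call `limiter.wait()` immediately before each API request.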