openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Suppress 'HTTP/1.1 200 OK' logs from openai library #1468

Closed · JunShern closed this issue 7 months ago

JunShern commented 7 months ago

Since the openai-python library update, eval runs are flooded with excessive "HTTP/1.1 200 OK" logs from the openai library:

junshern@JunSherns-MacBook-Pro ⚒ oaieval gpt-3.5-turbo 2d_movement
[2024-02-15 12:22:08,549] [registry.py:262] Loading registry from /Users/junshern/projects/oss_evals/evals/evals/registry/evals
[2024-02-15 12:22:08,898] [registry.py:262] Loading registry from /Users/junshern/.evals/evals
[2024-02-15 12:22:08,900] [oaieval.py:211] Run started: 240215042208OCODJ2NY
[2024-02-15 12:22:08,949] [data.py:94] Fetching /Users/junshern/projects/oss_evals/evals/evals/registry/data/2d_movement/samples.jsonl
[2024-02-15 12:22:08,949] [eval.py:36] Evaluating 100 samples
[2024-02-15 12:22:08,955] [eval.py:144] Running in threaded mode with 10 threads!
  0%|                                                                                                                                                                                                                                                 | 0/100 [00:00<?, ?it/s][2024-02-15 12:22:10,338] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
  1%|██▎                                                                                                                                                                                                                                      | 1/100 [00:01<02:17,  1.39s/it][2024-02-15 12:22:10,355] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:10,384] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:10,392] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:10,393] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:10,395] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:10,400] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:10,400] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:10,401] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:10,432] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:10,890] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
 11%|█████████████████████████▌                                                                                                                                                                                                              | 11/100 [00:01<00:12,  7.05it/s][2024-02-15 12:22:10,907] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:11,319] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
 13%|██████████████████████████████▏                                                                                                                                                                                                         | 13/100 [00:02<00:13,  6.36it/s][2024-02-15 12:22:11,421] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
 14%|████████████████████████████████▍                                                                                                                                                                                                       | 14/100 [00:02<00:12,  6.65it/s][2024-02-15 12:22:11,463] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:11,504] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:11,524] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:11,542] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
 18%|█████████████████████████████████████████▊                                                                                                                                                                                              | 18/100 [00:02<00:08, 10.17it/s][2024-02-15 12:22:11,564] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:11,564] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:11,565] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:11,570] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
[2024-02-15 12:22:11,829] [_client.py:1027] HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
...

After the change:

junshern@JunSherns-MacBook-Pro ⚒ oaieval gpt-3.5-turbo 2d_movement
[2024-02-15 12:22:20,408] [registry.py:262] Loading registry from /Users/junshern/projects/oss_evals/evals/evals/registry/evals
[2024-02-15 12:22:20,762] [registry.py:262] Loading registry from /Users/junshern/.evals/evals
[2024-02-15 12:22:20,763] [oaieval.py:211] Run started: 240215042220QS3AJAVA
[2024-02-15 12:22:20,812] [data.py:94] Fetching /Users/junshern/projects/oss_evals/evals/evals/registry/data/2d_movement/samples.jsonl
[2024-02-15 12:22:20,812] [eval.py:36] Evaluating 100 samples
[2024-02-15 12:22:20,819] [eval.py:144] Running in threaded mode with 10 threads!
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:08<00:00, 11.96it/s]
[2024-02-15 12:22:29,217] [record.py:371] Final report: {'accuracy': 0.09, 'boostrap_std': 0.029618636025313522}. Logged to /tmp/evallogs/240215042220QS3AJAVA_gpt-3.5-turbo_2d_movement.jsonl
[2024-02-15 12:22:29,217] [oaieval.py:228] Final report:
[2024-02-15 12:22:29,217] [oaieval.py:230] accuracy: 0.09
[2024-02-15 12:22:29,217] [oaieval.py:230] boostrap_std: 0.029618636025313522
[2024-02-15 12:22:29,233] [record.py:360] Logged 200 rows of events to /tmp/evallogs/240215042220QS3AJAVA_gpt-3.5-turbo_2d_movement.jsonl: insert_time=15.670ms
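
For reference, a minimal sketch of one way to get the quieter output above. This assumes the "HTTP Request: ... 200 OK" lines come from the httpx logger that openai-python uses under the hood; it is not necessarily the exact change that closed this issue:

```python
import logging

# Raise the threshold of the loggers that emit the "HTTP Request: ... 200 OK"
# lines so that only warnings and errors get through. (Assumption: the lines
# come from httpx, the HTTP client library used by openai-python.)
logging.getLogger("httpx").setLevel(logging.WARNING)
logging.getLogger("openai").setLevel(logging.WARNING)
```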
johny-b commented 7 months ago

Just FYI, the value is set to INFO here: https://github.com/openai/openai-python/blob/7f9e85017a0959e3ba07834880d92c748f8f67ab/src/openai/_utils/_logs.py#L25

I think it would be nice to have some more granularity there.
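
For illustration only, "more granularity" could mean letting callers pick the level themselves, e.g. via an environment variable. The variable name below is hypothetical and not part of openai-python or evals:

```python
import logging
import os

# Hypothetical sketch: map an env var to a log level, defaulting to WARNING so
# successful HTTP requests stay quiet unless the user opts in to more detail.
_LEVELS = {"debug": logging.DEBUG, "info": logging.INFO, "warning": logging.WARNING}
_level = _LEVELS.get(os.environ.get("EVALS_HTTP_LOG", "warning").lower(), logging.WARNING)
logging.getLogger("openai").setLevel(_level)
logging.getLogger("httpx").setLevel(_level)
```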