sotopia-lab / sotopia

Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)
https://docs.sotopia.world
MIT License
127 stars, 16 forks

Sotopia Benchmark CLI API #69

Closed: XuhuiZhou closed this pull request 2 weeks ago

XuhuiZhou commented 1 month ago

📑 Description

This pull request adds a new API that benchmarks a language model using the default LLMAgent class. Here is the API we want to achieve:

sotopia_benchmark \
--model <model_name> \
--partner-model <partner_model_name> \
--evaluator-model <evaluator_model_name> \
--task <agent_env_combo_id>

When this CLI command is called, the Sotopia benchmark evaluates the given model on the given task by simulating its interaction with another LLMAgent driven by the partner model, and the evaluator model scores the resulting episode.
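A minimal sketch of how the four proposed flags could be parsed, using the standard library's argparse rather than whatever framework cli.py actually uses. The function name parse_benchmark_args is hypothetical. All four flags are marked required only because the proposal lists them; later in this thread the CLI is run with just --model, so the real command presumably has defaults that this sketch does not try to guess.

```python
import argparse
from typing import List, Optional


def parse_benchmark_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the flags proposed for sotopia_benchmark (hypothetical sketch, not the merged code)."""
    parser = argparse.ArgumentParser(prog="sotopia_benchmark")
    # The model under evaluation.
    parser.add_argument("--model", required=True, help="Model being benchmarked.")
    # argparse stores --partner-model as args.partner_model.
    parser.add_argument("--partner-model", required=True, help="Model driving the partner LLMAgent.")
    # Model that scores the simulated episode.
    parser.add_argument("--evaluator-model", required=True, help="Model used as the evaluator.")
    # A single agent_env_combo ID to simulate.
    parser.add_argument("--task", required=True, help="agent_env_combo ID.")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_benchmark_args()
    print(args.model, args.partner_model, args.evaluator_model, args.task)
```

Both "--model=gpt-4o" and "--model gpt-4o" are accepted by argparse, matching the invocations used later in the thread.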

We will also include a bash script that loads all agent_env_combos from a given subset.
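The planned batch script is bash, and how it loads a subset is not described here. As a hedged illustration in Python, the hypothetical helper below only builds one sotopia_benchmark invocation per agent_env_combo ID, assuming the IDs have already been loaded; it does not execute anything or read from Sotopia's storage.

```python
import shlex
from typing import Iterable, List


def build_benchmark_commands(
    combo_ids: Iterable[str],
    model: str,
    partner_model: str,
    evaluator_model: str,
) -> List[str]:
    """Return one shell-quoted sotopia_benchmark command per agent_env_combo ID (hypothetical helper)."""
    commands = []
    for combo_id in combo_ids:
        argv = [
            "sotopia_benchmark",
            "--model", model,
            "--partner-model", partner_model,
            "--evaluator-model", evaluator_model,
            "--task", combo_id,
        ]
        # shlex.join quotes any argument containing shell-special characters.
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Placeholder IDs; a real script would load them from the chosen subset.
    for cmd in build_benchmark_commands(["combo-1", "combo-2"], "gpt-4o", "gpt-4o", "gpt-4"):
        print(cmd)
```

Generating the command strings separately from running them makes it easy to print, log, or hand them to a job scheduler.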

✅ Checks

ℹ Additional Information

XuhuiZhou commented 1 month ago

@ProKil Running

python sotopia/benchmark/cli.py --model=gpt-4o

yields this error:

RuntimeError: Type not yet supported: typing.Literal['togethercomputer/llama-2-7b-chat', 'togethercomputer/llama-2-70b-chat', 
'togethercomputer/mpt-30b-chat', 'gpt-3.5-turbo', 'gpt-3.5-turbo-finetuned', 'gpt-3.5-turbo-ft-MF', 'text-davinci-003', 'gpt-4', 
'gpt-4-turbo', 'human', 'redis', 'mistralai/Mixtral-8x7B-Instruct-v0.1', 'together_ai/togethercomputer/llama-2-7b-chat', 
'together_ai/togethercomputer/falcon-7b-instruct', 'meta-llama/Llama-3-8b-chat-hf', 'meta-llama/Llama-3-70b-chat-hf', 
'groq/llama3-70b-8192']

Can you check?
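This message is most likely raised by typer, which (at least in the installed version) cannot map a typing.Literal annotation to a CLI parameter type. One workaround, assuming the model option was annotated with a Literal alias, is to declare the option as a plain str and validate it against the Literal's values with typing.get_args. The alias ModelName and the function validate_model_name below are hypothetical and list only a few of the names printed in the error.

```python
from typing import Literal, get_args

# Hypothetical alias mirroring a few of the model names in the error message.
ModelName = Literal["gpt-3.5-turbo", "gpt-4", "gpt-4-turbo"]


def validate_model_name(name: str) -> str:
    """Accept a plain str from the CLI and check it against the Literal's allowed values."""
    allowed = get_args(ModelName)  # tuple of the Literal's string values
    if name not in allowed:
        raise ValueError(f"Unsupported model {name!r}; expected one of {allowed}")
    return name
```

Note that gpt-4o does not appear in the Literal printed above, so a strict check like this would reject --model=gpt-4o unless the list is extended.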

XuhuiZhou commented 1 month ago

Already fixed, @ProKil!

However, there is still an error. Running

python sotopia/benchmark/cli.py --model=gpt-4o --batch-size=1

yields:

  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1261, in aclose
    self._transport.close()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/selector_events.py", line 839, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 761, in call_soon
    self._check_closed()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 519, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
Running all envs in batch:  12%|███████▊                                                         | 24/200 [14:14<1:44:28, 35.62s/it]
ProKil commented 1 month ago

What is the full traceback?

XuhuiZhou commented 1 month ago

@ProKil

ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-5021' coro=<AsyncClient.aclose() done, defined at /Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py:2011> exception=RuntimeError('Event loop is closed')>
Traceback (most recent call last):
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py", line 2018, in aclose
    await self._transport.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_transports/default.py", line 385, in aclose
    await self._pool.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 313, in aclose
    await self._close_connections(closing_connections)
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 305, in _close_connections
    await connection.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection.py", line 171, in aclose
    await self._connection.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/http11.py", line 265, in aclose
    await self._network_stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 55, in aclose
    await self._stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/streams/tls.py", line 193, in aclose
    await self.transport_stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1261, in aclose
    self._transport.close()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/selector_events.py", line 839, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 761, in call_soon
    self._check_closed()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 519, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-5022' coro=<AsyncClient.aclose() done, defined at /Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py:2011> exception=RuntimeError('Event loop is closed')>
Traceback (most recent call last):
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py", line 2018, in aclose
    await self._transport.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_transports/default.py", line 385, in aclose
    await self._pool.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 313, in aclose
    await self._close_connections(closing_connections)
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 305, in _close_connections
    await connection.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection.py", line 171, in aclose
    await self._connection.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/http11.py", line 265, in aclose
    await self._network_stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 55, in aclose
    await self._stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/streams/tls.py", line 193, in aclose
    await self.transport_stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1261, in aclose
    self._transport.close()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/selector_events.py", line 839, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 761, in call_soon
    self._check_closed()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 519, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-5023' coro=<AsyncClient.aclose() done, defined at /Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py:2011> exception=RuntimeError('Event loop is closed')>
Traceback (most recent call last):
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_client.py", line 2018, in aclose
    await self._transport.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpx/_transports/default.py", line 385, in aclose
    await self._pool.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 313, in aclose
    await self._close_connections(closing_connections)
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 305, in _close_connections
    await connection.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/connection.py", line 171, in aclose
    await self._connection.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_async/http11.py", line 265, in aclose
    await self._network_stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/httpcore/_backends/anyio.py", line 55, in aclose
    await self._stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/streams/tls.py", line 193, in aclose
    await self.transport_stream.aclose()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 1261, in aclose
    self._transport.close()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/selector_events.py", line 839, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 761, in call_soon
    self._check_closed()
  File "/Users/xuhuizhou/miniconda3/envs/sotopia/lib/python3.11/asyncio/base_events.py", line 519, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
Running all envs in batch:  12%|███████▊ | 24/200 [14:14<1:44:28, 35.62s/it]

Aborted.
Running one batch:   0%| | 0/1 [00:21<?, ?it/s]
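The traceback shows httpx's AsyncClient.aclose() running after its event loop has already been closed: the transport's close() calls loop.call_soon(), which fails in _check_closed(). One common way this happens, which has not been confirmed in the sotopia code, is calling asyncio.run() once per batch while an async HTTP client created under an earlier loop is still alive. asyncio.run() closes its loop on exit, so any later cleanup that schedules a callback on that loop raises this RuntimeError. The sketch below reproduces that mechanism with a hypothetical FakeAsyncClient (not an httpx class) and contrasts it with running all batches inside a single loop.

```python
import asyncio


class FakeAsyncClient:
    """Hypothetical stand-in for an async HTTP client bound to the loop it was created on."""

    def __init__(self) -> None:
        self.loop = asyncio.get_running_loop()

    def close_later(self) -> None:
        # Like the selector transport in the traceback, closing schedules a callback on the original loop.
        self.loop.call_soon(lambda: None)


async def make_client() -> FakeAsyncClient:
    return FakeAsyncClient()


def per_batch_loops() -> str:
    """Anti-pattern: the client outlives the loop that asyncio.run() created and then closed."""
    client = asyncio.run(make_client())
    try:
        client.close_later()
    except RuntimeError as exc:
        return str(exc)
    return "no error"


async def run_all_batches(n_batches: int) -> int:
    """Alternative: one event loop for every batch; the client is closed while that loop still runs."""
    client = FakeAsyncClient()
    done = 0
    for _ in range(n_batches):
        await asyncio.sleep(0)  # stand-in for running one batch of episodes
        done += 1
    client.close_later()  # the loop is still open here, so scheduling succeeds
    return done


if __name__ == "__main__":
    print(per_batch_loops())
    print(asyncio.run(run_all_batches(3)))
```

With real httpx clients, the equivalent is to create and close them (for example with "async with httpx.AsyncClient() as client:") inside the same asyncio.run() call that uses them.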

ProKil commented 1 month ago

@XuhuiZhou Please report whether the issue you mentioned above still happens with the latest change, and share your logs if it does. If you can confirm that it no longer happens, we can proceed to merge this PR.

ProKil commented 1 month ago

@XuhuiZhou Let me know if the issues above are still happening. Once these two requested changes are done, we are good to merge and release v0.1.

ProKil commented 2 weeks ago

Please also run git merge main to resolve the conflicts.

XuhuiZhou commented 2 weeks ago

Let's figure everything out first before I merge main; otherwise I just keep doing the same repetitive work again and again.

XuhuiZhou commented 2 weeks ago

@ProKil Can you try to benchmark a model first?

XuhuiZhou commented 2 weeks ago

@ProKil Not sure why this happens tho

ProKil commented 2 weeks ago

mypy --install-types .