Closed · ProKil closed this 1 month ago
Reward prompts are used to debug the process in sync mode (i.e., when batch size = 1), so maybe we can drop the reward prompts in async mode (batch size >= 2).
Okay, I now found that this bug could affect more of the functions currently supported in Sotopia.
Basically, any function that uses
```python
async def aevaluate_one_episode(
    episode: EpisodeLog,
    model: str = "gpt-4",
    tag: str | None = None,
    push_to_db: bool = False,
) -> None:
```
could be affected. We need to find a way to extract reward prompts safely, because this is crucial for building robust evaluators. This could be relevant to #164 as well.
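One possible direction for extracting prompts safely, sketched below under the assumption that the evaluator's call signature can be changed: return each reward prompt alongside the score instead of stashing it on the shared instance, so concurrent calls never overwrite each other. The names `SafeEvaluator` and `evaluate_batch` are hypothetical, not Sotopia's actual API.

```python
import asyncio

# Hypothetical sketch: the evaluator returns (score, prompt) per call
# instead of storing the prompt on the instance, so parallel calls
# (as in env.astep with batch size >= 2) each keep their own prompt.
class SafeEvaluator:
    async def __acall__(self, episode_id: int) -> tuple[float, str]:
        prompt = f"reward prompt for episode {episode_id}"
        await asyncio.sleep(0)  # stands in for the async LLM call
        return float(episode_id), prompt  # prompt travels with the result

async def evaluate_batch(n: int) -> list[tuple[float, str]]:
    evaluator = SafeEvaluator()
    # All n prompts survive, even under concurrent evaluation.
    return await asyncio.gather(*(evaluator.__acall__(i) for i in range(n)))
```

This keeps the prompt purely local to each call, which would also make re-evaluation (the case noted below) safe.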
@bugsz @ruiyiw tagging you here as well: let us know whether #sotopia-better-eval is affected by this?
Upon checking, I can confirm it is not affected as long as the original Sotopia episodes are not re-evaluated. This only affects episodes that need re-evaluation.
Description of the bug
This line stores the reward prompt from the instance member `evaluator.prompt`, which is updated in each `__acall__`. This is a dangerous operation, since the prompt is lost after several parallel calls to `env.astep`: https://github.com/sotopia-lab/sotopia/blame/c4fdb166bab6f20ee541c48dd614981d38303b19/sotopia/envs/parallel.py#L564
Steps To Reproduce
In the current codebase shared in #7, you can find that in 66% of the reward prompts, neither of the character names is mentioned.
@sharonwx54 contributed this script to reproduce:
Additional Information
We can either