Open Hodge931 opened 4 months ago
That may affect the results. We deliberately designed the order of the examples so that earlier examples won't affect later ones, and running them in parallel breaks that ordering.
This is the script for 4 parallel runs. You can also reset the environment more frequently to avoid interference between examples.
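For anyone who wants to see the idea in code, here is a minimal sketch of one way to split an ordered example list across 4 runs while keeping the original order inside each run. This is an assumption about how such a split could work, not a copy of the script mentioned above; `split_contiguous` and the example IDs are hypothetical names made up for illustration.

```python
from typing import List


def split_contiguous(task_ids: List[int], num_runs: int) -> List[List[int]]:
    """Split an ordered list into num_runs contiguous chunks.

    Each chunk keeps the original relative order, so examples inside a
    chunk still run in the sequence the benchmark authors intended.
    """
    if num_runs <= 0:
        raise ValueError("num_runs must be positive")
    base, extra = divmod(len(task_ids), num_runs)
    chunks: List[List[int]] = []
    start = 0
    for i in range(num_runs):
        # The first `extra` chunks get one additional example.
        size = base + (1 if i < extra else 0)
        chunks.append(task_ids[start:start + size])
        start += size
    return chunks


# Hypothetical example: 10 ordered example IDs split across 4 runs.
chunks = split_contiguous(list(range(10)), 4)
for run_index, chunk in enumerate(chunks):
    print(f"run {run_index}: {chunk}")
```

Keeping the chunks contiguous only preserves order within a run. Runs that share one environment can still interfere with each other, which is why resetting the environment (or giving each run its own instance) still matters.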
Thanks a lot for the reply!
Your kind suggestions are highly appreciated!
Hello! Do you mind elaborating on how later tasks depend on earlier tasks? Is there any way to launch a separate site for each new task that we're evaluating so that we can run multiple agents at the same time? How often should the environment be reset? Thanks for your help :)
Hello, do you have any advice on how to set up multiple Docker containers for the same website? For example, we could set up 10 shopping websites on different ports so that we can evaluate them in parallel. Thank you!
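A rough sketch of what the per-port setup could look like on the evaluation side, assuming each worker gets its own copy of the site on its own port. Everything here is hypothetical: the base port `7770`, the `localhost` host, and the `evaluate_task` stub are placeholders, and the thread does not confirm that separate instances fully remove the interference between examples.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

BASE_PORT = 7770  # hypothetical: first port of the replicated site instances


def instance_url(worker: int, host: str = "localhost", base_port: int = BASE_PORT) -> str:
    """Return the base URL of the site instance assigned to this worker."""
    return f"http://{host}:{base_port + worker}"


def evaluate_task(task_id: int, base_url: str) -> str:
    """Placeholder for the real evaluation call (hypothetical)."""
    return f"task {task_id} on {base_url}"


def run_worker(worker: int, task_ids: List[int]) -> List[str]:
    """Run one worker's tasks sequentially against its own instance."""
    url = instance_url(worker)
    return [evaluate_task(task_id, url) for task_id in task_ids]


def run_parallel(shards: List[List[int]]) -> List[List[str]]:
    """Run one thread per shard; each thread talks to a different port."""
    if not shards:
        return []
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        futures = [pool.submit(run_worker, w, shard) for w, shard in enumerate(shards)]
        return [future.result() for future in futures]


results = run_parallel([[0, 1], [2]])
```

Tasks inside a shard still run one after another, so any ordering assumptions hold within each instance; only the shards themselves run concurrently.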
To speed up the evaluation, I would like to evaluate, say, 64 examples in parallel using multiple threads. Would this affect the correctness of the evaluation? Thanks a lot!