boyugou closed 2 months ago
GPT-3.5 very often fails to follow the required JSON format, leading to repeated errors. I suggest switching to GPT-4o mini, which is cheap and more powerful.
Additionally, setting the global reward to 'no' seems more reasonable, considering the costs and the default implementation in the paper.
This bug affects about 3 tasks during evaluation.
Without this change, self.page always points to the initial tab, so the agent receives the wrong observation and keeps generating the open-new-tab operation. (Cases I remember: 1. one IKEA task; 2. finding the cheapest parking lot near the airport.)
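To illustrate the failure mode, here is a minimal sketch, not the actual code in this PR. FakePage, FakeContext, Env, and sync_active_page are hypothetical names used only for illustration, and the sketch assumes the browser context lists open tabs in the order they were opened, with the newest tab being the one the agent should observe.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser tab."""
    url: str


@dataclass
class FakeContext:
    """Hypothetical stand-in for a browser context holding tabs in open order."""
    pages: list = field(default_factory=list)

    def open_tab(self, url):
        page = FakePage(url)
        self.pages.append(page)
        return page


class Env:
    """Hypothetical environment wrapper that builds observations from self.page."""

    def __init__(self, context):
        self.context = context
        # self.page starts out bound to the initial tab.
        self.page = context.pages[0]

    def sync_active_page(self):
        # Assumption: the most recently opened tab is the active one.
        if self.context.pages:
            self.page = self.context.pages[-1]
        return self.page

    def observe(self):
        # Re-sync before observing so the observation is not taken from a stale tab.
        return self.sync_active_page().url


ctx = FakeContext()
ctx.open_tab("https://example.com/start")
env = Env(ctx)

ctx.open_tab("https://example.com/new-tab")
stale_url = env.page.url      # still the initial tab until a re-sync happens
fresh_url = env.observe()     # re-synced to the newest tab
```

Without the re-sync step, every observation would come from the initial tab, which matches the behavior described above where the agent keeps trying to open a new tab.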