hkust-nlp / AgentBoard

An Analytical Evaluation Board of Multi-turn LLM Agents

WebShop also seems to have some bugs #9

Closed Fu-Dayuan closed 4 months ago

Fu-Dayuan commented 4 months ago

{ "id": 3, "task_name": "webshop", "goal": "i need gluten free vegetarian smoked peppered bacon - 4 ounce (pack of 2), and price lower than 60.00 dollars", "difficulty": "easy", "is_done": false, "progress_rate": 0.9166666666666666, "grounding_acc": 1.0, "score_change_record": [ [ 1, 0.3333333333333333 ], [ 2, 0.6666666666666666 ], [ 4, 0.9166666666666666 ] ], "trajectory": { "Interaction Turn 0": { "Goal": "i need gluten free vegetarian smoked peppered bacon - 4 ounce (pack of 2), and price lower than 60.00 dollars", "Observation": "WEB PAGE: {Let us begin shopping \n[Search]}", "Action": "reset[]", "Progress Rate": 0.0 }, "Interaction Turn 1": { "Action": "search[gluten free vegetarian smoked peppered bacon 4 ounce pack of 2 price < 60.00]", "Observation": "WEB PAGE: {\n[Back to Search] \nPage 1 (Total results: 15) \n[Next >] \n[B06Y96MXJV] \nSmoked Bacon Sea Salt 3-Pack: Smoked Bacon Chipotle, Smoked Bacon and Onion, Smoked Peppered Bacon - All-Natural Sea Salts Slowly Smoked Over Alderwood - No Gluten, No MSG, Non-GMO (12 total oz.) \n$29.99 \n[B06Y96N1KG] \nSpicy Hot Pepper Sea Salt 3-Pack: Ghost Pepper, Jalapeno, Habanero - All Natural, Delicious Blends of Peppers & Sea Salt - Gluten-Free, Kosher, No MSG, Non-GMO (12 total oz.) 
\n$29.99 \n[B07GJTKYJQ] \nLouisville Vegan Jerky - 5 Flavor Variety Pack, Non-GMO Soy Protein, Gluten-Free (Black Pepper, Buffalo Dill, Pepperoni, Maple Bacon, and Carolina BBQ, 3 Ounces) \n$42.99 }", "Progress Rate": 0.3333333333333333 }, "Interaction Turn 2": { "Action": "click[B06Y96MXJV]", "Observation": "WEB PAGE: {\n[Back to Search] \n[< Prev] \nflavor [cocktail salts][dessert toppers][exotic][finishing salts][foodie gift][french blends][french classics][fruits & flowers][garlic lover's][gourmet grillmaster][hawaiian][herb garden][i love you & food][italian classics][italian flavors][keto smoked][mediterranean diet][natural salts][olive oil dippers][pacific cooking][paleo][pegan smoked][pizza salts][smoked #1][smoked bacon][smoked classics][smoked grillmaster][spicy chili peppers][steak rubs][summer fruits][taco seasonings][the pepper][truffle][tuxedo][vegan smoked][vegetarian smoked]\nsize [4 ounce (pack of 3)][3 piece assortment][4 ounce (pack of 2)][4 ounce (pack of 6)][4 ounce (3 count)][4 ounce (6 count)]\nSmoked Bacon Sea Salt 3-Pack: Smoked Bacon Chipotle, Smoked Bacon and Onion, Smoked Peppered Bacon - All-Natural Sea Salts Slowly Smoked Over Alderwood - No Gluten, No MSG, Non-GMO (12 total oz.) \nPrice: $29.99 \nRating: N.A. \n[Description] \n[Features] \n[Reviews] \n[Attributes] \n[Buy Now] }", "Progress Rate": 0.6666666666666666 }, "Interaction Turn 3": { "Action": "click[4 ounce (pack of 2)]", "Observation": "WEB PAGE: {You have clicked 4 ounce (pack of 2).}", "Progress Rate": 0.6666666666666666 }, "Interaction Turn 4": { "Action": "click[Buy Now]", "Observation": "WEB PAGE: {Result: [False]}", "Progress Rate": 0.9166666666666666 } } }

{ "id": 3, "task_name": "webshop", "goal": "i need gluten free vegetarian smoked peppered bacon - 4 ounce (pack of 2), and price lower than 60.00 dollars", "difficulty": "easy", "is_done": false, "progress_rate": 0.75, "grounding_acc": 1.0, "score_change_record": [ [ 1, 0.3333333333333333 ], [ 2, 0.6666666666666666 ], [ 4, 0.75 ] ], "trajectory": { "Interaction Turn 0": { "Goal": "i need gluten free vegetarian smoked peppered bacon - 4 ounce (pack of 2), and price lower than 60.00 dollars", "Observation": "WEB PAGE: {Let us begin shopping \n[Search]}", "Action": "reset[]", "Progress Rate": 0.0 }, "Interaction Turn 1": { "Action": "search[gluten free vegetarian smoked peppered bacon 4 ounce pack of 2 price < 60.00]", "Observation": "WEB PAGE: {\n[Back to Search] \nPage 1 (Total results: 15) \n[Next >] \n[B06Y96MXJV] \nSmoked Bacon Sea Salt 3-Pack: Smoked Bacon Chipotle, Smoked Bacon and Onion, Smoked Peppered Bacon - All-Natural Sea Salts Slowly Smoked Over Alderwood - No Gluten, No MSG, Non-GMO (12 total oz.) \n$29.99 \n[B06Y96N1KG] \nSpicy Hot Pepper Sea Salt 3-Pack: Ghost Pepper, Jalapeno, Habanero - All Natural, Delicious Blends of Peppers & Sea Salt - Gluten-Free, Kosher, No MSG, Non-GMO (12 total oz.) 
\n$29.99 \n[B07GJTKYJQ] \nLouisville Vegan Jerky - 5 Flavor Variety Pack, Non-GMO Soy Protein, Gluten-Free (Black Pepper, Buffalo Dill, Pepperoni, Maple Bacon, and Carolina BBQ, 3 Ounces) \n$42.99 }", "Progress Rate": 0.3333333333333333 }, "Interaction Turn 2": { "Action": "click[B06Y96MXJV]", "Observation": "WEB PAGE: {\n[Back to Search] \n[< Prev] \nflavor [cocktail salts][dessert toppers][exotic][finishing salts][foodie gift][french blends][french classics][fruits & flowers][garlic lover's][gourmet grillmaster][hawaiian][herb garden][i love you & food][italian classics][italian flavors][keto smoked][mediterranean diet][natural salts][olive oil dippers][pacific cooking][paleo][pegan smoked][pizza salts][smoked #1][smoked bacon][smoked classics][smoked grillmaster][spicy chili peppers][steak rubs][summer fruits][taco seasonings][the pepper][truffle][tuxedo][vegan smoked][vegetarian smoked]\nsize [4 ounce (pack of 3)][3 piece assortment][4 ounce (pack of 2)][4 ounce (pack of 6)][4 ounce (3 count)][4 ounce (6 count)]\nSmoked Bacon Sea Salt 3-Pack: Smoked Bacon Chipotle, Smoked Bacon and Onion, Smoked Peppered Bacon - All-Natural Sea Salts Slowly Smoked Over Alderwood - No Gluten, No MSG, Non-GMO (12 total oz.) \nPrice: $29.99 \nRating: N.A. \n[Description] \n[Features] \n[Reviews] \n[Attributes] \n[Buy Now] }", "Progress Rate": 0.6666666666666666 }, "Interaction Turn 3": { "Action": "click[4 ounce (pack of 2)]", "Observation": "WEB PAGE: {You have clicked 4 ounce (pack of 2).}", "Progress Rate": 0.6666666666666666 }, "Interaction Turn 4": { "Action": "click[Buy Now]", "Observation": "WEB PAGE: {Result: [False]}", "Progress Rate": 0.75 } } }

I sampled twice at the same time. The two trajectories are identical, yet the results differ?
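To pin down where the two logs disagree, a small script along these lines can compare the per-turn `Progress Rate` values. This is only a sketch: `first_divergence` is a hypothetical helper, not part of AgentBoard, and the two dictionaries are trimmed copies of the logs above that keep only the last two turns.

```python
def first_divergence(run_a, run_b):
    """Return (turn, rate_a, rate_b) for the first shared turn whose
    "Progress Rate" differs between the two runs, or None if none differ."""
    traj_a, traj_b = run_a["trajectory"], run_b["trajectory"]
    for turn, step_a in traj_a.items():
        step_b = traj_b.get(turn)
        if step_b is None:
            continue  # turn only present in one run; skip it
        if step_a["Progress Rate"] != step_b["Progress Rate"]:
            return turn, step_a["Progress Rate"], step_b["Progress Rate"]
    return None


# Trimmed copies of the two logs (hypothetical variable names).
run_1 = {"trajectory": {
    "Interaction Turn 3": {"Action": "click[4 ounce (pack of 2)]",
                           "Progress Rate": 0.6666666666666666},
    "Interaction Turn 4": {"Action": "click[Buy Now]",
                           "Progress Rate": 0.9166666666666666},
}}
run_2 = {"trajectory": {
    "Interaction Turn 3": {"Action": "click[4 ounce (pack of 2)]",
                           "Progress Rate": 0.6666666666666666},
    "Interaction Turn 4": {"Action": "click[Buy Now]",
                           "Progress Rate": 0.75},
}}

print(first_divergence(run_1, run_2))
# → ('Interaction Turn 4', 0.9166666666666666, 0.75)
```

In both logs the actions are the same at every turn, and the only difference is the progress rate recorded after `click[Buy Now]`.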

Fu-Dayuan commented 4 months ago

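It seems that if several programs run WebShop at the same time with only one search engine attached (and I don't think you can attach more than one anyway...), any progress rate that is not 1 keeps getting lower and lower.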

zzh068 commented 4 months ago

Thank you for your interest in AgentBoard. By default, WebShop launches on a single accessible port and does not support running multiple models at the same time, because concurrent runs interfere with one another when reading the web page. If you want to run multiple models at the same time, you need to change to different port numbers. The current port number is 3000, and the code can be found at: https://github.com/hkust-nlp/AgentBoard/blob/cc433e8392a7c94e1de9406dd0c64c39d2e24440/eval_configs/main_results_all_tasks.yaml#L247
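Before starting a second run, it may help to check whether port 3000 is already taken. The following is a minimal sketch using only the Python standard library. `port_in_use` and `next_free_port` are hypothetical helpers, not part of AgentBoard or WebShop, and the sketch assumes the WebShop server listens on localhost.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def next_free_port(start=3000, attempts=50, host="127.0.0.1"):
    """Return the first port >= start with no listener, or None if all are taken."""
    for port in range(start, start + attempts):
        if not port_in_use(port, host):
            return port
    return None


if __name__ == "__main__":
    # 3000 is the default WebShop port mentioned above.
    print("port 3000 in use:", port_in_use(3000))
    print("candidate port for another run:", next_free_port(3000))
```

The chosen port still has to be written into the evaluation config by hand. This check only reports whether another process is already listening there.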