princeton-nlp / WebShop

[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
https://webshop-pnlp.github.io
MIT License

Reward function, data distribution, and the choice oracle #15

Closed sanxing-chen closed 1 year ago

sanxing-chen commented 1 year ago

I have some questions that I hope you can help me answer.

Please let me know if any of these questions are unclear, thanks!

import itertools

all_products, product_item_dict, product_prices, _ = load_products(filepath="data/items_shuffle.json", num_products=None, human_goals=True)
...
for j in range(0, 500):
    d = all_data[j]
    item_info = []
    search_result_items = d['all_item_details']  # search results from instructions
    for i in range(len(search_result_items)):
        item_asin = search_result_items[i][0].upper()
        product = product_item_dict[item_asin]
        all_options = list(product['options'].values())
        # generate all possible combinations of options
        all_combinations = list(itertools.product(*all_options))
        if len(all_combinations) > 0:
            maxr = -1
            maxinfo = None
            for c in all_combinations:
                reward, info = get_reward(product, human_goals[j], product_prices[item_asin], dict(enumerate(c)), verbose=True)
                if reward > maxr:
                    maxr = reward
                    maxinfo = info
                    if reward == 1.0:
                        break
            info = maxinfo
            reward = maxr
        else:
            reward, info = get_reward(product, human_goals[j], product_prices[item_asin], [], verbose=True)
        info['r_total'] = reward
        item_info.append(info)
    train_info.append(item_info)
ysymyth commented 1 year ago

Hey Sanxing, thanks for the great questions!

  1. The query match is designed to better determine whether the product type matches. Even products within the same fine-grained query category (i.e. products obtained by searching the same query on amazon.com) can have different product types, e.g. when you search "phone" on amazon.com you might get both iPhones and iPhone cases. We did human checks to make sure the current product type reward function is more faithful.
  2. We split train/dev/test instances by randomly shuffling all products, so they should be i.i.d. I just checked the instructions, and it seems that 402/500 test instructions and 795/1000 dev instructions have underlying options. Let me know if there is anything else wrong.
    import json
    with open('items_human_ins.json') as f:
        a = json.load(f)
    # take the instruction options of the first entry under each key
    opts = [v[0]['instruction_options'] for v in a.values()]
    has_opts = [len(o) > 0 for o in opts]  # whether the instruction has any options
    print(sum(has_opts[:500]), sum(has_opts[500:1500]))  # 402 795
  3. Thank you for raising this point! It does seem that 52.6% is too low a success rate for the choice oracle. I will look into it and correct anything wrong in the paper revision, thanks again!
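
For readers following point 2, a shuffle-then-slice split can be sketched roughly as below. This is only an illustration, not the code used in the repository: the function name `split_instances` and the seed are hypothetical, and the sizes (first 500 for test, next 1000 for dev) simply mirror the slices in the snippet above.

```python
import random


def split_instances(instances, n_test=500, n_dev=1000, seed=0):
    """Shuffle a copy of the instances and slice it into train/dev/test.

    Hypothetical helper: the seed and the slice order are assumptions.
    """
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test


# Toy stand-in for the real instruction list.
train, dev, test = split_instances(range(2000))
print(len(train), len(dev), len(test))
```

Because every instance is assigned to a split only by its position after a uniform shuffle, the three splits should have similar proportions of instructions with options, up to sampling noise.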
sanxing-chen commented 1 year ago

Hey @ysymyth, thanks for your information!

  1. The design of the query match makes sense to me. I just wanted to confirm, as I didn't find it in the paper.
  2. I checked again, and my previous calculation of the dev set options was somehow incorrect. The dev set does contain a similar proportion of samples with options compared to the other splits. (I think your snippet might also be wrong, since the order of items_human_ins.json is not the same as that of human_goals.json, which is used to determine the splits in train_choice_il.py. But it doesn't matter.)
  3. Look forward to the updates on this!
ysymyth commented 1 year ago

Hi @sanxing-chen, I just checked the choice oracle when searching with the instruction text, and it has a success rate of 0.854, which is close to your 0.868 (there could be some variance coming from the price sampling).

In the paper we may have implemented the wrong algorithm, which only enumerated choosing every SINGLE option (this led to a 0.636 success rate in my re-implementation), instead of every COMBINATION of options. We will revise the choice oracle part of the paper accordingly, thanks again!
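
To make the difference between the two enumeration strategies concrete, here is a minimal sketch on a toy product. The option names, values, and the `matches_goal` helper are hypothetical and stand in for WebShop's actual reward function, which is not reproduced here.

```python
import itertools

# Hypothetical product with two option types (not taken from the WebShop data).
options = {
    "color": ["red", "blue"],
    "size": ["small", "large"],
}
# Hypothetical goal that requires one value from each option type.
goal_options = {"blue", "large"}


def matches_goal(chosen):
    """Toy success check: every goal option must be among the chosen values."""
    return goal_options.issubset(chosen)


# Strategy 1: try each option value on its own.
single_choices = [(v,) for values in options.values() for v in values]
single_success = any(matches_goal(c) for c in single_choices)

# Strategy 2: try every combination of one value per option type.
combo_choices = list(itertools.product(*options.values()))
combo_success = any(matches_goal(c) for c in combo_choices)

print(len(single_choices), single_success)
print(len(combo_choices), combo_success)
```

In this toy case no single option can satisfy a goal that names two option types, while the combination `("blue", "large")` does, which is consistent with the single-option oracle reporting a lower success rate.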

I'll close this now, and feel free to reopen it if you have new questions!