Reward function, data distribution, and the choice oracle

sanxing-chen commented 1 year ago

I have a few questions and hope you can help answer them.

The reward computation uses a query_match feature, which seems to be neither the 5 coarse-grained product categories nor the chained fine-grained categories. Is it something else?
We find that 8248/10587 training instances and 387/500 test instances have options, while only 147/1000 development instances do, which doesn't look i.i.d. Is there a specific reason for this difference?
We're trying to reproduce the "Results with a Choice oracle" experiment. Following the paper, we use the instruction text as the search query to get 50 search results, then try every item and option combination to find the item with the highest reward against the human goal. On the test set we get an 86.8% success rate, far higher than the 52.6% reported. We call the reward function directly, in a way similar to the snippet attached below.

Please let me know if any of these questions are unclear, thanks!

import itertools

# load the product catalogue together with the human-written goals
all_products, product_item_dict, product_prices, _ = load_products(filepath="data/items_shuffle.json", num_products=None, human_goals=True)
...
for j in range(0, 500):
    d = all_data[j]
    item_info = []
    search_result_items = d['all_item_details']  # search results from instructions
    for i in range(len(search_result_items)):
        item_asin = search_result_items[i][0].upper()
        product = product_item_dict[item_asin]
        all_options = list(product['options'].values())
        # generate all possible combinations of options
        all_combinations = list(itertools.product(*all_options))
        if len(all_combinations) > 0:
            maxr = -1
            maxinfo = None
            for c in all_combinations:
                reward, info = get_reward(product, human_goals[j], product_prices[item_asin], dict(enumerate(c)), verbose=True)
                if reward > maxr:
                    maxr = reward
                    maxinfo = info
                    if reward == 1.0:
                        break
            info = maxinfo
            reward = maxr
        else:
            reward, info = get_reward(product, human_goals[j], product_prices[item_asin], [], verbose=True)
        info['r_total'] = reward
        item_info.append(info)
    train_info.append(item_info)

ysymyth commented 1 year ago

Hey Sanxing, thanks for the great questions!

The query match is designed to better determine product type match, because even products within the same fine-grained query category (i.e. products obtained by searching the same query on amazon.com) can have different product types. For example, searching "phone" on amazon.com can return both an iPhone and an iPhone case. We did human checks to make sure the current product type reward function is more faithful.

We split train/dev/test instances by randomly shuffling all products, so they should be i.i.d. I just checked the instructions, and it seems that 402/500 test instructions and 795/1000 dev instructions have underlying options. Let me know if anything else looks wrong.

import json

with open('items_human_ins.json') as f:
    a = json.load(f)
# instruction options of the first instruction for each product
opts = [v[0]['instruction_options'] for v in a.values()]
has_opts = [len(o) > 0 for o in opts]  # whether the instruction has options or not
print(sum(has_opts[:500]), sum(has_opts[500:1500]))  # 402 795

Thank you for raising this point! 52.6% does seem too low a success rate for the choice oracle. I will look into it and correct anything wrong in the paper revision, thanks again!

sanxing-chen commented 1 year ago

Hey @ysymyth, thanks for the information!

The design of the query match makes sense to me. I just wanted to confirm, as I didn't find it in the paper.
I checked again, and my previous count of dev set options was incorrect. The dev set does contain a similar proportion of samples with options to the other splits. (I think your snippet might also be off, since the order of items_human_ins.json is not the same as that of human_goals.json, which is used to determine the splits in train_choice_il.py. It doesn't matter, though.)
Looking forward to the updates on this!
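
To illustrate the ordering point above, here is a minimal sketch of computing the per-split option rate by following an explicit ordering of ids rather than the key order of a separate file. The function name, the ids, and the toy data are all hypothetical; the real layouts of items_human_ins.json and human_goals.json are not assumed here.

```python
def option_rate_by_split(ordered_ids, has_options, split_sizes):
    """Fraction of instances with options in each split.

    ordered_ids: ids in the order that defines the splits (hypothetical).
    has_options: mapping id -> bool (hypothetical).
    split_sizes: list of (split_name, size) pairs, consumed left to right.
    """
    rates = {}
    start = 0
    for name, size in split_sizes:
        ids = ordered_ids[start:start + size]
        # Look each id up by key, so the dict's own ordering is irrelevant.
        rates[name] = sum(has_options[i] for i in ids) / len(ids) if ids else 0.0
        start += size
    return rates


# Toy data: the dict order (a, b, c, d) differs from the split order (a, c, b, d).
has_options = {"a": True, "b": True, "c": False, "d": False}
split_order = ["a", "c", "b", "d"]
splits = [("test", 2), ("dev", 2)]

by_split_order = option_rate_by_split(split_order, has_options, splits)
by_dict_order = option_rate_by_split(list(has_options), has_options, splits)
print(by_split_order, by_dict_order)
```

In this toy case, slicing by dict order makes the two splits look very different, while slicing by the split-defining order gives the same rate for both.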

ysymyth commented 1 year ago

Hi @sanxing-chen, I just checked the choice oracle when searching with the instruction text, and it has a success rate of 0.854, which is close to your 0.868 (some variance could come from the price sampling).

In the paper we may have implemented the wrong algorithm, one that only enumerated choosing every SINGLE option (which led to a 0.636 success rate in my re-implementation) instead of every COMBINATION of options. We will revise the choice oracle part of the paper accordingly, thanks again!
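
The gap between the two enumeration strategies can be illustrated with a toy sketch. The `score` function and the option names below are hypothetical stand-ins, not WebShop's `get_reward`. The point is only that trying one option at a time cannot fully match a goal that needs several options at once.

```python
import itertools


def score(chosen, goal):
    """Hypothetical reward: fraction of goal options matched by the chosen ones."""
    if not goal:
        return 1.0
    return sum(chosen.get(k) == v for k, v in goal.items()) / len(goal)


def best_single(options, goal):
    """Best score when only ONE option value is selected at a time."""
    candidates = [{name: value} for name, values in options.items() for value in values]
    return max((score(c, goal) for c in candidates), default=score({}, goal))


def best_combination(options, goal):
    """Best score over every COMBINATION (one value per option group)."""
    names = list(options)
    combos = itertools.product(*(options[n] for n in names))
    return max((score(dict(zip(names, combo)), goal) for combo in combos),
               default=score({}, goal))


# Toy product with two option groups and a goal that constrains both.
options = {"color": ["red", "blue"], "size": ["s", "m"]}
goal = {"color": "blue", "size": "m"}

single = best_single(options, goal)
combo = best_combination(options, goal)
print(single, combo)
```

Under this toy reward, the single-option search can satisfy at most one of the two goal options, while the combination search can satisfy both.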

I'll close this now, but feel free to reopen it if you have new questions!

princeton-nlp / WebShop, issue #15