Overview

This PR revises the reward function so that it more faithfully reflects how human task workers perform and evaluate the WebShop task.

Description of Changes

The prior WebShop reward function used exact matching to evaluate whether a purchased product satisfies the goal instruction. Specifically, attributes and options were matched with direct string comparison. This approach consistently over-penalized both human and agent performance.

This PR introduces changes to capture the following two kinds of similarity:

Lexical Similarity: Variations on the same word (e.g. "wash" and "washable")
Synonyms: Different words with a similar meaning (e.g. "blue" and "aquamarine")

To capture lexical similarity, the fuzz library is used, and pairs of values with a match ratio above 85 (on a 0-100 scale) are rewarded.
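
As a rough illustration of the thresholding idea, here is a minimal sketch. It uses Python's standard-library `difflib` as a stand-in so that it runs without extra dependencies; it is not the fuzz call used in this PR, and the scores it produces may differ from the fuzz library's. The function names `match_ratio` and `is_lexical_match` are hypothetical.

```python
from difflib import SequenceMatcher

# Threshold taken from the PR description; scores are on a 0-100 scale.
MATCH_THRESHOLD = 85


def match_ratio(a: str, b: str) -> float:
    """Return a 0-100 similarity score for two strings (case-insensitive).

    Stand-in scorer based on difflib, not the fuzz library itself.
    """
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_lexical_match(goal_value: str, purchased_value: str) -> bool:
    """True when the two values are similar enough to be rewarded."""
    return match_ratio(goal_value, purchased_value) > MATCH_THRESHOLD


if __name__ == "__main__":
    for goal, purchased in [("machine wash", "machine washable"), ("blue", "red")]:
        print(goal, "|", purchased, "|", is_lexical_match(goal, purchased))
```

Whether a given pair clears the threshold depends on the scorer and on how long the shared portion is relative to both strings, so near-threshold cases are worth checking against the actual library.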

To capture synonyms, the PyMultiDictionary library is used.
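
The synonym check could look roughly like the sketch below. To keep it self-contained and runnable offline, the synonym source is passed in as a function; `is_synonym_match`, `toy_lookup`, and the `TOY_SYNONYMS` table are hypothetical names and data made up for illustration. In practice the lookup would wrap PyMultiDictionary, whose exact call is not shown here. Checking both directions is also an assumption, not something this PR states.

```python
from typing import Callable, Iterable


def is_synonym_match(
    goal_value: str,
    purchased_value: str,
    lookup_synonyms: Callable[[str], Iterable[str]],
) -> bool:
    """True when the values are equal or one is listed as a synonym of the other."""
    goal = goal_value.strip().lower()
    purchased = purchased_value.strip().lower()
    if goal == purchased:
        return True
    # Assumption: check both directions, since synonym lists are not always symmetric.
    goal_synonyms = {s.strip().lower() for s in lookup_synonyms(goal)}
    purchased_synonyms = {s.strip().lower() for s in lookup_synonyms(purchased)}
    return purchased in goal_synonyms or goal in purchased_synonyms


# Hypothetical, hand-written synonym table used only for this example.
TOY_SYNONYMS = {"blue": ["aquamarine", "azure", "navy"]}


def toy_lookup(word: str) -> list:
    """Return synonyms from the toy table, or an empty list if the word is unknown."""
    return TOY_SYNONYMS.get(word, [])


if __name__ == "__main__":
    print(is_synonym_match("blue", "aquamarine", toy_lookup))
    print(is_synonym_match("blue", "red", toy_lookup))
```

Passing the lookup in as a parameter keeps the matching logic testable without network access and makes it easy to cache dictionary results.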

princeton-nlp / WebShop

Reward Function #8
