Overview
This PR improves the WebShop reward function so that it more faithfully reflects how human task workers would perform and evaluate the task.
Description of Changes
The prior WebShop reward function used exact matching to evaluate whether a purchased product satisfies what the goal instruction asks for. Specifically, attributes and options were matched with direct string comparison. This approach consistently over-penalized both human and agent performance.
This PR introduces changes to capture the following two kinds of similarity:
Lexical Similarity: Variations on the same word (e.g. "wash" and "washable")
Synonyms: Different words with the same meaning (e.g. "blue" and "aquamarine")
To capture lexical similarity, the fuzz library is used, and pairs of values with a match ratio above 85 (on a 0 to 100 scale) are rewarded.
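As a rough illustration of the thresholding described above, here is a minimal sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for the fuzz scorer, since the exact scorer the PR calls is not stated here; scores from the real library may differ. The names `match_ratio`, `is_lexical_match`, and `MATCH_THRESHOLD` are hypothetical.

```python
from difflib import SequenceMatcher

# Threshold taken from this PR's description; everything else is illustrative.
MATCH_THRESHOLD = 85


def match_ratio(a: str, b: str) -> int:
    """Return a 0-100 similarity score (stdlib stand-in for a fuzz-style ratio)."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def is_lexical_match(goal_value: str, product_value: str) -> bool:
    """Reward a pair of values only when the score is strictly above the threshold."""
    return match_ratio(goal_value, product_value) > MATCH_THRESHOLD


if __name__ == "__main__":
    for pair in [("machine washable", "machine-washable"), ("blue", "aquamarine")]:
        print(pair, match_ratio(*pair), is_lexical_match(*pair))
```

Near-identical spellings clear the threshold under this stand-in, while unrelated strings such as "blue" and "aquamarine" do not, which is why synonyms need a separate lookup.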
To capture synonyms, the PyMultiDictionary library (link) is used.
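Putting the two checks together, the relaxed matching might look roughly like the sketch below. This is not the PR's implementation: the synonym lookup is a hypothetical in-memory table (`TOY_SYNONYMS`) standing in for a dictionary query, `difflib` again stands in for the fuzz scorer, and `values_match` is an assumed name.

```python
from difflib import SequenceMatcher
from typing import Callable, Set

# Hypothetical synonym table; a real lookup would query a dictionary service.
TOY_SYNONYMS = {"blue": {"aquamarine", "azure"}}


def toy_synonyms(word: str) -> Set[str]:
    """Return the known synonyms of `word`, or an empty set if there are none."""
    return TOY_SYNONYMS.get(word.lower(), set())


def values_match(
    goal: str,
    purchased: str,
    synonyms_of: Callable[[str], Set[str]] = toy_synonyms,
    threshold: int = 85,
) -> bool:
    """Check exact match first, then lexical similarity, then synonyms."""
    goal, purchased = goal.lower().strip(), purchased.lower().strip()
    if goal == purchased:
        return True
    # Lexical similarity on a 0-100 scale (stdlib stand-in for the fuzz ratio).
    if 100 * SequenceMatcher(None, goal, purchased).ratio() > threshold:
        return True
    # Synonym check in both directions, since lookup tables are rarely symmetric.
    return purchased in synonyms_of(goal) or goal in synonyms_of(purchased)
```

Passing the lookup in as `synonyms_of` keeps the matcher testable offline; a dictionary-backed function could be swapped in without changing the matching logic.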