princeton-nlp / WebShop

[NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
https://webshop-pnlp.github.io
MIT License
255 stars 53 forks source link

Reward Function #8

Closed john-b-yang closed 1 year ago

john-b-yang commented 2 years ago

Overview

This PR improves the reward function, with changes that are meant to make it more faithful to how human task workers would perform and evaluate the WebShop task.

Description of Changes

The prior WebShop reward function uses exact matching to evaluate whether a purchased product satisfies the goal instruction's ask. Specifically, attributes and options are matched w/ direct string comparison. This approach led to consistent over-penalization of human and agent performance.

This PR introduces changes to capture the following two similarities:

To capture lexical similarity, the fuzz library is used, and pairs of values w/ a match ratio > 85 are rewarded.

To capture synonyms, the PyMultiDictionary (link) is used.