xlang-ai / text2reward

[ICLR 2024] Code for the paper "Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning"
https://text-to-reward.github.io/

How does this differ from Nvidia's Eureka generative rewards pipeline? #2

Closed · bdiu29 closed this issue 8 months ago

bdiu29 commented 8 months ago

I read through the paper, but the differences between text2reward and Eureka (https://eureka-research.github.io/) aren't immediately obvious to me.

text2reward does seem a bit more generalizable to custom RL environments, though.

Thanks!

Timothyxxx commented 8 months ago

Ours (https://arxiv.org/abs/2309.11489) and Eureka (https://arxiv.org/abs/2310.12931) are essentially the same in concept, but were tested on different downstream tasks.

Eureka is an important concurrent development in the field: both works address the problem of dense reward specification in reinforcement learning using zero-shot and few-shot prompting of a language model. We believe that our work, alongside Eureka, contributes to the broader understanding of what LLMs can do for policy training and reinforcement learning.
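
As a rough illustration of this shared recipe, here is a minimal sketch of prompting-based reward generation. It assumes an OpenAI-style chat API; the environment interface, task description, and prompt wording are hypothetical placeholders, not either paper's actual prompts or pipeline:

```python
# Minimal sketch of LLM-based dense reward generation (illustrative only;
# not the repo's actual prompts or pipeline). Assumes an OpenAI-style chat
# API and a hypothetical lift-cube task.
from openai import OpenAI

client = OpenAI()

# A compact, Pythonic abstraction of the environment state, in the spirit
# of text2reward: the LLM writes reward code against this interface.
# The interface below is invented for this sketch.
ENV_ABSTRACTION = '''
class BaseEnv:
    self.robot.ee_pose: Pose        # end-effector pose; .p is position
    self.cube.pose: Pose            # object pose; .p is position
    self.cube.check_grasped() -> bool
'''

INSTRUCTION = "Pick up the cube and lift it 0.2 m above the table."

prompt = f"""You write dense reward functions for reinforcement learning.
Environment interface:
{ENV_ABSTRACTION}
Task: {INSTRUCTION}
Return a Python function `compute_dense_reward(self, action) -> float`
that uses staged distance, grasp, and lift terms."""

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)

# The generated code would then be compiled into the environment and
# used as the reward signal for standard RL training.
reward_code = response.choices[0].message.content
print(reward_code)
```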

bdiu29 commented 8 months ago

Awesome! I appreciate you taking the time to respond. Few-shot prompting was also the main difference I noticed in my initial read; Eureka seems able to shape a reward function without being given a template.
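
For concreteness, here is a plausible shape for a generated reward under a template-constrained setup like text2reward's, reusing the hypothetical interface from the sketch above (this is illustrative, not output from either paper's pipeline). An Eureka-style pipeline would aim to produce a similar function without the provided skeleton:

```python
# Illustrative example of a generated staged dense reward for the
# hypothetical lift-cube task above. The interface (pose.p, check_grasped)
# and self.cube_initial_z are assumptions for this sketch.
import numpy as np

def compute_dense_reward(self, action) -> float:
    reward = 0.0

    # Stage 1: reach. Penalize end-effector distance to the cube.
    ee_to_cube = np.linalg.norm(self.robot.ee_pose.p - self.cube.pose.p)
    reward += 1.0 - np.tanh(5.0 * ee_to_cube)

    # Stage 2: grasp. Bonus once the cube is held.
    if self.cube.check_grasped():
        reward += 0.5

        # Stage 3: lift. Reward height gained toward the 0.2 m target.
        # self.cube_initial_z (the cube's starting height) is hypothetical.
        lift = min(self.cube.pose.p[2] - self.cube_initial_z, 0.2)
        reward += 2.0 * lift / 0.2

    return reward
```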