Closed: bdiu29 closed this issue 8 months ago
Ours (https://arxiv.org/abs/2309.11489) and Eureka (https://arxiv.org/abs/2310.12931) are essentially the same in concept, but were tested on different downstream tasks.
Eureka is an important concurrent development in the field: both works address the problem of dense reward specification in reinforcement learning using zero-shot and few-shot prompting of a language model. We believe that our work, alongside Eureka, contributes to the broader understanding and capabilities of LLMs in policy training and reinforcement learning.
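To make the shared idea concrete, here is a minimal sketch of the prompt-an-LLM-for-a-reward-function loop both papers build on. Everything here is an assumption for illustration: `query_llm` is a hypothetical stand-in that returns a canned reward function instead of calling a real model, and `compute_reward` and its arguments are invented names, not the actual interface of either paper.

```python
# Hedged sketch: generate a dense reward function from an environment
# description via an LLM, then execute the generated code.

def query_llm(prompt: str) -> str:
    # Hypothetical placeholder for a real LLM API call. In the actual
    # pipelines, the prompt would carry the environment description plus
    # zero-shot or few-shot examples (and, for text2reward, a template).
    return (
        "def compute_reward(distance_to_goal, action_norm):\n"
        "    # dense shaping: move toward the goal, penalize large actions\n"
        "    return -distance_to_goal - 0.1 * action_norm\n"
    )

def generate_reward_fn(env_description: str):
    prompt = f"Environment: {env_description}\nWrite a compute_reward function."
    code = query_llm(prompt)
    namespace = {}
    # Caveat: exec runs LLM output directly; real systems should sandbox this.
    exec(code, namespace)
    return namespace["compute_reward"]

reward_fn = generate_reward_fn("Move the robot gripper to the target cube.")
print(reward_fn(distance_to_goal=0.5, action_norm=1.0))
```

In an actual RL loop, `reward_fn` would replace the environment's reward signal during policy training, and the generated code could be iteratively refined from training feedback.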
Awesome! I appreciate you taking the time to respond. Few-shot prompting was also the main difference I noticed in my initial observation: Eureka seems able to shape a reward function without being given a template.
I read through the paper, but it doesn't seem immediately obvious to me what the differences are between text2reward and Eureka - https://eureka-research.github.io/
text2reward does seem a bit more generalizable to custom RL environments though.
Thanks!