pickxiguapi / Clean-Offline-RLHF

Offline RLHF codebase implementation for "Uni-RLHF: Universal Platform and Benchmark Suite for Reinforcement Learning with Diverse Human Feedback" (ICLR2024)
https://uni-rlhf.github.io/
MIT License

Missing code and data for certain feedback types #3

Closed thomas475 closed 4 months ago

thomas475 commented 4 months ago

Dear authors,

thank you for the outstanding work on this project!

In your paper, you mention that the system was evaluated using comparative, attribute, and keypoint feedback. However, it seems that the labeled data for attribute and keypoint feedback is missing from the repository. Additionally, the code for handling these types of feedback for reward learning isn't included either.

Could you please provide the missing data and code? It would also be great if you could publish the code for evaluative and visual feedback, if available. This would be incredibly helpful for my project, where I am working with your system on the integration of multiple feedback types.

Many thanks!

pickxiguapi commented 4 months ago

Hi thomas,

Sorry for the confusion! In Uni-RLHF, we collected three types of feedback labels in total, the majority being Comparative and Attribute feedback labels.

The Comparative feedback labels have already been provided in this repository. The Attribute feedback labels were used only for basic experiments in the Uni-RLHF paper and were later expanded into a full paper, AlignDiff. The baseline algorithm named TD3BC+Pref in AlignDiff is the same one used in the Uni-RLHF attribute experiments, so all the Attribute source labels and processing procedures have been open-sourced here.

As for Keypoint feedback, we regret that we only collected a small portion of the labels for preliminary experiments in the appendix. We are currently expanding this into a formal engineering project, and all labels will be open-sourced at that time.

Thank you for your attention. We look forward to working together with the community to improve this.

pickxiguapi commented 4 months ago

I will close this issue and pin it to the main page until I have time to update the readme.md. Feel free to reopen it if you have more questions!