llava-rlhf / LLaVA-RLHF

Aligning LMMs with Factually Augmented RLHF
https://llava-rlhf.github.io/
GNU General Public License v3.0
323 stars 25 forks source link

llava_ppo50k-aokvqa12k-vqa10k.json.数据怎么制作的呢? #34

Closed Spring24ch closed 2 months ago

Spring24ch commented 3 months ago

{ "id": "000000326350", "image": "000000326350.jpg", "conversations": [ { "from": "human", "value": "\nWhat could be the reason behind the meeting between baseball players and these people wearing suits?" }, { "from": "gpt", "value": "The meeting between the baseball players and the people wearing suits on the baseball field could suggest a formal event, such as a contract signing, a partnership, or an endorsement agreement. Alternatively, it could also be an important match or a charity event where officials from a sponsoring organization or league representatives are present. The handshakes and greetings observed in the image indicate a friendly and collaborative atmosphere, which supports these scenarios. The event is significant enough to be filmed, suggesting that it may be noteworthy or relevant to the broader public or fans of the sport." } ], "captions": [ "baseball players and managers meet and shake hands", "a group of men extend hands to shake.", "baseball players and officials shaking hands while being filmed.", "baseball players meet with people wearing suits on a basefield", "baseball players greeting people on a baseball field." ], "caption_type": 1, "length_bonus": -0.5 } 其中的length_bonus大小怎么判断的呢

Edward-Sun commented 2 months ago

It's a black magic :) Tuned for each subset in training tasks: Conversation, Detail description, and Complex reasoning It should be consistent across the PPO training examples from the same subset.