tgxs002 / HPSv2

Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
Apache License 2.0

For the test of custom data #9

Open cyy-1234 opened 1 year ago

cyy-1234 commented 1 year ago

When testing on custom data, should the prompt describe only the content, or may it also contain words about style? I would also like to know what outputs["image_features"], outputs["text_features"] and outputs["logit_scale"] mean in score.py, and what it means to use hps_score[0] as the final score.

tgxs002 commented 1 year ago

The prompt can contain "style words", and the model is capable of identifying them, but our benchmark avoids prompts that contain them to avoid potential bias. outputs["image_features"] and outputs["text_features"] are the normalized features of an image and a prompt, respectively, and our model is trained to give a higher similarity to images that human evaluators prefer. outputs["logit_scale"] is no longer in our code; you can try pulling our latest commits. hps_score[0] can be used to compare images generated from the same prompt, but not to compare images generated from different prompts.
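The explanation above can be illustrated with a minimal sketch, assuming the image and prompt features are already available as plain vectors. The function names `normalize`, `hps_like_score` and `rank_images_for_prompt` are hypothetical and this is not the repository's score.py, which obtains the features from the trained model. It only shows the idea described in the reply: with unit-norm features the score is their dot product (cosine similarity), and scores are used to rank images for a single prompt, not across prompts.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def hps_like_score(image_features: Sequence[float],
                   text_features: Sequence[float]) -> float:
    """Hypothetical helper: dot product of two unit-norm vectors,
    i.e. the cosine similarity between an image and a prompt."""
    img = normalize(image_features)
    txt = normalize(text_features)
    return sum(a * b for a, b in zip(img, txt))


def rank_images_for_prompt(
    text_features: Sequence[float],
    image_features_by_name: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank images generated from ONE prompt,
    highest score first. Scores are only compared within a single
    prompt, following the note that they are not comparable across
    different prompts."""
    scored = [
        (name, hps_like_score(feats, text_features))
        for name, feats in image_features_by_name.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy 3-dimensional features, made up purely for illustration.
    prompt = [1.0, 0.0, 0.0]
    images = {
        "a.png": [0.9, 0.1, 0.0],
        "b.png": [0.1, 0.9, 0.0],
        "c.png": [0.5, 0.5, 0.0],
    }
    # a.png ranks first, then c.png, then b.png
    for name, score in rank_images_for_prompt(prompt, images):
        print(name, round(score, 4))
```

Because both vectors are normalized, the score lies in [-1, 1] regardless of feature magnitude, which is why only the relative order among images for the same prompt is meaningful here.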

cyy-1234 commented 1 year ago

In addition, the paper says the model "computes the similarity between prompt p and image x", so the final score is the similarity between prompt p and image x. How is the error for an image calculated, and how does the "rank" field in each instance work?

tgxs002 commented 1 year ago

You can refer to our paper and our repo for a detailed explanation.

cyy-1234 commented 1 year ago

May I ask whether the latest best score was obtained with the checkpoint released on July 26?

cyy-1234 commented 1 year ago

Hello, dear author. I would like to ask: do you have a categorization of the training data, such as people, animals, cars, food and other categories?

w-zhih commented 1 year ago

The "compressed checkpoint" is the checkpoint of the preference prediction model "HPS v2". Compared with the previous checkpoint, it only excludes some unnecessary data.
