cyy-1234 opened this issue 1 year ago
The prompt can contain "style words", and the model is capable of identifying them, but our benchmark avoids prompts that contain them in order to avoid potential bias. outputs["image_features"] and outputs["text_features"] are the normalized features of an image and a prompt, respectively, and our model is trained to give higher similarity to images that human evaluators prefer. outputs["logit_scale"] is no longer in our code; you can try pulling our latest commits. hps_score[0] can be used to compare images generated from the same prompt, but not to compare images generated from different prompts.
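For reference, here is a minimal sketch of how a score could be computed from those normalized features. It is not the repository's exact score.py; the names `model`, `image_tensor`, and `text_tokens` are assumed placeholders for a loaded CLIP-style model and its preprocessed inputs.

```python
import torch

with torch.no_grad():
    outputs = model(image_tensor, text_tokens)   # assumed: model returns a dict of outputs
    image_features = outputs["image_features"]   # [N, D], L2-normalized image embeddings
    text_features = outputs["text_features"]     # [1, D], L2-normalized prompt embedding

    # Cosine similarity between each image and the prompt; a higher value means
    # the model predicts the image is more preferred by human evaluators.
    hps_score = (image_features @ text_features.T).squeeze(-1)   # [N]

# hps_score[0] is the score of the first image. Scores are only comparable
# between images generated from the same prompt.
print(hps_score)
```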
In addition, the paper says the model "computes the similarity between prompt p and image x", so the final score is the similarity between prompt p and image x. How are the errors in the image calculated, and what does the "rank" field in each instance do?
May I ask whether the latest best score was obtained with the checkpoint from July 26?
Hello dear author, please consider this: you have a classification of the training data into categories such as people, animals, cars, food, and so on.
The "compressed checkpoint" is the checkpoint of preference prediction model "HPS v2". Compared with the last checkpoint, it just excludes some unnecessary data.
For testing on custom data, should the prompt contain only the content, with no style words? I would also like to know what outputs["image_features"], outputs["text_features"], and outputs["logit_scale"] mean in score.py, and finally what it means to use hps_score[0] as the score.