Closed by Smadx 2 months ago
Taking the average PickScore over a set of predefined prompts is common practice. You might want to do the same for a baseline model as well, so that you can draw a meaningful conclusion about whether a score is good. That is true of any metric, though.
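Here is a rough sketch of what that comparison could look like. It is not code from the PickScore repository: `score_fn(prompt, image)` and the two `*_gen(prompt)` callables are hypothetical placeholders for your own scoring wrapper and your own generators. It assumes `score_fn` returns the raw scaled score that the README example feeds into its softmax, so a softmax over two such scores gives a pairwise preference probability.

```python
import math
from statistics import mean
from typing import Any, Callable, Dict, Sequence

# Hypothetical signatures for illustration only (not part of PickScore):
#   ScoreFn:    (prompt, image) -> raw preference score for that pair
#   GenerateFn: (prompt) -> image produced by one model
ScoreFn = Callable[[str, Any], float]
GenerateFn = Callable[[str], Any]


def preference_probability(score_a: float, score_b: float) -> float:
    """Two-way softmax: probability that image A is preferred over image B."""
    return 1.0 / (1.0 + math.exp(score_b - score_a))


def compare_models(
    score_fn: ScoreFn,
    model_gen: GenerateFn,
    baseline_gen: GenerateFn,
    prompts: Sequence[str],
) -> Dict[str, float]:
    """Score both models on the same prompts and summarise the results."""
    model_scores = [score_fn(p, model_gen(p)) for p in prompts]
    baseline_scores = [score_fn(p, baseline_gen(p)) for p in prompts]
    # Per-prompt probability that the model's image beats the baseline's image.
    preferences = [
        preference_probability(m, b)
        for m, b in zip(model_scores, baseline_scores)
    ]
    return {
        "model_mean": mean(model_scores),        # average score, model only
        "baseline_mean": mean(baseline_scores),  # same number for the baseline
        "mean_preference": mean(preferences),    # above 0.5 favours the model
    }
```

The `model_mean` value alone corresponds to the single-model approach; the baseline numbers are what give it context. Use the same prompts (and ideally the same seeds) for both models so the per-prompt comparison is fair.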
Thanks for your answer
How should I use PickScore to evaluate the performance of a model? The example given produces a probability distribution over a list of images. When evaluating a model, do I need a second model as a reference? That is, should I record the images generated by both models for each prompt, compute the probabilities, and then report the average selection probability for each model? Or can I skip the second model and just have one model generate many images, then average the scores of those images?