The paper reports correspondence results between text and generated images. However, it is unclear how the alignment scores of the BIQA methods are calculated. Do these BIQA methods predict correspondence from the input image alone, without considering the text?