Closed: wtc9806 closed this issue 4 months ago.
Thanks for the issue. It seems ROUGE-L (with beta=5) is missing. We will add ROUGE-L evaluation to this repository soon.
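For reference, ROUGE-L with beta=5 weights LCS recall much more heavily than LCS precision. Below is a minimal sketch of that calculation on top of the rouge_score package; the exact code we add to metrics.py may differ:

# Sketch of ROUGE-L with beta=5, built on the rouge_score package.
# Only the F-beta weighting (beta=5) is fixed here; the LCS-based
# precision/recall come from rouge_score and may differ slightly from metrics.py.
from rouge_score import rouge_scorer

def rouge_l(reference: str, candidate: str, beta: float = 5.0) -> float:
    scorer = rouge_scorer.RougeScorer(['rougeL'], use_stemmer=True)
    s = scorer.score(reference, candidate)['rougeL']
    p, r = s.precision, s.recall
    if p == 0 and r == 0:
        return 0.0
    # F-beta: recall is weighted beta times as much as precision.
    return (1 + beta ** 2) * p * r / (r + beta ** 2 * p)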
I used the img_align method in metrics.py to calculate the similarity between the generated image and the real image, but the score I got was 1.28, which is about twice the performance of the PR+ICL configuration reported in the original paper. Is there something wrong with what I wrote? My test code is as follows:

import os
import clip
from tqdm import tqdm
from metrics import img_align, clip_score

def main(args):
    model, processor = clip.load("ViT-L/14", device=args.device)
    result_lst = os.listdir(args.data_path)
    img_align_score = []
    pms_score = []
    rouge_l_score = []
    error_lst = []
    for item in tqdm(result_lst):
        path = os.path.join(args.data_path, item)
        ori_path = os.path.join(path, 'ori.png')
        gen_path = os.path.join(path, 'generated.png')
        text_path = os.path.join(path, 'result.txt')
        if not os.path.exists(ori_path) or not os.path.exists(gen_path):
            error_path = ori_path + ' ' + gen_path
            error_lst.append(error_path)
            continue
        else:
            ori_img = read_img(ori_path)
            gen_img = read_img(gen_path)
            text = read_txt(text_path)
            img_align_score.append(img_align(model, processor, ori_img, gen_img, device=args.device))
            pms_score.append(clip_score(model, processor, gen_img, text['new_prompt'], device=args.device))
    mean_align = sum(img_align_score) / len(img_align_score)
    print('mean_align = ', mean_align)
    with open(args.error_path, "w") as f:
        f.writelines(error_lst)
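(read_img and read_txt are not shown above; for completeness, a minimal sketch of what such helpers might look like, assuming result.txt stores a JSON dict with a 'new_prompt' field:)

# Hypothetical helpers, not part of the repository; the originals were not posted.
import json
from PIL import Image

def read_img(path):
    # Load the image as RGB so CLIP's preprocess can handle it.
    return Image.open(path).convert('RGB')

def read_txt(path):
    # Assumes result.txt stores a JSON dict containing 'new_prompt'.
    with open(path, 'r') as f:
        return json.load(f)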
For the image metric, there are two differences from our evaluation, one of which is caused by a problem in my code in 'metrics.py'. First, the gap is mainly due to a wrong coefficient in the 'img_align' function: the cosine similarity should not be multiplied by 2.5 (removing that factor turns your 1.28 into roughly 0.51). The function should look like this, and we will fix it in the repository soon:
import torch

@torch.no_grad()
def img_align(model, preprocess, ori_image, gen_image, device='cuda'):
    '''
    Implementation of the proposed Image-Align metric, which calculates the
    similarity between the ground truth image and the generated image.
    Image-Align uses the CLIP ViT-B/32 model.
    Args:
        model: CLIP model
        preprocess: CLIP image preprocess
        ori_image: ground truth image
        gen_image: generated image
        device: specify if not 'cuda'
    '''
    ori_image = preprocess(ori_image).unsqueeze(0).to(device)
    gen_image = preprocess(gen_image).unsqueeze(0).to(device)
    ori_features = model.encode_image(ori_image)
    gen_features = model.encode_image(gen_image)
    gen_features /= gen_features.norm(dim=-1, keepdim=True)
    ori_features /= ori_features.norm(dim=-1, keepdim=True)
    score = 1.0 * (ori_features @ gen_features.T)  # was mistakenly multiplied by 2.5
    if score < 0:
        score = 0.0
    return score
Second, you are using CLIP ViT-L/14 as the evaluation model, while we use CLIP ViT-B/32 throughout our evaluation. Generally this is fine as long as you use the same model for all evaluations; it should only cause slight differences.
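To match the paper's setting, the only change needed in the test script above should be the model name passed to the standard OpenAI CLIP loader:

# Load the paper's evaluation backbone (CLIP ViT-B/32) instead of ViT-L/14.
model, processor = clip.load("ViT-B/32", device=args.device)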
Thanks again for your issue. We will fix the 'img_align' function and add the modified ROUGE-L calculation in 'metrics.py' soon.
"metrics.py" is updated.
Hi authors, I found that the metric code is not complete. Could you provide the complete test code?