joeyz0z / ConZIC

Official implementation of "ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing"
MIT License

The metrics in the paper #2

Open TIan1874 opened 1 year ago

TIan1874 commented 1 year ago

Thank you very much for your excellent work! I want to reproduce the work, and I am wondering if you could make public the code for calculating the metrics in the paper, such as BLEU-4, METEOR, CIDEr, SPICE, and RefCLIPScore. I would appreciate it if you could take the time to reply to me.

joeyz0z commented 1 year ago

We use the evaluation toolkits from https://github.com/jmhessel/clipscore to compute the metrics.

TIan1874 commented 1 year ago

> We use the evaluation toolkits from https://github.com/jmhessel/clipscore to compute the metrics.

Thanks very much!

baiyuting commented 1 year ago

I also see that the diversity metrics (Vocab, S-C, Div-1, Div-2) are reported in the paper. Is there any code for evaluating these diversity metrics that you could provide? @joeyz0z

joeyz0z commented 1 year ago

Diversity metrics are based on https://github.com/qingzwang/DiversityMetrics

Lxb-Code-Dev commented 1 year ago

About the diversity metrics Div-1 and Div-2: I followed the link you posted in this issue ("Diversity metrics are based on https://github.com/qingzwang/DiversityMetrics"), but I could not find a way to calculate Div-2 in that repository. Could you explain how to calculate the n-gram diversity metrics based on it?

joeyz0z commented 1 year ago

Here is the code: https://github.com/joeyz0z/ConZIC/blob/main/compute_n_div.py
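
For readers who want a quick reference before opening that file, here is a minimal sketch of Div-n under a commonly used definition: the number of distinct n-grams across the captions generated for one image, divided by the total number of words in those captions. This is an assumption about the definition, not a copy of compute_n_div.py. The function names `ngrams` and `div_n` and the whitespace tokenization are illustrative only, and the linked script may tokenize or aggregate differently.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def div_n(captions: List[str], n: int) -> float:
    """Distinct n-grams divided by total words over one image's caption set.

    Assumed definition; lowercasing and whitespace splitting are
    simplifications, not necessarily what the authors' script does.
    """
    total_words = 0
    distinct = set()
    for caption in captions:
        tokens = caption.lower().split()
        total_words += len(tokens)
        distinct.update(ngrams(tokens, n))
    return len(distinct) / total_words if total_words else 0.0


# Hypothetical caption set for a single image.
captions = ["a dog runs", "a dog sleeps"]
div_1 = div_n(captions, 1)  # 4 distinct unigrams over 6 words
div_2 = div_n(captions, 2)  # 3 distinct bigrams over 6 words
```

Under this definition the score is usually computed per image and then averaged over the test set.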

Lxb-Code-Dev commented 1 year ago

I would like to know whether the prompt is kept in the captions when calculating the evaluation metrics. Thank you very much for your excellent work!

joeyz0z commented 1 year ago

Yes, we kept the prompt.

Lxb-Code-Dev commented 1 year ago

Thank you very much for your reply. I am following your work! I have a question about the Vocab metric: are all 25,000 captions (5,000 images, 5 captions per image) involved in its calculation? If so, are any stop words used beyond those listed in stop_words.txt?
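
To make the question concrete, here is a rough sketch of how I would count the vocabulary, assuming Vocab means the number of unique words over all generated captions. The function name `vocab_size`, the whitespace tokenization, and the optional stop-word set are my own assumptions, not taken from the ConZIC code.

```python
from typing import AbstractSet, Iterable


def vocab_size(captions: Iterable[str],
               stop_words: AbstractSet[str] = frozenset()) -> int:
    """Count unique lowercase tokens across all captions, skipping stop words.

    Assumed definition of the Vocab metric; the tokenization used in the
    paper may differ.
    """
    vocab = set()
    for caption in captions:
        for token in caption.lower().split():
            if token not in stop_words:
                vocab.add(token)
    return len(vocab)


# Hypothetical captions; in practice this would be the full generated set.
captions = ["a dog runs", "a cat sleeps"]
size_all = vocab_size(captions)                       # counts every token
size_filtered = vocab_size(captions, {"a"})           # ignores the stop word "a"
```

The result depends on both the caption pool and the stop-word list, which is why I am asking about them.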