salaniz / pycocoevalcap

Python 3 support for the MS COCO caption evaluation tools

Example of using pycocoevalcap WITHOUT coco data #21

Open dinhanhx opened 9 months ago

dinhanhx commented 9 months ago

No, this is not an issue. It's an example for anyone trying to use this package with their own data.

from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.spice.spice import Spice

class Evaluator:
    def __init__(self) -> None:
        self.tokenizer = PTBTokenizer()
        self.scorer_list = [
            (Bleu(4), ["Bleu_1", "Bleu_2", "Bleu_3", "Bleu_4"]),
            (Meteor(), "METEOR"),
            (Rouge(), "ROUGE_L"),
            (Cider(), "CIDEr"),
            (Spice(), "SPICE"),
        ]
        self.evaluation_report = {}

    def do_the_thing(self, golden_reference, candidate_reference):
        golden_reference = self.tokenizer.tokenize(golden_reference)
        candidate_reference = self.tokenizer.tokenize(candidate_reference)

        # From this point on, some variables are named as in the original code.
        # I have no idea why they are named like this.
        # The original code: https://github.com/salaniz/pycocoevalcap/blob/a24f74c408c918f1f4ec34e9514bc8a76ce41ffd/eval.py#L51-L63
        for scorer, method in self.scorer_list:
            score, scores = scorer.compute_score(golden_reference, candidate_reference)
            if isinstance(method, list):
                for sc, scs, m in zip(score, scores, method):
                    self.evaluation_report[m] = sc
            else:
                self.evaluation_report[method] = score

golden_reference = [
    "The quick brown fox jumps over the lazy dog.",
    "The brown fox quickly jumps over the lazy dog.",
    "A sly brown fox jumps over the lethargic dog.",
    "The speedy brown fox leaps over the sleepy hound.",
    "A fast, brown fox jumps over the lazy dog.",
]
# Wrap into the {id: [{'caption': str}, ...]} structure that PTBTokenizer expects
golden_reference = {k: [{'caption': v}] for k, v in enumerate(golden_reference)}

candidate_reference = [
    "A fast brown fox leaps above the tired dog.",
    "A quick brown fox jumps over the sleepy dog.",
    "The fast brown fox jumps over the lazy dog.",
    "The brown fox jumps swiftly over the lazy dog.",
    "A speedy brown fox leaps over the drowsy dog.",
]
candidate_reference = {k: [{'caption': v}] for k, v in enumerate(candidate_reference)}

evaluator = Evaluator()

evaluator.do_the_thing(golden_reference, candidate_reference)

print(evaluator.evaluation_report)
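
With your own data you will usually have more than one reference caption per image. The tokenizer accepts a list of caption dicts under each key, while the scorers expect exactly one candidate caption per key. A minimal sketch with made-up image IDs:

golden_reference = {
    'img_0': [
        {'caption': 'A dog sleeps on a couch.'},
        {'caption': 'A brown dog is lying on a sofa.'},
    ],
    'img_1': [{'caption': 'Two people ride bicycles down a street.'}],
}
candidate_reference = {
    'img_0': [{'caption': 'A dog is resting on a couch.'}],
    'img_1': [{'caption': 'Two cyclists ride along the road.'}],
}

evaluator = Evaluator()
evaluator.do_the_thing(golden_reference, candidate_reference)
print(evaluator.evaluation_report)
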
salaniz commented 5 months ago

Thank you for providing this code example!

If this package is still actively being used by the community as opposed to newer alternatives, we could think about extending the original API to make integration easier for these use cases.

I suppose your suggestion would be the first step. I wonder if it would also make sense to update the metric evaluation libraries, as they are currently pinned to old releases.
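
Purely as an illustration, such an entry point might look roughly like this (the function name and signature are hypothetical and not part of the current package):

from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider

def evaluate_captions(references, candidates):
    # references, candidates: dict mapping an image id to a list of plain caption strings
    def wrap(data):
        return {k: [{'caption': c} for c in v] for k, v in data.items()}

    tokenizer = PTBTokenizer()
    gts = tokenizer.tokenize(wrap(references))
    res = tokenizer.tokenize(wrap(candidates))

    report = {}
    for scorer, names in [(Bleu(4), ['Bleu_1', 'Bleu_2', 'Bleu_3', 'Bleu_4']), (Cider(), 'CIDEr')]:
        score, _ = scorer.compute_score(gts, res)
        if isinstance(names, list):
            report.update(zip(names, score))
        else:
            report[names] = score
    return report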

Let me know what you (and maybe others) think.

dinhanhx commented 5 months ago

@salaniz This package is still widely used. However, people just copy and modify it (like I do) for their own needs.

There are indeed newer alternatives, such as the Hugging Face evaluate library or NLTK's metrics.
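
For example, a quick BLEU check with NLTK looks roughly like this (the sentences are made up; corpus_bleu expects pre-tokenized references and hypotheses):

from nltk.translate.bleu_score import corpus_bleu

# One entry per sample: a list of tokenized reference sentences and one tokenized hypothesis
references = [[
    'the quick brown fox jumps over the lazy dog'.split(),
    'a fast brown fox jumps over the lazy dog'.split(),
]]
hypotheses = ['the fast brown fox jumps over the lazy dog'.split()]
print(corpus_bleu(references, hypotheses))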

Extending and refactoring the API is possible. The tough part is shipping the Java components (the PTB tokenizer, METEOR, and SPICE all rely on Java). Not many people want Java installed on their system. For anyone who wants to work on this package, please just don't use Java.

Also feel free to use my code.

dinhanhx commented 5 months ago

For anyone who wants to work on the PTB Tokenizer, here are some resources: