The source of meteor-1.5.jar

Hello,

I have been using this JAR file in one of my projects for a couple of years. Today I discovered that its -stdio mode behaves quite differently from the following builds:

Original v1.5 tarball at http://www.cs.cmu.edu/~alavie/METEOR/download/meteor-1.5.tar.gz
A fresh build from https://github.com/cmu-mtlab/meteor
The one shipped with multeval: https://github.com/jhclark/multeval

It took me my whole day to understand what was going on, and this same difference seems to be the reason for the many commented-out and uncommented code sections in several Python wrappers.

So, do you remember whether you modified the original METEOR source code before building the JAR that you are shipping? EDIT: @endernewton, according to your comment the code was apparently modified. Would it be possible to share the modification, then?

Let me try to clarify the behavior difference as well:

In -stdio mode, we provide the METEOR binary a list of lines in the following format:

SCORE ||| ref words ||| hyp words

and for each segment we obtain a set of stats in return. For this part, all binaries produce exactly the same output.
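
For reference, here is a minimal sketch of how a wrapper might build such a SCORE line. The function name `score_line` and the clean-up step are my own assumptions, not taken from any of the JARs or wrappers; the only thing taken from above is the `SCORE ||| ref words ||| hyp words` layout.

```python
def score_line(reference: str, hypothesis: str) -> str:
    """Format one segment as a SCORE line for METEOR's -stdio mode.

    Hypothetical helper: only the 'SCORE ||| ref ||| hyp' layout comes
    from the description above; the clean-up is an assumption.
    """
    def clean(text: str) -> str:
        # Drop the field separator and newlines, then collapse whitespace,
        # so the text cannot break the one-line-per-segment protocol.
        return " ".join(text.replace("|||", " ").split())

    return f"SCORE ||| {clean(reference)} ||| {clean(hypothesis)}"
```

Stripping `|||` and newlines from the text matters because a stray separator or line break inside a sentence would be read as an extra field or an extra request.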

According to the official documentation, we then have to provide the so-called EVAL lines. To be honest, the official documentation is ambiguous on this point, i.e. it is not clear whether one EVAL line per segment (Method1) or a single EVAL line covering all segments (Method2) should be provided.

Your Python wrapper does the latter (Method2):

EVAL ||| stats_1 ||| stats_2 ||| .... ||| stats_N\n

Other wrappers do the former (Method1):

EVAL ||| stats returned for segment 1
EVAL ||| stats returned for segment 2
...
EVAL ||| stats returned for segment N
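
To make the difference concrete, here is a small sketch of both ways of building the EVAL request from the per-segment stats strings. The function names are hypothetical, and the stats strings are treated as opaque text exactly as returned by the SCORE step.

```python
from typing import List


def eval_lines_method1(stats: List[str]) -> List[str]:
    """Method1: one EVAL line per segment."""
    return [f"EVAL ||| {s}" for s in stats]


def eval_line_method2(stats: List[str]) -> str:
    """Method2: a single EVAL line that carries the stats of all segments."""
    return "EVAL ||| " + " ||| ".join(stats)
```

With Method1 the binary receives N short lines, with Method2 one long line, which is why a binary that expects one of them can stall on the other.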

Your modified version first produces the segment scores and then a final score, which seems to be the actual METEOR score. This score is not equal to the mean of the segment scores because of the fragmentation penalty. That precious final line from your modified JAR matches the score you get when you run METEOR in non-stdio mode. Good.

With the original code, by contrast, Method2 does not work at all: it produces only the score for the first segment and then stops, probably because it cannot parse the rest of the line and is waiting for \n. Method1 works, but it does not produce the final score line that you presumably added in your branch. As a result, all the other wrappers on GitHub that use the original code take the mean of the segment scores, and that score is not penalized, i.e. not comparable to the cocoeval tools. In one of my German test cases, the unpenalized METEOR is ~1 higher than the actual METEOR score.
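
Here is a rough sketch of how a wrapper could keep the two numbers apart when reading the modified JAR's reply to a Method2 request. It assumes the reply is N lines with one segment score each, followed by one line with the final score, as described above; the function names are hypothetical and no real METEOR output is shown.

```python
from statistics import fmean
from typing import List, Tuple


def split_scores(lines: List[str], n_segments: int) -> Tuple[List[float], float]:
    """Split the reply into (segment scores, final score).

    Assumes n_segments score lines followed by exactly one final line.
    """
    values = [float(line) for line in lines if line.strip()]
    if len(values) != n_segments + 1:
        raise ValueError(
            f"expected {n_segments + 1} scores, got {len(values)}"
        )
    return values[:n_segments], values[n_segments]


def mean_segment_score(segment_scores: List[float]) -> float:
    """What wrappers around the original JAR report: the plain mean."""
    return fmean(segment_scores)
```

Reporting both the final score and `mean_segment_score` side by side would make it obvious which of the two a given wrapper is returning.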

In short, I think you need to clarify these points in README.md and, if possible, provide the sources of your modified JAR. I never imagined that you would be shipping a modified JAR.

UPDATE: You actually do mention this briefly in the README; I had not checked it until I discovered the issue. But since the modification changes the stdio API, I think it deserves a bit more explanation.

Thank you!

tylin / coco-caption

The source of meteor-1.5.jar #34