luyug / COIL

NAACL2021 - COIL Contextualized Lexical Retriever
Apache License 2.0
148 stars 28 forks

TSV's aren't eval compliant #9

Closed: JMMackenzie closed 3 years ago

JMMackenzie commented 3 years ago

Just wanted to ping you to let you know that the hosted TSV files (at least the dev ones) for COIL aren't in the correct format for evaluation.

E.g.:

1048585 7187158 35.926036089658744
1048585 7187160 35.790479123592384
1048585 7187155 35.65535098314285
1048585 7187157 34.09628629684448
1048585 7617404 33.498324900865555
1048585 3856131 31.57883720099926
1048585 7617413 31.314840689301487
1048585 7187156 31.123393774032593
1048585 7617411 30.926150113344196
1048585 353739  30.901350378990173

I would expect:

1048585 7187158 1
1048585 7187160 2
1048585 7187155 3
1048585 7187157 4
1048585 7617404 5
1048585 3856131 6
1048585 7617413 7
1048585 7187156 8
1048585 7617411 9
1048585 353739  10

Clearly it's no real problem as it's easy to fix locally, but I'm not sure if this was intended or not.

luyug commented 3 years ago

The former is preferred as a canonical form. It can be converted into various formats, including ms-marco (using the conversion script described here), but not necessarily the other way around.
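For illustration, a score-to-rank conversion of the kind discussed in this thread could look roughly like the sketch below. This is not the repository's conversion script: the function name `scores_to_ranks`, the whitespace-delimited `qid pid score` input, and the tab-separated `qid pid rank` output are all assumptions made for the example.

```python
from collections import defaultdict


def scores_to_ranks(lines):
    """Turn 'qid pid score' lines into 'qid<TAB>pid<TAB>rank' lines.

    Hypothetical helper for illustration only; it is not part of COIL.
    Input is assumed to be whitespace-delimited, one hit per line.
    """
    per_query = defaultdict(list)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        qid, pid, score = parts[0], parts[1], float(parts[2])
        per_query[qid].append((pid, score))

    ranked = []
    # Queries are emitted in the order they first appear in the input.
    for qid, hits in per_query.items():
        # Highest score first; ties keep their input order (stable sort).
        hits.sort(key=lambda hit: hit[1], reverse=True)
        for rank, (pid, _score) in enumerate(hits, start=1):
            ranked.append(f"{qid}\t{pid}\t{rank}")
    return ranked


if __name__ == "__main__":
    sample = [
        "1048585 7187158 35.926036089658744",
        "1048585 7187160 35.790479123592384",
        "1048585 7187155 35.65535098314285",
    ]
    print("\n".join(scores_to_ranks(sample)))
```

The conversion discards the scores, so the ranked output cannot be turned back into the scored form, which is the asymmetry described above.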

JMMackenzie commented 3 years ago

Thanks for clarifying, that makes sense indeed. I'll close this one off :-)