microsoft / TREC-2019-Deep-Learning

Website for the TREC Deep Learning Track 2019
https://microsoft.github.io/TREC-2019-Deep-Learning/
Creative Commons Attribution 4.0 International

Inconsistent Encoding between triples tsv and collection/queries tsv? #9

Closed tuzhucheng closed 5 years ago

tuzhucheng commented 5 years ago

It seems that some lines in queries.train.tsv and triples.train.small.tsv are encoded differently in the passage re-ranking dataset.

Here is an example:

$ grep "what are dragon pok" triples.train.small.tsv | head -1
what are dragon pokémon weak against?  What are dragon Pokemon weak against in all Pokemon games? Dragon Pokemon are weak to ice and dragon type moves, and resistant to electric, fire, grass, and water type moves (By resistant I mean that when those type moves hit a dragon… Pokemon, they will only do half the damage that they normally would). (This answer only applies to pure dragon Pokemon. Emman Jr.       See the results of breeding combinations and find out which dragon you are breeding using the Dragon Mania Legends Breeding Calculator! Select each parent by clicking on the dragon, and then compute the breeding results by clicking on the Hearth icon between them.ee the results of breeding combinations and find out which dragon you are breeding using the Dragon Mania Legends Breeding Calculator! Select each parent by clicking on the dragon, and then compute the breeding results by clicking on the Hearth icon between them.

$ grep "what are dragon pok" collectionandqueries/queries.train.tsv
557935  what are dragon pokémon weak against?

Going by the output above, the query in the triples file reads "pokémon", but in the queries tsv it reads "pokémon".
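
The two spellings quoted in this issue differ in the way you would expect if UTF-8 bytes had been decoded as Windows-1252 somewhere along the way. That cause is an assumption on my part, not something confirmed for this dataset. A minimal Python sketch of the effect, with hypothetical helper names `to_mojibake` and `repair_mojibake`:

```python
def to_mojibake(text: str) -> str:
    """Simulate the suspected corruption: UTF-8 bytes misread as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair_mojibake(text: str) -> str:
    """Reverse the misreading; return the text unchanged if it does not round-trip.

    Assumption: the garbling came from exactly one UTF-8 -> cp1252 misdecode.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside cp1252, or the bytes are not
        # valid UTF-8, so it was probably not garbled in this particular way.
        return text


clean = "what are dragon pokémon weak against?"
garbled = to_mojibake(clean)

print(garbled)                            # the garbled form of the query
print(repair_mojibake(garbled) == clean)  # the repair restores the clean form
print(repair_mojibake(clean) == clean)    # already-clean text is left alone
```

The same misreading turns a horizontal ellipsis into the "…" sequence visible in the passage text above. If the two files need to be joined on query text, normalizing both sides with a repair step like this before comparing may be enough, but it is worth spot-checking the results because the reversal is only a heuristic.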