khui / copacrr

The code for COPACRR Neural IR model.
Apache License 2.0

Missing similarity matrices #5

Closed (JoaoLages closed this issue 6 years ago)

JoaoLages commented 6 years ago

Hi there!

I compiled a list of the similarity matrices that are missing from your download link. Could you provide them, please? Around 650 are missing, which is a small percentage of the ~115k total. The 95th query_idf vector is missing as well.

Edit: These matrices use the query description.

Edit 2: The folder cosine/desc_doc_mat/269 contains only empty matrices. I think this is because topic 269 has no description.

andrewyates commented 6 years ago

Hey!

@khui can explain further, but I believe these are matrices from documents that turned out to be empty after pre-processing. Similarly, I'm guessing qid 95 is missing because that qid is not present in the qrels.

khui commented 6 years ago

Hi @JoaoLages, thanks for asking, and sorry for the late reply. @andrewyates has already made the main point.

As far as I can recall, the missing matrices could be due to:

1) There is no judgment in the qrels for the query, as with qid 95.
2) Extracting the document's content from ClueWeb failed, which could be due to a malformed layout or a problem with the content record.
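
For anyone who wants to separate these two cases in their own copy of the data, here is a minimal sketch in Python. It is not part of the copacrr code. It assumes TREC-style qrels lines (`qid iteration docid relevance`) and a hypothetical layout of one file per pair at `<mat_dir>/<qid>/<docid>.npy`; the function names and the `.npy` suffix are illustrative assumptions, so adjust them to match the actual download.

```python
from collections import defaultdict
from pathlib import Path


def read_qrels(lines):
    """Collect judged docids per qid from TREC-style qrels lines."""
    judged = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _iteration, docid, _relevance = parts
        judged[qid].add(docid)
    return judged


def unjudged_qids(all_qids, judged):
    """Case 1: qids that have no judgment in the qrels at all."""
    return sorted(q for q in all_qids if q not in judged)


def find_missing(judged, mat_dir, suffix=".npy"):
    """Case 2 candidates: judged (qid, docid) pairs with no matrix file.

    Assumes the hypothetical layout <mat_dir>/<qid>/<docid><suffix>.
    """
    mat_dir = Path(mat_dir)
    missing = {}
    for qid, docids in judged.items():
        absent = sorted(
            d for d in docids if not (mat_dir / qid / (d + suffix)).is_file()
        )
        if absent:
            missing[qid] = absent
    return missing
```

Pairs reported by `find_missing` are judged in the qrels but have no matrix on disk, so they are the candidates for documents whose content extraction failed. A qid returned by `unjudged_qids` would not be expected to have any matrices in the first place.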