Closed JoaoLages closed 6 years ago
Hey!
@khui can explain further, but I believe these are matrices from documents that turned out to be empty after pre-processing. Similarly, I'm guessing qid 95 is missing because that qid is not present in the qrels.
Hi @JoaoLages, thanks for asking. Sorry for the late reply, but @andrewyates has already made the point.
As far as I can recall, the missing matrices could be due to:
1) there is no judgment in the qrels, as with qid 95
2) the extraction of the document content from ClueWeb failed, which could be due to a malformed layout or a problem with the content record
Hi there!
I grouped up the similarity matrices that are missing from your download link. Could you provide them, please? There are around 650 missing, which is a small percentage compared to the ~115k total. Also, the query_idf vector for qid 95 is missing as well.
Edit: These matrices use the query description.
Edit2: The folder cosine/desc_doc_mat/269 has only empty matrices - I think this might be related to the fact that topic 269 has no description.
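For anyone who wants to run the same check on their own download, here is a minimal sketch of how the missing and empty matrices could be listed. It is not the script used for the counts above. It assumes a TREC-style qrels file (`qid iter docid rel`) and a hypothetical layout of one file per document at `<root>/<qid>/<docid>.npy`; the `.npy` extension, the function names, and treating a zero-byte file as "empty" are all assumptions, so adjust them to the actual files.

```python
import os
from collections import defaultdict


def read_qrels(lines):
    """Collect judged docids per qid from TREC-style lines: 'qid iter docid rel'."""
    judged = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _, docid, _ = parts
        judged[qid].add(docid)
    return judged


def find_missing(judged, root, ext=".npy"):
    """Return (missing, empty) lists of (qid, docid) pairs.

    Assumed layout (hypothetical): <root>/<qid>/<docid><ext>.
    'empty' here means a zero-byte file, which is only a rough proxy.
    """
    missing, empty = [], []
    for qid in sorted(judged):
        for docid in sorted(judged[qid]):
            path = os.path.join(root, qid, docid + ext)
            if not os.path.exists(path):
                missing.append((qid, docid))
            elif os.path.getsize(path) == 0:
                empty.append((qid, docid))
    return missing, empty
```

A qid with no judgments in the qrels (such as 95) never appears in `judged`, so it will not be reported as missing by this check, which matches the explanation given above.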