texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0

MS MARCO passage ranking example raises error #12

Closed: ArvinZhuang closed this issue 1 year ago

ArvinZhuang commented 2 years ago

I'm trying to reproduce this example: https://github.com/texttron/tevatron/tree/main/examples/msmarco-passage-ranking

Training and encoding both work fine, but retrieval with faiss raises an error:

  File "..../tevatron/src/tevatron/faiss_retriever/reducer.py", line 18, in combine_faiss_results
    rh.add_result(-scores, indices)
  File "..../python3.7/site-packages/faiss/__init__.py", line 1622, in add_result
    swig_ptr(I), self.k)
  File "..../python3.7/site-packages/faiss/swigfaiss.py", line 5700, in swig_ptr
    return _swigfaiss.swig_ptr(a)
ValueError: did not recognize array type

I think the cause is that the indices are numpy arrays of string ids, whereas faiss expects int64 ids.

It seems this update broke the code? @MXueguang

-    psg_indices = [[int(p_lookup[x]) for x in q_dd] for q_dd in all_indices]
+    psg_indices = [[str(p_lookup[x]) for x in q_dd] for q_dd in all_indices]

One lazy fix for this is to cast the indices back to int64 before the rh.add_result call in reducer.py: indices = indices.astype(np.int64)
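To illustrate the cast, here is a minimal numpy-only sketch. The helper name to_faiss_ids and the sample ids are made up for illustration and are not part of tevatron or faiss. It assumes every passage id is a numeric string, which holds for MS MARCO passage ids but may not hold for other corpora.

```python
import numpy as np


def to_faiss_ids(indices):
    """Hypothetical helper: cast an array of ids to a contiguous int64 array.

    Raises ValueError if any id is not a numeric string.
    """
    return np.ascontiguousarray(np.asarray(indices).astype(np.int64))


# String ids, shaped (num_queries, k), like those produced by str(p_lookup[x]).
str_indices = np.array([["7", "42"], ["3", "1001"]])

int_indices = to_faiss_ids(str_indices)
print(str_indices.dtype.kind, "->", int_indices.dtype)
```

If the ids are not purely numeric (for example "D12"), astype raises a ValueError, so the cast is only a stopgap for integer-like ids.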

I'm using faiss-cpu==1.7.1

ArvinZhuang commented 2 years ago

The result I got for this example:

#####################
MRR @10: 0.3164818983945045
QueriesRanked: 6980
#####################

MXueguang commented 2 years ago

Hi @ArvinZhuang, thanks for sharing the results. Could you create a PR for this, i.e. with the fix in reducer.py and the results added to the doc?