terrier-org / pyterrier

A Python framework for performing information retrieval experiments, building on http://terrier.org/
https://pyterrier.readthedocs.io/
Mozilla Public License 2.0
397 stars, 63 forks

Can't use Anserini with PyTerrier #404

Closed: PreetJhanglani closed this issue 9 months ago

PreetJhanglani commented 9 months ago

I tried the example at https://pyterrier.readthedocs.io/en/latest/anserini.html#examples, and built the Lucene index with pyserini. However, when I run BM25_ai = pt.anserini.AnseriniBatchRetrieve(luceneIndex, wmodel="BM25"), it fails with an error that ends with: JavaException: JVM exception occurred: io/anserini/eval/Qrels java.lang.NoClassDefFoundError. I get the same error on Colab and on my M1 Mac.

cmacdonald commented 9 months ago

Hi @PreetJhanglani

There was a related report here: https://github.com/terrier-org/pyterrier/issues/396

In that report, I identified a need to de-conflict the Snowball stemmers between Lucene and Terrier. As of today, that's a work in progress.

There might be some hints in #396 on how to work around your problem.

cmacdonald commented 9 months ago

OK, I have this working in a branch of PyTerrier. The salient details are:

%pip install git+https://github.com/terrier-org/pyterrier.git@anserini22
%pip install pyserini==0.22.0 faiss-cpu

import pyterrier as pt
# use the Anserini jar file that matches the installed pyserini version
# version='snapshot' uses a JitPack build of the current Terrier GitHub repository, in which Snowball has been de-conflicted from Lucene
pt.init(boot_packages=["io.anserini:anserini:0.22.0:fatjar"], version='snapshot')
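
The point about matching the jar to the pyserini install could be sketched as a small helper. This is only an illustration, not part of PyTerrier or pyserini: the function names `installed_version` and `anserini_boot_package` are hypothetical, and it assumes that the Anserini jar version equals the installed pyserini version, as it does for 0.22.0 above. That may not hold for every release, so check before relying on it.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def anserini_boot_package(version: str) -> str:
    """Build the Maven coordinate of the Anserini fatjar for the given version.

    Hypothetical helper; it only formats the same coordinate string used above.
    """
    return f"io.anserini:anserini:{version}:fatjar"


# Assumption: the Anserini jar version mirrors the pyserini version.
# Fall back to the version pinned in this thread if pyserini is not installed.
version = installed_version("pyserini") or "0.22.0"
boot_packages = [anserini_boot_package(version)]
print(boot_packages)

# The list can then be passed to PyTerrier exactly as in the snippet above:
# pt.init(boot_packages=boot_packages, version='snapshot')
```

Deriving the coordinate from the installed package avoids editing two version strings by hand when pyserini is upgraded.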

Example notebook at: https://colab.research.google.com/drive/1qzcO8O-cIh8aNtVmJ2Izgzni8h4UO4xV?usp=sharing

cmacdonald commented 9 months ago

Hi @PreetJhanglani, I have merged the changes into the master branch, so they will be included in the next PyTerrier release. I will mark this issue as addressed.