meszlili96 closed this issue 4 years ago
Pyserini does not work well for our project, so I tried Chris's project instead. Running the application fails with an error. Something seems to be wrong with the index, even though Pyserini and Anserini can both work with it. I built the index exactly the same way as for Pyserini, and the Anserini demo uses that index too. The error is the following:
Exception in thread "main" java.lang.RuntimeException: There should be only one leaf, index the collection using one writer
at nl.ru.convert.Convert.<init>(Convert.java:45)
at nl.ru.convert.Convert.main(Convert.java:192)
This originates from the following line, which is exactly the same as in the Anserini code:
reader = DirectoryReader.open(FSDirectory.open(indexPath));
Pyserini uses the SimpleSearcher class from the Anserini project: https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/search/SimpleSearcher.java
In the Anserini demo SearchCollection is used: https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/search/SearchCollection.java
Chris's code: https://github.com/Chriskamphuis/olddog/blob/master/src/main/java/nl/ru/convert/Convert.java
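For anyone hitting the same error: the message suggests that Convert expects the index to consist of a single Lucene segment (one leaf per segment), which is usually achieved by force-merging at indexing time, e.g. with Lucene's `IndexWriter.forceMerge(1)`. Below is a rough, stdlib-only sketch for checking how many segments an index directory holds before running Convert. It relies on the assumption that every segment has exactly one `_<name>.si` file in the directory; the class name `SegmentCount` and the method `countSegments` are made up for this illustration and are not part of olddog or Anserini. Stale files from uncommitted segments could inflate the count, so treat the result as a hint only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of olddog or Anserini.
class SegmentCount {
    // Assumption: Lucene names segments "_" + a base-36 counter,
    // and stores one ".si" (segment info) file per segment.
    private static final Pattern SEGMENT_INFO = Pattern.compile("_[0-9a-z]+\\.si");

    // Counts the file names that look like segment info files.
    static long countSegments(List<String> fileNames) {
        return fileNames.stream()
                .filter(name -> SEGMENT_INFO.matcher(name).matches())
                .count();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the Lucene index directory.
        Path indexPath = Paths.get(args[0]);
        List<String> names;
        try (Stream<Path> files = Files.list(indexPath)) {
            names = files.map(p -> p.getFileName().toString())
                         .collect(Collectors.toList());
        }
        long segments = countSegments(names);
        System.out.println("Segment info files found: " + segments);
        if (segments != 1) {
            System.out.println("Index probably has more than one leaf; "
                    + "consider re-indexing with a force merge to one segment.");
        }
    }
}
```

Run it with the index directory as the only argument. If it reports more than one segment, the index likely needs to be rebuilt or merged into a single segment before Convert will accept it.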
The dict, docs, terms, and qrels tables can now be downloaded from here. These are based on the index with one leaf, so most problems should now be fixed.
Extract the data from the Lucene indexes and build the tables.