It looks like the benchmark was changed so that it probably supports /home/paul/git/search-index-benchmark-game/corpus.json (whatever format that is), but it no longer supports the Wikipedia articles. For example, attempting to build the index with the lucene-8.0.0 engine gives:
$ make idx
---- Indexing Lucene ----
java -server -cp build/libs/search-index-benchmark-game-lucene-1.0-SNAPSHOT-all.jar BuildIndex idx < /ssd/karel/vcs/search-benchmark-game/wiki-articles.json
Exception in thread "main" java.lang.NullPointerException
at BuildIndex.main(BuildIndex.java:39)
Makefile:17: recipe for target 'idx' failed
make: *** [idx] Error 1
which means a parse error or, more precisely, that the code cannot get the id from the JSON line. The problem is that the Wikipedia articles have no id field, only url, title and body.
A very similar result is obtained when testing the tantivy-0.9 engine:
$ make index
---- Indexing tantivy ----
export RUST_LOG=info && target/release/build_index "idx" < /ssd/karel/vcs/search-benchmark-game/wiki-articles.json
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: InvalidArgument("Failed to parse document NoSuchFieldInSchema(\"body\")")', src/libcore/result.rs:997:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Makefile:19: recipe for target 'idx' failed
make: *** [idx] Error 101
Again, the code expects only the id and text JSON fields...
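As a possible workaround, the Wikipedia dump could be converted into the shape the indexers seem to expect. The following is only a minimal sketch under assumptions: that each input line is a JSON object with url, title and body, that the benchmark wants one JSON object per line with id and text, and that reusing the url as the id and joining title and body into text is acceptable. The function name `convert_line` is hypothetical, and none of this is confirmed by the benchmark code.

```python
import json
import sys


def convert_line(line):
    """Map one wiki article (url, title, body) to the assumed {id, text} shape."""
    article = json.loads(line)
    # Assumption: the url is unique per article, so it can stand in for the id.
    doc_id = article["url"]
    # Assumption: the title is folded into the searchable text, before the body.
    text = article.get("title", "") + "\n" + article.get("body", "")
    return json.dumps({"id": doc_id, "text": text})


def main():
    # Read one JSON document per line from stdin, write converted lines to stdout.
    for line in sys.stdin:
        line = line.strip()
        if line:
            print(convert_line(line))


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as convert.py, it would be run as `python3 convert.py < wiki-articles.json > corpus.json`, and the resulting file fed to the indexers instead of the original dump. Whether the id has to be a string or a number is not clear from the errors above.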