mikemccand / luceneutil

Various utility scripts for running Lucene performance tests
Apache License 2.0

Test performance impact of `addDocuments` vs `addDocument` #252

Open · mikemccand opened this issue 11 months ago

mikemccand commented 11 months ago

[Spinoff from https://github.com/apache/lucene/pull/12829#issuecomment-1855755782]

I'm curious how much overhead we pay by calling addDocument once per document for N documents, versus indexing all N documents in a single addDocuments call. IndexWriter (IW) has non-trivial per-call entry/exit costs: checking out and locking a DocumentsWriterPerThread (DWPT), checking flush triggers, locking again to release the DWPT, etc.
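To make the cost argument concrete, here is a toy model in plain Java. `ToyWriter` is hypothetical and is NOT Lucene's IndexWriter. It only mimics the shape of the overhead listed above: take a lock on entry, evaluate a flush trigger, release the lock on exit. Calling `addDocument` N times pays that fixed cost N times, while one `addDocuments` call over the same N documents pays it once.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical toy model of per-call entry/exit overhead.
// NOT Lucene's IndexWriter: the lock stands in for "checking out / freeing a DWPT",
// and flushChecks stands in for "checking flush triggers".
public class ToyWriter {
    private final ReentrantLock bufferLock = new ReentrantLock();
    private int entryExitCount = 0; // times the fixed per-call cost was paid
    private int bufferedDocs = 0;   // documents added to the in-memory buffer
    private int flushChecks = 0;    // times the (toy) flush trigger was evaluated

    // One call per document: pays the fixed entry/exit cost every time.
    public void addDocument(String doc) {
        bufferLock.lock(); // entry: stands in for checking out the DWPT
        try {
            entryExitCount++;
            bufferedDocs++;
            flushChecks++;
        } finally {
            bufferLock.unlock(); // exit: stands in for releasing the DWPT
        }
    }

    // One call for a whole block: pays the fixed entry/exit cost once.
    public void addDocuments(List<String> docs) {
        bufferLock.lock();
        try {
            entryExitCount++;
            bufferedDocs += docs.size();
            flushChecks++;
        } finally {
            bufferLock.unlock();
        }
    }

    public int entryExitCount() { return entryExitCount; }
    public int bufferedDocs() { return bufferedDocs; }
    public int flushChecks() { return flushChecks; }

    public static void main(String[] args) {
        List<String> docs = List.of("d0", "d1", "d2", "d3", "d4");

        ToyWriter single = new ToyWriter();
        for (String d : docs) {
            single.addDocument(d);
        }

        ToyWriter batched = new ToyWriter();
        batched.addDocuments(docs);

        // prints "addDocument x5: entry/exit=5"
        System.out.println("addDocument x" + docs.size() + ": entry/exit=" + single.entryExitCount());
        // prints "addDocuments x1: entry/exit=1"
        System.out.println("addDocuments x1: entry/exit=" + batched.entryExitCount());
    }
}
```

The model only counts how often the fixed cost is paid; the open question in this issue is how large that cost actually is in Lucene relative to the per-document indexing work, which only a real benchmark can answer.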

One simple way to test this would be to modify our existing Indexer.java so that, when reading from a binary line-docs file, it indexes each block of documents with a single addDocuments call.

jpountz commented 11 months ago

We could use IndexGeonames, which has a batchAddDocuments boolean flag intended to test exactly this.
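A minimal sketch of how such a switch might drive the indexing loop, assuming documents arrive in order and are grouped into fixed-size blocks. Only the flag name `batchAddDocuments` comes from the comment above; `DocSink`, `CountingSink`, `indexAll`, and `blockSize` are hypothetical and are not taken from IndexGeonames or Indexer.java. `DocSink` stands in for IndexWriter so the example runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one loop, two modes, selected by batchAddDocuments.
public class BatchToggleIndexer {

    // Stand-in for IndexWriter's two entry points (NOT a Lucene type).
    public interface DocSink {
        void addDocument(String doc);
        void addDocuments(List<String> docs);
    }

    // Counts calls and documents so the two modes can be compared.
    public static final class CountingSink implements DocSink {
        public int calls = 0;
        public int docs = 0;
        @Override public void addDocument(String doc) { calls++; docs++; }
        @Override public void addDocuments(List<String> block) { calls++; docs += block.size(); }
    }

    public static void indexAll(List<String> input, DocSink sink, boolean batchAddDocuments, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        if (!batchAddDocuments) {
            for (String doc : input) {
                sink.addDocument(doc); // one call per document
            }
            return;
        }
        List<String> block = new ArrayList<>(blockSize);
        for (String doc : input) {
            block.add(doc);
            if (block.size() == blockSize) {
                sink.addDocuments(block);            // one call per full block
                block = new ArrayList<>(blockSize);  // fresh list; don't mutate what the sink received
            }
        }
        if (!block.isEmpty()) {
            sink.addDocuments(block); // trailing partial block
        }
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            input.add("doc" + i);
        }

        CountingSink perDoc = new CountingSink();
        indexAll(input, perDoc, false, 128);

        CountingSink batched = new CountingSink();
        indexAll(input, batched, true, 128);

        // prints "per-doc calls=1000, batched calls=8" (7 full blocks of 128 + 1 block of 104)
        System.out.println("per-doc calls=" + perDoc.calls + ", batched calls=" + batched.calls);
    }
}
```

One caveat when mapping this onto the real IndexWriter: addDocuments adds its block atomically, with sequentially assigned document IDs, so a benchmark comparing the two modes also measures that semantic difference, not only the per-call entry/exit overhead.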