I'm curious what overhead we pay calling addDocument for N documents, versus indexing all N docs in a single addDocuments call. IW has non-trivial entry / exit costs (checking out / locking the DWPT, checking flush triggers, locking to free the DWPT, etc.).
One simple way to test this would be to modify our existing Indexer.java so that, when it reads docs from a binary line file, it indexes each block of docs with a single addDocuments call.
[Spinoff from https://github.com/apache/lucene/pull/12829#issuecomment-1855755782]
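
To make the comparison concrete, here is a rough sketch of the two call patterns. It is not the actual Indexer.java change: `DocSink`, `CountingSink`, `indexOneByOne`, `indexInBlocks`, and the `blockSize` parameter are all hypothetical names, and `DocSink` is only a stand-in that mirrors the shape of `IndexWriter.addDocument` / `IndexWriter.addDocuments` so the sketch runs without Lucene on the classpath. It counts calls rather than measuring time; the real per-call cost would have to come from an actual benchmark run.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchedIndexingSketch {

  /** Hypothetical stand-in for the two IndexWriter entry points; not a Lucene type. */
  interface DocSink<D> {
    void addDocument(D doc);

    void addDocuments(Iterable<? extends D> docs);
  }

  /** Counts entry/exit round trips and documents seen, instead of indexing anything. */
  static final class CountingSink<D> implements DocSink<D> {
    int calls;
    int docs;

    @Override
    public void addDocument(D doc) {
      calls++;
      docs++;
    }

    @Override
    public void addDocuments(Iterable<? extends D> batch) {
      calls++;
      for (D ignored : batch) {
        docs++;
      }
    }
  }

  /** Current pattern: one call (and one set of entry/exit costs) per document. */
  static <D> void indexOneByOne(DocSink<D> sink, List<D> docs) {
    for (D doc : docs) {
      sink.addDocument(doc);
    }
  }

  /** Proposed pattern: one call per block of up to blockSize documents. */
  static <D> void indexInBlocks(DocSink<D> sink, List<D> docs, int blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    for (int start = 0; start < docs.size(); start += blockSize) {
      int end = Math.min(start + blockSize, docs.size());
      sink.addDocuments(docs.subList(start, end));
    }
  }

  public static void main(String[] args) {
    List<String> docs = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      docs.add("doc-" + i);
    }

    CountingSink<String> single = new CountingSink<>();
    indexOneByOne(single, docs);

    CountingSink<String> blocked = new CountingSink<>();
    indexInBlocks(blocked, docs, 4);

    System.out.println("one-by-one calls=" + single.calls + ", blocked calls=" + blocked.calls);
  }
}
```

One thing to keep in mind when interpreting results: in Lucene, `addDocuments` also indexes the block atomically into a single segment with adjacent doc IDs, so the two paths are not purely "same work, fewer calls".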