mikemccand / luceneutil

Various utility scripts for running Lucene performance tests
Apache License 2.0

Do we have the tools for benchmark on index build? #191

Status: Closed (closed by tang-hi 2 years ago)

tang-hi commented 2 years ago

The guide in the README appears to cover only search benchmarks. I couldn't find any tools for benchmarking index builds. If such tools exist, could you tell me which file to look at? 😄

mikemccand commented 2 years ago

You are correct that the benchmark is search-focused.

But it also reports index size and indexing throughput / total time, and gives you control over whether to include the commit time, the "wait for final merges" time, and even whether to do periodic NRT refreshes during indexing.

If you execute a run, you should see at least the index time, index size, and number of segments reported. You can tune the RAM buffer, the number of indexing threads, etc.
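
To illustrate why the choice of whether to count commit and "wait for final merges" time matters, here is a minimal sketch of the accounting. It is not luceneutil's actual code: the `IndexTimings` class, its field names, and the numbers are all hypothetical and only show how the reported throughput shifts depending on which phases are included.

```python
from dataclasses import dataclass


@dataclass
class IndexTimings:
    """Wall-clock seconds per phase of a hypothetical indexing run."""
    doc_count: int               # number of documents indexed
    index_sec: float             # time spent adding documents
    commit_sec: float = 0.0      # time spent in the final commit
    merge_wait_sec: float = 0.0  # time spent waiting for final merges

    def total_sec(self, include_commit=True, include_merge_wait=True):
        # Always count the indexing phase; add the others only on request.
        total = self.index_sec
        if include_commit:
            total += self.commit_sec
        if include_merge_wait:
            total += self.merge_wait_sec
        return total

    def docs_per_sec(self, include_commit=True, include_merge_wait=True):
        return self.doc_count / self.total_sec(include_commit, include_merge_wait)


# Made-up numbers: 1M docs, 80 s indexing, 5 s commit, 15 s waiting for merges.
timings = IndexTimings(doc_count=1_000_000, index_sec=80.0,
                       commit_sec=5.0, merge_wait_sec=15.0)
print(timings.docs_per_sec())  # all phases counted: 10000.0
print(timings.docs_per_sec(include_commit=False,
                           include_merge_wait=False))  # indexing only: 12500.0
```

With these made-up numbers the same run reports either 10,000 or 12,500 docs/sec, so two index-build results are only comparable when they include the same phases.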

Also, you should use the binary form of the LineFileDocs to minimize the CPU cost of decoding the incoming documents, so that as much CPU as possible goes to Lucene for the actual indexing.

tang-hi commented 2 years ago

Thanks, I got it.