mikemccand / luceneutil

Various utility scripts for running Lucene performance tests
Apache License 2.0

Make ID assignment deterministic #187

Closed zhaih closed 2 years ago

zhaih commented 2 years ago

Changes

  1. Change the script that builds the binary LFDs so that it adds an ID to each line
  2. Change how IDs are assigned in LineFileDocs.java

mikemccand commented 2 years ago

Aha! Now I see #186! EventuallyConsistentMikeException.

You're right -- using AtomicInteger means the id assignment is non-deterministic!

But couldn't we still make it deterministic, in the binary case, by knowing the idBase of each block, and then each thread indexes that block by incrementing its id locally?
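A minimal sketch of that idea, not the actual LineFileDocs.java code: every block carries an `idBase` equal to the number of docs in all earlier blocks, and whichever thread picks up a block counts upward from that base with a local variable. The names `BlockIdSketch`, `Block`, `toBlocks`, and `assignIds` are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

final class BlockIdSketch {
  /** Hypothetical stand-in for one block of a binary line file. */
  static final class Block {
    final int idBase;     // id of the first doc in this block
    final String[] lines; // the docs stored in this block
    Block(int idBase, String[] lines) {
      this.idBase = idBase;
      this.lines = lines;
    }
  }

  /** Gives each block an idBase equal to the total size of the blocks before it. */
  static List<Block> toBlocks(String[][] rawBlocks) {
    List<Block> blocks = new ArrayList<>();
    int nextBase = 0;
    for (String[] raw : rawBlocks) {
      blocks.add(new Block(nextBase, raw));
      nextBase += raw.length;
    }
    return blocks;
  }

  /** Processes all blocks with numThreads workers and returns the docs indexed by id. */
  static String[] assignIds(List<Block> blocks, int totalDocs, int numThreads) {
    String[] byId = new String[totalDocs];
    ConcurrentLinkedQueue<Block> queue = new ConcurrentLinkedQueue<>(blocks);
    Thread[] threads = new Thread[numThreads];
    for (int t = 0; t < numThreads; t++) {
      threads[t] = new Thread(() -> {
        Block block;
        while ((block = queue.poll()) != null) {
          int id = block.idBase; // thread-local counter, no shared AtomicInteger
          for (String line : block.lines) {
            byId[id++] = line;   // each block owns a disjoint id range
          }
        }
      });
      threads[t].start();
    }
    for (Thread thread : threads) {
      try {
        thread.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return byId;
  }
}
```

Which thread handles which block still varies from run to run, but the id of a given doc depends only on its block's `idBase` and its position inside the block, so the assignment is the same every time.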

And in the text case, I think we hold a lock while reading the file, and we could do a simple id++ (perhaps on a volatile int, though I think because the same lock is held by each thread that does the increment, we may not need the volatile) there?
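The text case could look roughly like the sketch below, assuming the reader really is guarded by a single lock as suggested above. `LockedLineReader`, `Doc`, and `drain` are hypothetical names, not the real LineFileDocs.java API. Because the read and the `nextId++` happen under the same monitor, a plain `int` is enough: releasing and re-acquiring the lock already makes the updated counter visible to the next thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;

final class LockedLineReader {
  /** One line of the file together with the id it was given. */
  static final class Doc {
    final int id;
    final String line;
    Doc(int id, String line) {
      this.id = id;
      this.line = line;
    }
  }

  private final BufferedReader reader;
  private int nextId; // plain int: only read and written while holding this object's lock

  LockedLineReader(BufferedReader reader) {
    this.reader = reader;
  }

  /** Reads the next line and assigns its id under one lock; returns null at end of input. */
  synchronized Doc next() {
    try {
      String line = reader.readLine();
      if (line == null) {
        return null;
      }
      return new Doc(nextId++, line); // id equals the line's position in the file
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Pulls every doc using numThreads workers and returns the lines indexed by id. */
  static String[] drain(LockedLineReader source, int numDocs, int numThreads) {
    String[] byId = new String[numDocs];
    Thread[] threads = new Thread[numThreads];
    for (int t = 0; t < numThreads; t++) {
      threads[t] = new Thread(() -> {
        Doc doc;
        while ((doc = source.next()) != null) {
          byId[doc.id] = doc.line;
        }
      });
      threads[t].start();
    }
    for (Thread thread : threads) {
      try {
        thread.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return byId;
  }
}
```

The thread that ends up indexing a given line is still arbitrary, but the id it sees is always that line's position in the file.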

zhaih commented 2 years ago

Thanks @zhaih!

Do we think this might mean we can turn back on the rearrange step and use multiple threads to build the "for deterministic searching" index?

Yes, I hope this is enough. I'll try to revive some of the memory, run the experiment locally, and let you know what happened over the weekend!