mikemccand / stargazers-migration-test

Testing Lucene's Jira -> GitHub issues migration

MergePolicy simulator utility [LUCENE-8331] #331

Open mikemccand opened 6 years ago

mikemccand commented 6 years ago

This issue introduces a MergePolicy simulator utility to help evaluate the effectiveness of a MergePolicy. The simulator does not actually index documents or merge segments; instead it hands dummy constructs to the MergePolicy and evaluates its decisions. As a result, simulation runs take little time.
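To make the idea concrete, here is a minimal sketch of what "simulating" merges without touching an index can look like. It is not the code in the attached patch and it does not use Lucene's `MergePolicy` API: `MergeSimSketch`, `ToyMergePolicy`, `Stats`, and `equalSizePolicy` are hypothetical names invented for illustration, and segments are modeled as bare sizes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy merge simulator: segments are just sizes, so no index is ever written. */
final class MergeSimSketch {

    /** Hypothetical policy hook (an assumption; this is not Lucene's MergePolicy API). */
    interface ToyMergePolicy {
        /** Returns indices of the segments to merge next, or an empty list for "no merge". */
        List<Integer> findMerge(List<Long> segmentSizes);
    }

    /** Counters gathered during one simulated run. */
    static final class Stats {
        long bytesFlushed;
        long bytesMerged;
        int merges;
        int finalSegments;

        /** Total bytes written (flushes plus merges) divided by bytes flushed. */
        double writeAmplification() {
            return bytesFlushed == 0 ? 0.0 : (double) (bytesFlushed + bytesMerged) / bytesFlushed;
        }
    }

    /** Simulates {@code flushes} flushes of {@code flushSize} bytes each and applies the policy after every flush. */
    static Stats simulate(ToyMergePolicy policy, int flushes, long flushSize) {
        List<Long> segments = new ArrayList<>();
        Stats stats = new Stats();
        for (int i = 0; i < flushes; i++) {
            segments.add(flushSize);
            stats.bytesFlushed += flushSize;
            List<Integer> picked;
            // Keep asking the policy until it has nothing more to merge.
            while (!(picked = policy.findMerge(segments)).isEmpty()) {
                // Remove from the highest index down so the remaining indices stay valid.
                List<Integer> order = new ArrayList<>(picked);
                order.sort(Comparator.reverseOrder());
                long mergedSize = 0;
                for (int index : order) {
                    mergedSize += segments.remove(index);
                }
                segments.add(mergedSize);
                stats.bytesMerged += mergedSize;
                stats.merges++;
            }
        }
        stats.finalSegments = segments.size();
        return stats;
    }

    /** Toy policy: merge as soon as {@code mergeFactor} segments share the same size. */
    static ToyMergePolicy equalSizePolicy(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        return sizes -> {
            for (int i = 0; i < sizes.size(); i++) {
                List<Integer> same = new ArrayList<>();
                for (int j = 0; j < sizes.size(); j++) {
                    if (sizes.get(j).equals(sizes.get(i))) {
                        same.add(j);
                    }
                }
                if (same.size() >= mergeFactor) {
                    return same.subList(0, mergeFactor);
                }
            }
            return List.of();
        };
    }

    public static void main(String[] args) {
        Stats stats = simulate(equalSizePolicy(10), 1000, 1L);
        System.out.printf("merges=%d segments=%d writeAmp=%.2f%n",
                stats.merges, stats.finalSegments, stats.writeAmplification());
    }
}
```

Because only sizes are tracked, a run over thousands of simulated flushes finishes almost instantly, which is the property the description above relies on. A real simulator would additionally have to feed the policy whatever segment metadata it actually inspects.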

I'm not sure where it would live. Perhaps dev-tools, or in tests, or in benchmark?

I mentioned this recently here: https://issues.apache.org/jira/browse/LUCENE-7976?focusedCommentId=16446985&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16446985


Legacy Jira details

LUCENE-8331 by David Smiley (@dsmiley) on May 24 2018, updated Jan 29 2020
Attachments: LUCENE-8331.patch
Linked issues:

mikemccand commented 6 years ago

CC @mikemccand @s1monw @ErickErickson

I used this utility (with some other edits not in this patch) to evaluate a custom merge policy that had a notion of "cheap" merges. It turned out to be very successful; I may open other issues about ways TieredMergePolicy and/or the MergeScheduler can be improved.

The main features about this simulator are:

Some not so great parts:

What do you think, guys?

[Legacy Jira: David Smiley (@dsmiley) on May 24 2018]

mikemccand commented 6 years ago

I think something like this can be helpful if you are working on a MP and/or trying to debug an issue. I am not sure it needs to be a commandline util. I would rather build the individual tools to plug stuff together as an API and put most of the utils like creating the simulated segments into the base tests class. I was going to do something similar to make testing simpler. I like the idea. LUCENE-8330 will help with this as well.

[Legacy Jira: Simon Willnauer (@s1monw) on May 25 2018]

mikemccand commented 6 years ago

Thanks for your input Simon.

I am not sure it needs to be a commandline util.

How else would something like this be executed? Maybe I don't understand your subsequent recommendation...

I would rather build the individual tools to plug stuff together as an API and put most of the utils like creating the simulated segments into the base tests class.

I may not be getting your point, but I think you're saying you'd like Lucene's test infrastructure to have some of the elements of what this test does. Sounds good to me. Nevertheless, the outcome of that would be less code in this simulator... but somewhere there needs to be a main() to literally run the simulation and set up whatever the simulated environment is, and code to track some stats of interest. Right?

Are you basically fine with me committing this?

[Legacy Jira: David Smiley (@dsmiley) on May 28 2018]

mikemccand commented 6 years ago

How else would something like this be executed? Maybe I don't understand your subsequent recommendation...

Can it just be a utility class that I call from a test or so? I mean, I am not sure how user-friendly it is to specify classpaths etc. I'd just run it from a test. I also think it's way more flexible if you have a Java API to call rather than some cmd args you need to parse etc.
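A minimal sketch of that shape, assuming a typed entry point with an optional thin `main()` wrapper on top. All names here (`SimulationApiSketch`, `Config`, `Result`, `run`) are hypothetical and not taken from the patch or from Lucene, and the simulation itself is a toy: each level holds a count of segments, and reaching `mergeFactor` segments on a level merges them into one segment on the next level.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical API-first layout: tests call run(); main() only parses arguments. */
final class SimulationApiSketch {

    /** Typed configuration instead of raw command-line strings. */
    static final class Config {
        final int flushes;
        final int mergeFactor;

        Config(int flushes, int mergeFactor) {
            if (flushes < 0 || mergeFactor < 2) {
                throw new IllegalArgumentException("flushes must be >= 0 and mergeFactor >= 2");
            }
            this.flushes = flushes;
            this.mergeFactor = mergeFactor;
        }
    }

    /** Stats a caller (for example a test) can inspect or assert on. */
    static final class Result {
        final int merges;
        final int segmentsLeft;

        Result(int merges, int segmentsLeft) {
            this.merges = merges;
            this.segmentsLeft = segmentsLeft;
        }
    }

    /** Runs the toy simulation; perLevel.get(k) is the number of segments currently on level k. */
    static Result run(Config config) {
        List<Integer> perLevel = new ArrayList<>();
        int merges = 0;
        for (int i = 0; i < config.flushes; i++) {
            int level = 0;
            while (true) {
                if (perLevel.size() == level) {
                    perLevel.add(0);
                }
                int count = perLevel.get(level) + 1;
                if (count < config.mergeFactor) {
                    perLevel.set(level, count);
                    break;
                }
                // Level is full: merge its segments into one segment on the next level.
                perLevel.set(level, 0);
                merges++;
                level++;
            }
        }
        int segmentsLeft = 0;
        for (int count : perLevel) {
            segmentsLeft += count;
        }
        return new Result(merges, segmentsLeft);
    }

    /** Thin optional wrapper: argument parsing only, no simulation logic. */
    public static void main(String[] args) {
        int flushes = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        int mergeFactor = args.length > 1 ? Integer.parseInt(args[1]) : 10;
        Result result = run(new Config(flushes, mergeFactor));
        System.out.println("merges=" + result.merges + " segmentsLeft=" + result.segmentsLeft);
    }
}
```

With this layout a test can call `SimulationApiSketch.run(new Config(100, 10))` directly and assert on the returned stats, while the command-line entry point stays available for anyone who prefers it.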

 

Are you basically fine with me committing this?

I think it should support deletes and should not use IW; then I'm OK with it.

[Legacy Jira: Simon Willnauer (@s1monw) on May 29 2018]

mikemccand commented 6 years ago

I think it should support deletes and should not use IW; then I'm OK with it.

+1

[Legacy Jira: Tommaso Teofili (@tteofili) on May 29 2018]

mikemccand commented 6 years ago

Can it just be a utility class that I call from a test or so? I mean, I am not sure how user-friendly it is to specify classpaths etc. I'd just run it from a test.

Ooh, ok. FWIW what I do is simply right-click the main method and tell my IDE to run it. It fails the first go-round because it needs args so then I update the args. Since it's on the test classpath and run from my IDE, there's no issue. I expect others can just run it similarly? Documentation could spell this out! Why would a test call this? To assert that the stats are "good"?

I think it should support deletes and should not use IW; then I'm OK with it.

Sure thing – now made possible with LUCENE-8330.  I'll work on this.

[Legacy Jira: David Smiley (@dsmiley) on May 29 2018]