mikemccand / stargazers-migration-test

Testing Lucene's Jira -> GitHub issues migration
0 stars 0 forks source link

Deprecate CustomScoreQuery, BoostedQuery and BoostingQuery [LUCENE-8099] #100

Closed mikemccand closed 6 years ago

mikemccand commented 6 years ago

After LUCENE-7998, these three queries can all be replaced by a FunctionScoreQuery. Using lucene-expressions makes them much easier to use as well.


Legacy Jira details

LUCENE-8099 by Alan Woodward (@romseygeek) on Dec 14 2017, resolved Dec 18 2017 Attachments: LUCENE-8099.patch (versions: 2), LUCENE-8099-2.patch Linked issues:

mikemccand commented 6 years ago

Patch deprecating these three queries and adding explanatory javadoc

[Legacy Jira: Alan Woodward (@romseygeek) on Dec 14 2017]

mikemccand commented 6 years ago

+1 to make Lucene easier to use by removing redundant and confusing duplicative Query subclasses pertaining to boosting. Now to your approach...

It should be stated clearly is issues when we make changes to dependencies. Here you've made the queries module now depend on the expressions module. Consequently, all other modules that depend on the queries module (e.g. highlighter) now transitively depend on the expressions module.

Please update this for FunctionScoreQuery: org.apache.lucene.search.uhighlight.MultiTermHighlighting#extractAutomata Boy it would be nice if we had a Query visiting API – LUCENE-3041

Is it heavyweight to suggest using the expressions module just to multiply a score? I don't know. Even if it isn't it seems more complex as a user (to me). Perhaps there could be a convenience method so that there is less we need to remember to multiply the score by some value-source? Or can we simply add some score-multiplying value-source?

Maybe BoostQuery could use a bit of Javadoc to mention to the reader that the queries module contains FunctionScoreQuery for more advanced boosting than constant multiplication.

[Legacy Jira: David Smiley (@dsmiley) on Dec 16 2017]

mikemccand commented 6 years ago

Thanks for reviewing David

Here you've made the queries module now depend on the expressions module

This is a test dependency only, so it won't affect any downstream modules.

org.apache.lucene.search.uhighlight.MultiTermHighlighting

Will update. And yes, it would make life much easier if we could recurse query trees...

Is it heavyweight to suggest using the expressions module just to multiply a score?

DoubleValuesSource had a helper to create sources that used arbitrary functions when I first committed it, but I removed it a bit later because it messed up equality tests (essentially, you can't check Java closures for equality). I think expressions are a good solution? They allow for hashcode/equals tests, they have a very simple syntax, and they're completely general. My worry about adding a special score-multiplying source is that then you start to add other specialised sources, and we end up with the existential horror that is the queries.function.valuesource package. We could maybe make SimpleBindings more user friendly in another issue if you think it's still complex.

Maybe BoostQuery could use a bit of Javadoc

Good idea

[Legacy Jira: Alan Woodward (@romseygeek) on Dec 16 2017]

mikemccand commented 6 years ago

Oh, you're right of course about the dependency; I missed the "test" scope.

RE DoubleValueSource helper: which issue do you refer to? I'd like to learn more. I found LUCENE-7736 though the committed code does show that the lambdas / "ReaderFunction" instances are in fact compared for equality in org.apache.lucene.queries.function.IndexReaderFunctions.IndexReaderDoubleValuesSource

I think score-multiplication is extremely common. I like the utility method you did here for Solr but that's harder to do in Lucene land because it uses various modules (expressions + queries) that don't depend on each other. If we were to reconsider adding more direct functionality for this, perhaps it might add no new public classes – just a static factory method on FunctionScoreQuery? Something to consider.

[Legacy Jira: David Smiley (@dsmiley) on Dec 18 2017]

mikemccand commented 6 years ago

RE DoubleValueSource helper: which issue do you refer to?

LUCENE-7723

Something to consider

Let's wait and see if questions appear on the list after deprecation? Expressions are going to be much more flexible than a static method.

[Legacy Jira: Alan Woodward (@romseygeek) on Dec 18 2017]

mikemccand commented 6 years ago

Patch adding support to MultiTermHighlighting and javadocs to BoostQuery

[Legacy Jira: Alan Woodward (@romseygeek) on Dec 18 2017]

mikemccand commented 6 years ago

+1 LGTM, thanks Alan.

[Legacy Jira: David Smiley (@dsmiley) on Dec 18 2017]

mikemccand commented 6 years ago

Commit a41643285e8ea453da2b0b7e7ebbf9070fe48db0 in lucene-solr's branch refs/heads/branch_7x from @romseygeek https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a416432

LUCENE-8099: Deprecate CustomScoreQuery, BoostedQuery, BoostingQuery

[Legacy Jira: ASF subversion and git services on Dec 18 2017]

mikemccand commented 6 years ago

Commit b01e6023e1cd3c62260b38c05c8d145ba143a2ac in lucene-solr's branch refs/heads/master from @romseygeek https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b01e602

LUCENE-8099: Deprecate CustomScoreQuery, BoostedQuery, BoostingQuery

[Legacy Jira: ASF subversion and git services on Dec 18 2017]

mikemccand commented 6 years ago

Commit c27099b4d1578dd37c284a4c23f3f812d98fe939 in lucene-solr's branch refs/heads/master from @romseygeek https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c27099b

LUCENE-8099: Remove deprecated CustomScoreQuery, BoostedQuery, BoostingQuery

[Legacy Jira: ASF subversion and git services on Dec 18 2017]

mikemccand commented 6 years ago

Thanks David!

[Legacy Jira: Alan Woodward (@romseygeek) on Dec 18 2017]

mikemccand commented 6 years ago

Commit dfaf023d4a97f4356fc256a94951443b1893876f in lucene-solr's branch refs/heads/master from @romseygeek https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dfaf023

LUCENE-8099: Fix xmlqueryparser tests

[Legacy Jira: ASF subversion and git services on Dec 18 2017]

mikemccand commented 6 years ago

Commit 493220ac3be6b42dea8b5a75d1ecf2063f6da92a in lucene-solr's branch refs/heads/branch_7x from @romseygeek https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=493220a

LUCENE-8099: Fix IntelliJ dependencies

[Legacy Jira: ASF subversion and git services on Dec 19 2017]

mikemccand commented 6 years ago

Commit 79073fafd34341afb575b4a045f2511d35d30d90 in lucene-solr's branch refs/heads/master from @romseygeek https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=79073fa

LUCENE-8099: Fix IntelliJ dependencies

[Legacy Jira: ASF subversion and git services on Dec 19 2017]

mikemccand commented 6 years ago

I'm coming at this kind of late – but I have to say i'm -1 to this change.

What exactly is the motivation here?

If the goal is to reduce "similar" code in Lucene, then that's fine – we can/should certainly refactor these classes to consolidate their implementation – but why deprecate/remove them completely?

Look at this example of a suggested replcament in the javadocs...

 *     Query balancedQuery = new BoostingQuery(positiveQuery, negativeQuery, 0.01f);
...
 * Clients should instead use FunctionScoreQuery and the lucene-expressions library:
 * <pre>
 *   SimpleBindings bindings = new SimpleBindings();
 *   bindings.add("score", DoubleValuesSource.SCORES);
 *   bindings.add("context", DoubleValuesSource.fromQuery(new ConstantScoreQuery(myContextQuery, boost)));
 *   Expression expr = JavascriptCompiler.compile("score * context");
 *   FunctionScoreQuery q = new FunctionScoreQuery(inputQuery, expr.getDoubleValuesSource(bindings));
 * </pre>

...even if that new code is just as efficient as the old code (Is it?) why make our users replace 1 line with 7? Why should a "novice" user who wants to use a simple concept like a "boosted query" need to dig into understanding "Expression Bindings" and ValueSources

Why aren't we just refactoring the internals of classes like BoostingQuery so that they subclass FunctionScoreQuery and provide (simple) backcompat constructors?


Also: Why does this change flat out remove all the existing test code of these deprecated/removed classes?

Even if the classes are being removed, shouldn't the eisting tests be refactored to assert that the "new" way of doing things (via FunctionScoreQuery) produces the same "expected" results?

[Legacy Jira: Chris M. Hostetter on Jan 02 2018]

mikemccand commented 6 years ago

Hi @hossman

I had a few reasons for wanting to do this:

Re testing, replacement tests had already been committed as part of another issue, in TestFunctionScoreQuery. BoostingQuery didn't have any functionality tests at all, just an equality check and a rewrite check, so the test coverage is actually more extensive now.

why make our users replace 1 line with 7?

I agree this isn't great. And in fact, looking again at the example javadoc, it doesn't correctly replace what BoostingQuery does. Maybe we should add some static methods to FunctionScoreQuery to allow for simple boosting? Something like:

Query boostedbyQuery = FunctionScoreQuery.boostByQuery(input, boostQuery, boostValue);
Query boostedByValue = FunctionScoreQuery.boostByValue(input, boostValuesSource);

Would that meet your concerns?

[Legacy Jira: Alan Woodward (@romseygeek) on Jan 03 2018]

mikemccand commented 6 years ago

Here's a patch demonstrating the static methods idea (against master, backport would include a change to the BoostingQuery/BoostedQuery/CustomScoreQuery javadocs)

[Legacy Jira: Alan Woodward (@romseygeek) on Jan 03 2018]

mikemccand commented 6 years ago

Re testing, replacement tests had already been committed as part of another issue

Cool – thank you for clarifying, that wasn't obvious to me when skimming these particular commits.

Maybe we should add some static methods to FunctionScoreQuery to allow for simple boosting?

+1

As i mentioned, I think it would be nice for backcompat if we could keep the old new FOO(xxx) constructors working as trivial subclasses of FunctionScoreQuery – but I get your point about wanting to reduce confusion/ambiguity in the names. A one line drop in replacement for each of the 3 previous constructors that's easy for people to batch replace on upgrade should be adequate.

As i alluded to in my earlier comment, the one other concern I have is about the relative performance of the old classes vs using the FunctionScoreQuery. I haven't wrapped my head around the old code vs new code enough to have any concrete concerns/objections – I'm just looking for some explicit "vote of confidence" from folks like you who have looked at it in depth that you've thought about it and don't see any reason why the new cosolidated impl would be slower then the old impls.

(The one thing that jumped out at me the other day was the use of a compiled expressions like "score * context" in each query instance in the suggested replacement code for some queries – but it's not clear to me if that would still be involved based on your latest patch)

[Legacy Jira: Chris M. Hostetter on Jan 04 2018]

mikemccand commented 6 years ago

Commit 42154387d4f2a6060da09c4236e2a8dbb575c59e in lucene-solr's branch refs/heads/branch_7x from @romseygeek https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4215438

LUCENE-8099: Add some static methods to FunctionScoreQuery to replace Boosting/BoostedQuery

[Legacy Jira: ASF subversion and git services on Jan 05 2018]

mikemccand commented 6 years ago

Commit 79bd05da4dcbb584fffaf62bea683c00d3ac432c in lucene-solr's branch refs/heads/master from @romseygeek https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=79bd05d

LUCENE-8099: Add some static methods to FunctionScoreQuery to replace Boosting/BoostedQuery

[Legacy Jira: ASF subversion and git services on Jan 05 2018]

mikemccand commented 6 years ago

Commit 0c18acb00f3083553e782c916638dfd6bb15cfe9 in lucene-solr's branch refs/heads/master from @romseygeek https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0c18acb

LUCENE-8099: Update MIGRATE.txt

[Legacy Jira: ASF subversion and git services on Jan 05 2018]

mikemccand commented 6 years ago

I committed those extra static methods, and improved the javadocs and migration messages.

Re performance: there shouldn't be any reason for things to be slower, and FunctionScoreQuery is fully integrated with things like two-phase iteration, so should if anything be faster than the queries it's replacing. DoubleValuesSource is iterator based and handles things like existence checks with fewer branches than ValueSource; same with Expressions, which are compiled to java bytecode. It might be useful to add some examples of these queries to the benchmark tests though.

[Legacy Jira: Alan Woodward (@romseygeek) on Jan 05 2018]

mikemccand commented 6 years ago

The two additional convenience methods with backing value sources are fantastic (thank you) – it addresses my earlier concerns. it seems lightweight and needn't involve another module (expressions). At the time I said this, you responded to say this:

DoubleValuesSource had a helper to create sources that used arbitrary functions when I first committed it, but I removed it a bit later because it messed up equality tests (essentially, you can't check Java closures for equality).

Your latest commit doesn't show any changes to equality tests... are we talking about different things? Was Hoss's "-1" what pushed you over to make these changes? I'm wondering if I should give -1's more... I think I felt about as strongly as Hoss but didn't express it.

[Legacy Jira: David Smiley (@dsmiley) on Jan 05 2018]

mikemccand commented 6 years ago

What pushed me over the line on the static methods was the realisation that my suggested replacement for BoostingQuery didn't do what I thought it was doing, although Hoss's -1 helped :) . And I think I also misread your earlier comment about static methods, thinking that you meant them for creating value sources, rather than as query replacements.

There isn't an issue with equality comparisons here, because the static methods use concrete classes with .equals() and .hashCode() implemented on them for their sources. They should have tests for this though, will add them in.

[Legacy Jira: Alan Woodward (@romseygeek) on Jan 05 2018]

mikemccand commented 6 years ago

Commit e3c1947506b65bec93834e0607dc05a86537685e in lucene-solr's branch refs/heads/branch_7x from @romseygeek https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e3c1947

LUCENE-8099: Add equality tests for FunctionScoreQuery

[Legacy Jira: ASF subversion and git services on Jan 05 2018]

mikemccand commented 6 years ago

Commit 3980aea18ded0a526900dd649fd5926ae65c3897 in lucene-solr's branch refs/heads/master from @romseygeek https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3980aea

LUCENE-8099: Add equality tests for FunctionScoreQuery

[Legacy Jira: ASF subversion and git services on Jan 05 2018]

mikemccand commented 6 years ago

With the addition of the nice new static utility methods on FunctionScoreQuery, I think you can replace the change you did to Solr's BoostQParserPlugin to simply be: return FunctionScoreQuery.boostByValue(input, vs.asDoubleValuesSource()); Maybe then inline it; I would but it's up to you. Also the MIGRATE.txt notes should mention that these are replacements to some of the classes deleted here.

[Legacy Jira: David Smiley (@dsmiley) on Jan 05 2018]

mikemccand commented 6 years ago

Commit 95e98e1b41c70dff4d3a197b4d510da83dd7857a in lucene-solr's branch refs/heads/branch_7x from @romseygeek https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=95e98e1

LUCENE-8099: Replace BoostQParserPlugin.boostQuery() with FunctionScoreQuery.boostByValue()

[Legacy Jira: ASF subversion and git services on Jan 08 2018]

mikemccand commented 6 years ago

Commit 0744fea821366a853b8e239e766b9786ef96cb27 in lucene-solr's branch refs/heads/master from @romseygeek https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0744fea

LUCENE-8099: Replace BoostQParserPlugin.boostQuery() with FunctionScoreQuery.boostByValue()

[Legacy Jira: ASF subversion and git services on Jan 08 2018]

mikemccand commented 6 years ago

I removed BoostQParserPlugin.boostQuery(). MIGRATE.txt already has a bit about the new static methods.

[Legacy Jira: Alan Woodward (@romseygeek) on Jan 08 2018]

mikemccand commented 5 years ago

@romseygeek,

 

In the context of migrating some Solr plugins (specific parser) to Solr v7.6, I noticed that the "BoostedQuery" is deprecated by the use of the "FunctionScoreQuery". With this evolution, Iit is not possible to access  to the underlying "valueSource" anymore. There is only a getter for the wrapped query, but not for the "valueSource". Is it possible to add a getter for the "valueSource" in this class?   Should I create a ticket for that? Thank you.

[Legacy Jira: Gérald Quaire on Jan 23 2019]

mikemccand commented 5 years ago

Should I create a ticket for that?

Please do!

[Legacy Jira: Alan Woodward (@romseygeek) on Jan 23 2019]

mikemccand commented 5 years ago

Thanks @romseygeek.

I have created the ticket: LUCENE-8655

[Legacy Jira: Gérald Quaire on Jan 23 2019]

mikemccand commented 2 years ago

Commit 1ee11a8497db571b83f8a8d68f38e7a9a738b745 in lucene's branch refs/heads/main from David Smiley https://gitbox.apache.org/repos/asf?p=lucene.git;h=1ee11a8

LUCENE-10252: ValueSource.asDoubleValues should not compute the score (#519)

ValueSource.asDoubleValues and asLongValues should not compute the score unless asked to – typically never. This fixes a performance regression since 7.3 LUCENE-8099 when some older boosting queries were replaced with this.

[Legacy Jira: ASF subversion and git services on Jan 03 2022]

mikemccand commented 2 years ago

Commit 72163649b16a909ba21baca4cb6d3840f60855e8 in lucene's branch refs/heads/branch_9x from David Smiley https://gitbox.apache.org/repos/asf?p=lucene.git;h=7216364

LUCENE-10252: ValueSource.asDoubleValues should not compute the score (#519)

ValueSource.asDoubleValues and asLongValues should not compute the score unless asked to – typically never. This fixes a performance regression since 7.3 LUCENE-8099 when some older boosting queries were replaced with this.

[Legacy Jira: ASF subversion and git services on Jan 03 2022]