mikemccand / stargazers-migration-test

Testing Lucene's Jira -> GitHub issues migration

Explore Relevance Based Performance Benchmarks [LUCENE-8841] #839

Open mikemccand opened 5 years ago

mikemccand commented 5 years ago

While discussing improvements to the relevance of fuzzy queries with @jimczi, the topic of how to measure the impact of changes to the relevance of common queries came up. While a non-trivial effort, having such a benchmark would allow us to measure the impact of potential changes and also catch regressions early.

 

This Jira tracks ideas and efforts in that direction.


Legacy Jira details

LUCENE-8841 by Atri Sharma (@atris) on Jun 07 2019, updated Jun 08 2019

mikemccand commented 5 years ago

Big +1, though I suspect it would be very hard! This could be an Apache project in and of itself...

One challenge is that the range of use cases Lucene serves is tremendously diverse, from job search, to e-commerce, to legal search, to enterprise search, to news search, to Web search, to everything in between and outside the box. You wouldn't want a situation, for example, where you only have an e-commerce test set, and so end up making decisions that optimize for e-commerce while harming enterprise search users.

Another challenge is getting reliable relevance judgments. Teams go deep into developing their methodology for creating a golden set of judgments. This can of course be a very domain-specific and challenging problem; there's no obvious one-size-fits-all approach. Some teams use human judges, others crowdsource, and others rely heavily on analytics. Some have access to conversion data; others don't. You have all sorts of biases to contend with in every situation. And the judgments evolve over time (today's most relevant iPhone isn't the same as two years ago). So getting it right takes a lot of energy and time from mature search orgs, and it isn't clear what judgments/data you'd choose if you want to cover a broad range of use cases.
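For concreteness, whatever methodology produces the golden set, the judgments are ultimately consumed by a ranking metric such as NDCG. A minimal sketch of that step, using made-up graded judgments (0–3) rather than any real dataset:

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: graded relevance, discounted by log2 of rank.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(ranked_relevances, k=10):
    # Normalize against the ideal (descending) ordering of the same judgments,
    # so a perfect ranking scores 1.0.
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True)[:k])
    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical judgments for one query's top-5 results, in the order
# two candidate rankers returned them.
system_a = [3, 2, 3, 0, 1]
system_b = [1, 3, 3, 2, 0]
better = "A" if ndcg(system_a) > ndcg(system_b) else "B"
```

A benchmark in the spirit of this issue would average such per-query scores over a fixed query set before and after a relevance change, flagging significant drops as regressions.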

I think the best case is to partner with some organizations that are willing to open up this data alongside their corpus, where we could validate and feel good about the methodology they use to generate judgments. You'd need to update the relevance judgments and corpus over time. There are of course TREC and other academic datasets; that's one data point. Some folks I know at Wikipedia have talked about this. But you'd want some more commercial datasets (corpus + judgments).

But partnering with orgs would also have limits, as this data is very high-value to companies... But perhaps they'd be incentivized to open it up if Lucene were going to make decisions with it that helped them?!

 

[Legacy Jira: Doug Turnbull on Jun 08 2019]

mikemccand commented 5 years ago

It used to be an Apache project. :) (Lucene subproject actually) https://lucene.apache.org/openrelevance/

[Legacy Jira: Adrien Grand (@jpountz) on Jun 08 2019]

mikemccand commented 5 years ago

Could we consider resurrecting it then?

[Legacy Jira: Atri Sharma (@atris) on Jun 08 2019]

mikemccand commented 5 years ago

Yes, that would be possible.

[Legacy Jira: Adrien Grand (@jpountz) on Jun 08 2019]