o19s / quepid

Improve your Elasticsearch, OpenSearch, Solr, Vectara, Algolia and Custom Search search quality.
http://www.quepid.com
Apache License 2.0
284 stars, 101 forks

Fix RRE export format #928

Open: epugh opened this issue 9 months ago

epugh commented 9 months ago

Describe the bug

We currently export RRE ratings as:

"relevant_documents": {
        "1.0": [
          "l_1559"
        ]
      }

But in talking to @jillesvangurp, we figured out that it should be:

 "relevant_documents_fixed": {
        "l_1559": {
          "gain" : 1.0
        }
      }
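To make the difference between the two shapes concrete, here is a minimal Python sketch (not Quepid's actual exporter) that reshapes the grade-keyed structure, where each grade maps to a list of doc ids, into the doc-keyed structure, where each doc id maps to a {"gain": ...} object. The function name grades_to_gains is hypothetical, and keeping the highest gain when a document appears under more than one grade is an assumption.

```python
from typing import Dict, List


def grades_to_gains(relevant_by_grade: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Convert the old grade-keyed shape ({"1.0": ["l_1559"]})
    into the doc-keyed shape ({"l_1559": {"gain": 1.0}})."""
    result: Dict[str, Dict[str, float]] = {}
    for grade, doc_ids in relevant_by_grade.items():
        gain = float(grade)  # grades were serialized as string keys such as "1.0"
        for doc_id in doc_ids:
            # Assumption: if a doc is listed under several grades, keep the highest one.
            if doc_id not in result or result[doc_id]["gain"] < gain:
                result[doc_id] = {"gain": gain}
    return result


old = {"1.0": ["l_1559"]}
print(grades_to_gains(old))  # {'l_1559': {'gain': 1.0}}
```

Keying by document id also makes duplicates impossible by construction, which the list-per-grade shape does not guarantee.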


jillesvangurp commented 9 months ago

Just for reference, I had a go at implementing an RRE import for Rankquest Studio. The implementation lives here in rankquest-core:

https://github.com/jillesvangurp/rankquest-core/blob/main/src/commonMain/kotlin/com/jilesvangurp/rankquest/core/rre-support.kt

It's pretty easy to add more features or alternative formats there.

I based my implementation on the one example in the RRE repo that I was able to find:

https://github.com/SeaseLtd/rated-ranking-evaluator/blob/master/rre-core/src/test/resources/engine_evaluation_tests/ratings/ratings_example.json

I'm happy to do some more work on this but it would help to get some better examples to work with.

jillesvangurp commented 9 months ago

Another open question is whether the rating should be an Int or a Double. I'm treating it as an Int so far.
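One way to sidestep the Int vs. Double question on the reading side is to accept either and normalize. The Python sketch below is an illustration only: normalize_gain is a hypothetical helper, and collapsing whole-valued floats to int is one possible policy, not what rankquest-core or RRE actually do.

```python
import json
from typing import Union

Number = Union[int, float]


def normalize_gain(value: Number) -> Number:
    """Accept a gain parsed from JSON as either an int (3) or a float (3.0).

    Whole-valued floats collapse to int so that 3 and 3.0 compare and
    serialize the same way; fractional gains such as 0.5 stay floats.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"gain must be a number, got {value!r}")
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


doc = json.loads('{"doca": {"gain": 3}, "docb": {"gain": 1.0}, "docc": {"gain": 0.5}}')
print({k: normalize_gain(v["gain"]) for k, v in doc.items()})
# {'doca': 3, 'docb': 1, 'docc': 0.5}
```

Keeping fractional values as floats means a reader built this way would not silently truncate graded judgments if an exporter ever emits them.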

epugh commented 9 months ago

Here is the new format for RRE:

{
    "id_field": "id",
    "index": "tmdb",
    "template": "template.json",
    "queries": [
        {
            "placeholders": {
                "$query": "=cmd"
            }
        },
        {
            "placeholders": {
                "$query": "First Query"
            }
        },
        {
            "placeholders": {
                "$query": "Second Query"
            },
            "relevant_documents": {
                "docb": {
                    "gain": 1
                },
                "doca": {
                    "gain": 3
                }
            }
        },
        {
            "placeholders": {
                "$query": "Third Query"
            }
        },
        {
            "placeholders": {
                "$query": "Fourth Query"
            }
        }
    ]
}
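For anyone consuming this format, here is a minimal Python sketch that pulls out (query text, {doc id: gain}) pairs. It uses a trimmed two-query copy of the example above. The function name rated_queries is hypothetical and is not part of RRE or Quepid. The sketch assumes every query carries a "$query" placeholder and treats a missing "relevant_documents" block as "no judgments yet".

```python
import json
from typing import Dict, List, Tuple

RRE_EXAMPLE = """
{
  "id_field": "id",
  "index": "tmdb",
  "template": "template.json",
  "queries": [
    {"placeholders": {"$query": "First Query"}},
    {"placeholders": {"$query": "Second Query"},
     "relevant_documents": {"docb": {"gain": 1}, "doca": {"gain": 3}}}
  ]
}
"""


def rated_queries(export: dict) -> List[Tuple[str, Dict[str, float]]]:
    """Return (query text, {doc id: gain}) pairs from the RRE-style export.

    Queries without a "relevant_documents" block get an empty dict.
    """
    pairs = []
    for entry in export.get("queries", []):
        text = entry["placeholders"]["$query"]
        docs = entry.get("relevant_documents", {})
        pairs.append((text, {doc_id: d["gain"] for doc_id, d in docs.items()}))
    return pairs


for query, gains in rated_queries(json.loads(RRE_EXAMPLE)):
    # Sort by descending gain so the most relevant documents come first.
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    print(query, ranked)
# First Query []
# Second Query [('doca', 3), ('docb', 1)]
```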

jillesvangurp commented 9 months ago

It's still a bit different from the format I linked above, which nests topics, then query_groups, and then queries. Did you test the format with RRE?

epugh commented 9 months ago

So I don't have topics, and I thought some of that was optional. Honestly, I haven't tested it because it's been a while since I've used RRE...

epugh commented 9 months ago

So I wonder if I should just support the Rankquest format directly instead?

jillesvangurp commented 9 months ago

Depends: who else is currently using the export?

jillesvangurp commented 9 months ago

I modeled my importer after the one sample I found in their repo. But of course it would be nice if it lined up with what people are actually using and expecting today.

Otherwise, I'm open to suggestions and in no way tied to the RRE format.

epugh commented 9 months ago

Okay, I think the better route is to introduce a Rankquest format; that way each can evolve as market demand drives it. Do you have an example of an export file I can use?

jillesvangurp commented 9 months ago

Here's an example:

movie-quotes-rated-searches-2024-01-22T16 56 59.954Z.json

You need a matching search plugin configuration that can handle the parameters. The parameter map (search context) is all strings. Comment and tag fields are optional.

jillesvangurp commented 9 months ago

The label is optional too, but it's nice to have some hint of what the document is about. Usually the document title would be appropriate.

jillesvangurp commented 9 months ago

The size in the search context refers to how many results to fetch; the rest is similar to the parameters in RRE.
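Since the Rankquest search context is an all-strings parameter map with a size entry, a Quepid or RRE exporter would need to stringify its parameters. The Python sketch below is a hypothetical mapping from RRE-style placeholders. Dropping the leading "$", the name to_search_context, and the default size of 10 are all assumptions, not rankquest-core's actual behavior.

```python
from typing import Dict


def to_search_context(placeholders: Dict[str, object], size: int = 10) -> Dict[str, str]:
    """Turn RRE-style placeholders ({"$query": "Second Query"}) into a
    string-only parameter map like the Rankquest search context.

    Hypothetical assumptions, not the real rankquest-core mapping:
    - the leading "$" is dropped, so "$query" becomes "query";
    - "size" (how many results to fetch) defaults to 10, and the explicit
      size argument overrides any "size" placeholder.
    """
    context = {name.lstrip("$"): str(value) for name, value in placeholders.items()}
    context["size"] = str(size)  # numbers are stringified too: the map is all strings
    return context


print(to_search_context({"$query": "Second Query"}, size=5))
# {'query': 'Second Query', 'size': '5'}
```

Whatever the final key names are, they have to match the parameters the matching search plugin configuration expects.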