vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0

elasticsearch query to vespa query translator #16872

Open kkraune opened 3 years ago

kkraune commented 3 years ago

Some kind of helper tool that translates an Elasticsearch / Solr / Lucene query to YQL / the query API, possibly combined with supporting documentation in https://vespa.ai/migrating-from-elastic-search-to-vespa. This will make it easier for users doing evaluations to make apples-to-apples comparisons.

107dipan commented 3 years ago

Is there any way to boost a term in a YQL search query, like in Lucene? https://lucene.apache.org/core/2_9_4/queryparsersyntax.html

jobergum commented 3 years ago

@107dipan yes, there are multiple ways to accomplish this:

{
  "query": "jakarta!400 apache",
  "type": "any",
  "yql": "select * from sources * where userQuery()",
  "ranking.profile": "someprofile-with-nativerank"
}

The above weights the term jakarta 4x more than the term apache (the default term weight is 100), but you need a ranking profile that uses nativeRank: https://docs.vespa.ai/documentation/nativerank.html.
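As a first building block for the translator discussed in this issue, here is a minimal sketch that rewrites Lucene's boost syntax (`jakarta^4 apache`, the example on the linked Lucene page) into the simple query language form above (`jakarta!400 apache`). It assumes a Lucene boost factor maps to a Vespa term weight of factor times 100 (the default weight). The function name `lucene_boosts_to_vespa` is hypothetical, and it only handles single-term boosts, not boosted phrases or groups.

```python
import re

# Lucene boost on a single term: "term^factor", e.g. "jakarta^4".
# Assumption: Vespa weight = factor * 100, since 100 is the default term weight.
_BOOST = re.compile(r"^(?P<term>[^\s^]+)\^(?P<factor>\d+(?:\.\d+)?)$")
DEFAULT_VESPA_WEIGHT = 100


def lucene_boosts_to_vespa(query: str) -> str:
    """Rewrite whitespace-separated 'term^factor' tokens as 'term!weight'."""
    out = []
    for token in query.split():
        m = _BOOST.match(token)
        if m is None:
            out.append(token)  # unboosted term keeps the default weight
            continue
        weight = round(float(m.group("factor")) * DEFAULT_VESPA_WEIGHT)
        out.append(f"{m.group('term')}!{weight}")
    return " ".join(out)


print(lucene_boosts_to_vespa("jakarta^4 apache"))  # jakarta!400 apache
```

The result goes into the `query` request parameter; as noted above, the weight only has an effect if the ranking profile uses a feature such as nativeRank.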

You can also adjust weights via YQL annotations and use YQL only:

{
  "yql": "select * from sources * where (default contains \"apache\" OR default contains ([{\"weight\": 400}]\"jakarta\"));",
  "ranking.profile": "someprofile-with-nativerank"
}

See https://docs.vespa.ai/en/reference/query-language-reference.html
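Hand-writing this JSON is error-prone because the YQL string itself contains double quotes that must be escaped. A minimal sketch, assuming the request body is built in Python: the hypothetical helpers `contains` and `weighted_or_yql` assemble the YQL, and `json.dumps` handles the escaping. The bracketed annotation form is copied from the comment above; newer Vespa versions may expect a different annotation syntax, so check the linked reference.

```python
import json
from typing import Dict, List, Optional


def contains(field: str, term: str, weight: Optional[int] = None) -> str:
    """YQL 'contains' clause, with an optional term-weight annotation."""
    quoted = json.dumps(term)  # double-quoted, backslash-escaped string literal
    if weight is None:
        return f"{field} contains {quoted}"
    # Annotation syntax as used in this thread: ([{"weight": N}]"term")
    return f'{field} contains ([{{"weight": {weight}}}]{quoted})'


def weighted_or_yql(field: str, terms: List[str], weighted: Dict[str, int]) -> str:
    """OR together plain terms and weight-annotated terms."""
    clauses = [contains(field, t) for t in terms]
    clauses += [contains(field, t, w) for t, w in weighted.items()]
    return "select * from sources * where (" + " OR ".join(clauses) + ");"


body = {
    "yql": weighted_or_yql("default", ["apache"], {"jakarta": 400}),
    "ranking.profile": "someprofile-with-nativerank",
}
# json.dumps escapes the inner double quotes in the YQL string.
print(json.dumps(body, indent=2))
```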

There is also the rank query operator, which lets you retrieve using one set of terms and produce ranking features from others. The extra terms do not affect recall, only precision. In the example below, we retrieve documents containing apache and produce matching ranking features for the term jakarta.

{
  "yql": "select * from sources * where rank(default contains \"apache\", default contains ([{\"weight\": 400}]\"jakarta\"));",
  "ranking.profile": "someprofile-with-nativerank"
}

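The same kind of helper can build the `rank()` variant. A minimal sketch, where the function names `contains` and `rank_yql` are hypothetical: only the first operand of `rank()` decides which documents match, while the remaining operands just contribute ranking features, so they are passed separately.

```python
import json
from typing import Dict, Optional


def contains(field: str, term: str, weight: Optional[int] = None) -> str:
    """YQL 'contains' clause, with an optional term-weight annotation."""
    quoted = json.dumps(term)
    if weight is None:
        return f"{field} contains {quoted}"
    return f'{field} contains ([{{"weight": {weight}}}]{quoted})'


def rank_yql(field: str, recall_term: str, rank_only: Dict[str, int]) -> str:
    """First operand drives matching (recall); the rest only affect ranking."""
    operands = [contains(field, recall_term)]
    operands += [contains(field, t, w) for t, w in rank_only.items()]
    return "select * from sources * where rank(" + ", ".join(operands) + ");"


print(rank_yql("default", "apache", {"jakarta": 400}))
```

This reproduces the YQL in the request above; wrap it with `json.dumps` when building the JSON body to get the quote escaping right.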
107dipan commented 3 years ago

Great! What about fuzzy search and proximity search for string fields?

jobergum commented 3 years ago

See https://github.com/vespa-engine/vespa/issues/9371 for fuzzy search. For proximity search, see the near/onear operators in the YQL documentation, @107dipan.
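For the translator, a Lucene proximity query such as `"jakarta apache"~10` (from the Lucene syntax page linked earlier) could be mapped onto near/onear. A minimal sketch, with a hypothetical function name, assuming the `{distance: N}` annotation form of newer Vespa versions (older versions used the bracketed form shown earlier in this thread). The mapping is approximate: Lucene slop and Vespa's distance are not guaranteed to have identical semantics, and `onear` additionally requires the terms to appear in order.

```python
import json
import re

# Lucene proximity search: "term1 term2"~N
_PROXIMITY = re.compile(r'^"(?P<terms>[^"]+)"~(?P<distance>\d+)$')


def lucene_proximity_to_yql(expr: str, field: str = "default", ordered: bool = False) -> str:
    """Approximate a Lucene proximity query with Vespa's near/onear operators."""
    m = _PROXIMITY.match(expr.strip())
    if m is None:
        raise ValueError(f"not a Lucene proximity expression: {expr!r}")
    terms = ", ".join(json.dumps(t) for t in m.group("terms").split())
    operator = "onear" if ordered else "near"  # onear: terms must appear in order
    return f"{field} contains ({{distance: {m.group('distance')}}}{operator}({terms}))"


print(lucene_proximity_to_yql('"jakarta apache"~10'))
# default contains ({distance: 10}near("jakarta", "apache"))
```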

107dipan commented 3 years ago

Thanks a lot!

jobergum commented 3 years ago

@107dipan Do you want to give writing a query translator a shot, or is this just general interest?

107dipan commented 3 years ago

I am planning on writing the query translator. I wanted to map the different Lucene features to their Vespa YQL equivalents before starting.

jobergum commented 3 years ago

Great! Keep us posted @107dipan if you have questions.

xansrnitu commented 10 months ago

Hi @jobergum, does "query": "jakarta!400 apache" still work if I am using a hybrid rank profile, such as the one defined below?

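from vespa.io import VespaQueryResponse

# `app` is the deployed Vespa application from the pyvespa tutorial linked below.
with app.syncio(connections=1) as session:
    query = "How Fruits and Vegetables Can Treat Asthma?"
    # Retrieve with nearestNeighbor; userQuery() inside rank() only adds ranking features.
    response: VespaQueryResponse = session.query(
        yql="select * from sources * where title contains \"vegetable\" and rank({targetHits:1000}nearestNeighbor(embedding,q), userQuery()) limit 5",
        query=query,
        ranking="fusion",
        body={
            "input.query(q)": f"embed({query})"
        },
    )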

Source - https://pyvespa.readthedocs.io/en/latest/getting-started-pyvespa-cloud.html#Hybrid-search-with-filters

jobergum commented 10 months ago

The query parameter accepts the simple query language, and jakarta!400 lucene sets the weight of the term jakarta to 400 (the default is 100). However, not all text ranking features use the term weight: nativeRank and nativeFieldMatch do, but bm25 does not.

So the answer is yes: you can pass weights using the simple query language, but how they affect ranking depends on which text ranking features you use.
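One practical detail when combining weighted terms with the hybrid query above: the same text is also passed to `embed(...)`, so the `!400` suffix would end up in the embedded text. A minimal sketch, assuming you want the embedder to see the plain text: the hypothetical helper `hybrid_request` keeps the weighted string for `query` (consumed by `userQuery()`) and strips `!N` suffixes before embedding. The YQL and the `fusion` profile name are copied from the question; whether the weights matter still depends on the features that profile uses.

```python
import re

# Simple-query-language weight suffix such as "!400" (only the "!N" form is handled).
_WEIGHT_SUFFIX = re.compile(r"!\d+\b")


def hybrid_request(weighted_query: str, ranking: str = "fusion", hits: int = 5) -> dict:
    """Build query parameters: weighted text for userQuery(), plain text for embed()."""
    plain_text = _WEIGHT_SUFFIX.sub("", weighted_query)
    yql = (
        "select * from sources * where "
        "rank({targetHits:1000}nearestNeighbor(embedding,q), userQuery()) "
        f"limit {hits}"
    )
    return {
        "yql": yql,
        "query": weighted_query,  # parsed with the simple query language
        "ranking": ranking,
        "input.query(q)": f"embed({plain_text})",
    }


print(hybrid_request("jakarta!400 apache"))
```

The resulting dict could be passed as `body=` to `session.query`, as the snippet above does for `input.query(q)`.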

nakulpathak3 commented 6 months ago

Hi, just checking if a query translator was added. We'd like to move from Elasticsearch to Vespa and any tools to do this would be helpful.

bratseth commented 5 months ago

Sorry, there is no translator in the public domain yet. We know of several teams that have created one and are trying to get them to open source it ...