vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0

[bug] Same YQL triggers error when a second embedding component is added to services.xml #28160

eostis closed this issue 1 year ago

eostis commented 1 year ago

1. The schema:

schema Vespa1 {
  document Vespa1 {

    field wpsolr_title type string {
      indexing: summary | index
      summary: dynamic
    }

    field wpsolr_content type string {
      indexing: summary | index
      summary: dynamic
    }

  }

  field wpsolr_field_vector_text type tensor<float>(x[384]) {
    indexing {
      "passage: " . input wpsolr_title . " " . input wpsolr_content | embed wpsolr_multilingual_e5_small_onnx | attribute
    }
  }

  rank-profile wpsolr_rank_vector_text {
    num-threads-per-search: 1
    match-features: distance(label, wpsolr_search_operator_nearest_neighbor_text)
    inputs {
      query(q_384) tensor<float>(x[384])
      query(q_768) tensor<float>(x[768])
      query(q_1024) tensor<float>(x[1024])
    }

    first-phase {
      expression: closeness(label, wpsolr_search_operator_nearest_neighbor_text)
    }
  }
}

2. The successful query with one embedding component:

{"offset":0,"hits":20,"input.query(q_384)":"embed(wpsolr_multilingual_e5_small_onnx, query: parot)","ranking":"wpsolr_rank_vector_text","yql":"select wpsolr_id, wpsolr_PID, wpsolr_type, wpsolr_meta_type_s, wpsolr_title, wpsolr_numcomments, wpsolr_comments, wpsolr_displaydate, wpsolr_displaymodified, wpsolr_author, wpsolr_snippet_s, wpsolr_content from Vespa1 where ((wpsolr_type contains ({stem: false}\"post\")) and !(((wpsolr_post_status_s contains ({stem: false}\"draft\")) or (wpsolr_post_status_s contains ({stem: false}\"pending\")) or (wpsolr_post_status_s contains ({stem: false}\"trash\")) or (wpsolr_post_status_s contains ({stem: false}\"future\")) or (wpsolr_post_status_s contains ({stem: false}\"private\")) or (wpsolr_post_status_s contains ({stem: false}\"auto-draft\")))) and (!(!((wpsolr_is_excluded_s contains ({stem: false}\"_wpsolr_undefined\")))) or (wpsolr_is_excluded_s contains ({stem: false}\"n\"))) and ({label:'wpsolr_search_operator_nearest_neighbor_text',approximate:true,targetHits: 100}nearestNeighbor(wpsolr_field_vector_text, q_384))) | all(all(group(wpsolr_type) order(-count()) max(20) each(output(count()))))"}

3. Now, adding and deploying a second embedding component:

[image: services.xml with the second embedding component added]

4. The same query, unsuccessful this time:

[image: the error returned for this query]

jobergum commented 1 year ago

One must provide the embedder id when there is more than one embedder.

embed(delete_me_please, "the text to embed")

I'll try to clarify the documentation.
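
A minimal sketch of the rule above, assuming the query parameters are assembled in Python. `embed_expr` is a hypothetical helper, not part of any Vespa client library; it always names the embedder and wraps the text in double quotes. Rejecting text that already contains a double quote is a simplification for this sketch, not documented Vespa behaviour.

```python
def embed_expr(embedder_id: str, text: str) -> str:
    """Build an embed(...) query-input expression with an explicit embedder id.

    Hypothetical helper for illustration only.
    """
    if not embedder_id:
        raise ValueError("an embedder id is required when several embedders are configured")
    if '"' in text:
        # Simplification for this sketch: do not guess at escaping rules.
        raise ValueError("text must not contain double quotes")
    return f'embed({embedder_id}, "{text}")'


# Embedder id, input name and rank profile are taken from the issue above.
params = {
    "input.query(q_384)": embed_expr("wpsolr_multilingual_e5_small_onnx", "query: parot"),
    "ranking": "wpsolr_rank_vector_text",
}
```

Building the expression in one place keeps the embedder id and the quoting from being dropped when the query string is concatenated by hand.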

eostis commented 1 year ago

I did it, didn't I?

"input.query(q_384)":"embed(wpsolr_multilingual_e5_small_onnx, query: parot)"

eostis commented 1 year ago

Or is it related to #28159?

jobergum commented 1 year ago

> I did it, didn't I?

No. The text to embed must be quoted, so that would be:

"input.query(q_384)":"embed(wpsolr_multilingual_e5_small_onnx, \"query: parot\")"

eostis commented 1 year ago

It worked. 🙏
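
For reference, the backslashes in the corrected parameter are JSON escaping; the expression itself only contains plain double quotes around the text. A short sketch, assuming the request body is serialized with Python's standard `json` module (the variable names are illustrative):

```python
import json

# The expression with the embedder id first, then the quoted text.
expression = 'embed(wpsolr_multilingual_e5_small_onnx, "query: parot")'

# json.dumps escapes the inner double quotes as \" when writing the body.
body = json.dumps({"input.query(q_384)": expression})
print(body)
# {"input.query(q_384)": "embed(wpsolr_multilingual_e5_small_onnx, \"query: parot\")"}

# Parsing the body gives back the unescaped expression.
assert json.loads(body)["input.query(q_384)"] == expression
```

Letting a JSON serializer add the escapes avoids writing them by hand in the request body.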

jobergum commented 1 year ago

I'm keeping this open until I have managed to fix the documentation.

jobergum commented 1 year ago

This PR tries to clarify the documentation and recommends always specifying the embedder id: https://github.com/vespa-engine/documentation/pull/2843