vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0

Unable to deploy schema with multiple documentSummaries #23099

Closed 107dipan closed 2 years ago

107dipan commented 2 years ago

I am getting an exception while deploying Vespa with one schema that contains two document summaries. I am currently using Vespa version 7.559.12.

Error Message -

Request failed. HTTP status code: 400
Invalid application package: Error loading default.default: Conflicting summary transforms. summary field 'title' is already defined as summary field 'title'. A field with the same name can not have different transforms in different summary classes
command terminated with exit code 1

Schema Definition -

schema testVespaDoc {

    document testVespaDoc {
        field title type array<string> {
            indexing: summary | attribute | index
        }
    }

    document-summary summaryA {
        summary title type array<string> {
            source: title
        }
    }

    document-summary summaryB {
        summary title type array<string> {
            source: title
            dynamic
        }
    }

}

geirst commented 2 years ago

This is a known limitation of document summaries, and the error message says what is not supported: "A field with the same name can not have different transforms in different summary classes". In summaryA, the title field is defined to return the field value as-is, while in summaryB it is defined as a dynamic summary, which returns a snippet of the field value. These are different "transforms" of the field value. You need to rename the field in one of the document summaries, e.g. to title_dyn in summaryB.

107dipan commented 2 years ago

I don't think "title_dyn" will work, since clients might be expecting different fields. Is there a workaround to request a "dynamic schema" based on the request?

geirst commented 2 years ago

If you need to provide an API to your clients where the title field is always named title, independent of which document summary is used in the query request, you can create a Java Searcher that iterates over the Hit instances of the Result and creates a title field from the content of title_dyn if summaryB was used. See the following for details on Searcher development: https://docs.vespa.ai/en/searcher-development.html.
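
The per-hit renaming described above can be sketched in plain Java. This is only an illustration of the logic, not Vespa API code: each hit is modelled as a mutable `Map` of summary fields, and the class name `TitleAlias` and its constants are hypothetical. In an actual Searcher the same steps would presumably run over the `Hit` instances of the `Result` once the summary has been filled; see the Searcher development guide linked above for the real classes.

```java
import java.util.List;
import java.util.Map;

/**
 * Plain-Java stand-in for the per-hit logic of a custom Searcher.
 * Each hit is modelled as a mutable Map of summary fields; this is NOT the Vespa Hit class.
 */
final class TitleAlias {

    // Names taken from the thread: summaryB holds the dynamic snippet in title_dyn.
    static final String DYNAMIC_SUMMARY = "summaryB";
    static final String RENAMED_FIELD = "title_dyn";
    static final String PUBLIC_FIELD = "title";

    private TitleAlias() {}

    /** Moves title_dyn to title on every hit, but only when summaryB was requested. */
    static void apply(List<Map<String, Object>> hits, String requestedSummary) {
        if (!DYNAMIC_SUMMARY.equals(requestedSummary)) {
            return; // other summary classes already expose a plain title field
        }
        for (Map<String, Object> hit : hits) {
            Object snippet = hit.remove(RENAMED_FIELD);
            if (snippet != null) {
                hit.put(PUBLIC_FIELD, snippet);
            }
        }
    }
}
```

Calling `TitleAlias.apply(hits, "summaryB")` leaves every hit with a `title` field and no `title_dyn` field, while any other summary name leaves the hits untouched.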

bratseth commented 2 years ago

Note though that dynamic fields should be rendered differently.

107dipan commented 2 years ago

Could you tell us how we can render dynamic fields separately?

bratseth commented 2 years ago

Well you can write your own renderer class which does whatever you want: https://docs.vespa.ai/en/result-rendering.html

But what I mean here is just that your clients probably need to treat a dynamic field differently, so it may not be a good idea to try to use the same summary field name.
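
As an illustration of why a client may need to treat a dynamic field differently: a dynamic summary is a snippet rather than the raw value, and it can carry markup. The sketch below assumes the snippet uses `<hi>`/`</hi>` tags around matched terms and `<sep />` between fragments; verify this against your own query responses. The class name `SnippetText` is hypothetical, and this is client-side plain Java, not a Vespa renderer.

```java
/** Client-side helper that turns a dynamic snippet into plain display text. */
final class SnippetText {

    private SnippetText() {}

    /** Returns the snippet with highlight tags removed and separators shown as " ... ". */
    static String toPlainText(String snippet) {
        return snippet
                .replace("<hi>", "")       // assumed opening highlight tag
                .replace("</hi>", "")      // assumed closing highlight tag
                .replace("<sep />", " ... ") // assumed fragment separator
                .trim();
    }
}
```

A client that reads a plain field such as the one in summaryA would not need this step, which is one reason to keep the two field names distinct.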

107dipan commented 2 years ago

Got it! Thanks a lot!!

107dipan commented 2 years ago

Hey @geirst, I tried using a different field name, but I am still facing an issue: the field is missing from the response when I query with the document summary that contains the dynamic summary field.

Changed the document summary to -

document-summary summaryB {
    summary title_teaser type array<string> {
        source: title
        dynamic
    }
}

Document Ingested -

{
    "fields": {
        "title": ["The content of an indexed string field is language-agnostic. One must therefore apply a symmetric tokenization on the query terms in order to match the content of that field.The query parser subscribes to configuration that tells it what fields are indexed strings, and every query term that targets such a field are run through appropriate tokenization. The language query parameter is what controls the language state of these calls.Because an index may simultaneously contain terms in any number of languages, one can have stemmed variants of one language match the stemmed variants of another. To work around this, store the language of a document in a separate attribute, and apply a filter against that attribute at query-time.If no language parameter is given, the language detector is called to process the query string. The detector is likely to be confused by field names and query syntax, but it is a best-effort approach. This matches the language resolution of the index pipeline.By default, there is no knowledge anywhere that captures what languages are used to generate the content of an index. The language parameter only affects the transformation of query terms that hit tokenized indexes."]
    }
}

Search Query -

{
    "yql": "select * from sources * where title contains \"tokenized\";",
    "timeout": "120s",
    "presentation.summary": "summaryB"
}

Query Response -

{
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 1
        },
        "coverage": {
            "coverage": 100,
            "documents": 1,
            "full": true,
            "nodes": 6,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:ddoc/4/c4ca42386cd52a67065b91c5",
                "relevance": 0.1620547555884915,
                "source": "ddoc",
                "fields": {
                    "sddocname": "testVespaDoc"
                }
            }
        ]
    }
}