Closed abhimech001 closed 2 years ago
Thank you for the detailed question and for opening the issue so we can address your concern.
case 1
field table type string {
indexing: summary | attribute | index
indexing-rewrite: none
}
In this case you have both attribute and index, so searches will use the index and you get posting lists with match mode text. The field is also kept in memory because you have included attribute. I'm not sure if you have done this on purpose; the only reason for it would be if you want to use the field for grouping or sorting. With match mode text, a query for 'testing.vespa.search' is evaluated as 'testing AND vespa AND search'. If you want literal matching including punctuation, consider using match: word for exact matching.
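For illustration, a sketch of the field using word matching instead (dropping attribute unless you need grouping or sorting; this is a suggestion, not your current schema):

```
field table type string {
    indexing: summary | index
    match: word
}
```

With this, the query table contains "testing.vespa.search" matches only documents whose stored value is exactly that string, punctuation included.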
case 2
field map_field_s type map<string, string> {
indexing: summary
struct-field key {indexing: attribute}
struct-field value {indexing: attribute}
}
This field will not have posting lists because the attributes are not defined with fast-search, so searching this field results in a linear scan, which is slow. Changing it to
field map_field_s type map<string, string> {
indexing: summary
struct-field key {
indexing: attribute
attribute:fast-search
}
struct-field value {
indexing: attribute
attribute:fast-search
}
}
will build posting lists for fast search, which should be on par with the performance you get from the table field.
Documentation:
- attribute versus index: https://docs.vespa.ai/en/attributes.html
- when to use attribute fast-search: https://docs.vespa.ai/en/performance/feature-tuning.html#when-to-use-fast-search
- match modes: https://docs.vespa.ai/en/reference/schema-reference.html#match

Depending on your use case you can also tune latency by using more threads per search, see https://docs.vespa.ai/en/performance/sizing-search.html. In addition, fields which you don't use in any ranking expression can be optimized with rank: filter; see https://docs.vespa.ai/en/performance/feature-tuning.html.
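As an illustration, rank: filter on a field used only for filtering could look like the sketch below. Whether this applies to, for example, the languages field depends on whether it appears in any of your ranking expressions:

```
field languages type string {
    indexing: summary | attribute
    attribute: fast-search
    rank: filter
}
```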
If changed to fast-search, how do we validate that the b-tree is constructed for existing data?
Is there an API available to add a field dynamically to a schema, rather than deploying the application again and again?
What is the impact on searches and ingestion while deploying the new application?
If changed to fast search, how do we validate b-tree is constructed for existing data ?
This change requires a restart; see https://docs.vespa.ai/en/reference/schema-reference.html#modifying-schemas. These (orchestrated) restarts are scheduled automatically. You can see whether they have happened in the node view in the console.
Is there an api available to add a field dynamically to a schema rather than deploying application again and again
Very wrong question :-) Deploying is a (web service) API. Deploying again and again, many times a day, is normal, expected and safe. For production you should have a continuous deployment job which builds and deploys any change that happens to your application package source repo's master branch (CD), see https://cloud.vespa.ai/en/automated-deployments
Since any change is made to the application package source and is rolled out in a controlled fashion, progressing through test stages etc., this is safe and general (applicable to any set of changes).
If people could override this by making changes directly in production, there would be no authoritative representation of what you were actually running (it would instead be whatever resulted from the history of such direct changes). There would then be no process to change it safely, no audit trail and no review process (which you now get from source control), no safety mechanism, so people would randomly break production, and no way to safely make multiple interdependent changes.
What is the impact on searches and ingestion while deploying the new application?
Negligible. No disruption to queries or writes, but some small additional resource usage. Most changes happen live and are fast and lightweight, but some require a restart, or even reindexing, as documented in the link above. That work happens automatically and in the background, so it will take longer to complete, but it will not cause any disruption to service.
For case 2, I updated the map field's key and value with the recommended fast-search. An improvement of 15s is observed (i.e. the median dropped from 120s to 105s), but there are still failures due to the 120s timeout. Please suggest if any more tuning can be done to further improve the response time.
I did not notice that you also have a regular expression in the query. What is the behavior without the regular expression on the languages field? A regular expression cannot be indexed, so it is linear with the number of documents in the index.
If the field has fast-search, it is linear with the number of unique values.
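If the intent of languages matches "Eng*" is simple prefix matching rather than a general regex, one possible alternative (a sketch; it assumes languages is an attribute field) is YQL's prefix annotation, which avoids the regex scan:

```
select * from sources * where languages contains ({prefix: true}"Eng")
```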
Right, it's possible to add a trace of the query execution by adding &tracelevel=6&trace.timestamps to the query. This should give us a good understanding of what is going on.
Any updates on this @abhimech001 ?
Hey Joe,
By regular expression do you mean this part of the query: AND languages matches "Eng*"?
@107dipan yes.
ooh.. ok Let me try it out.
Thanks a lot!!!
Vespa supports tracing, for example https://api.cord19.vespa.ai/search/?query=sars+cov+2&tracelevel=6&trace.timestamps
The trace includes the query blueprint, which is the query execution plan.
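Applied to a POST body like the ones earlier in this thread, the trace parameters could be added as follows. This is a sketch; the yql here is a simplified stand-in for your actual query:

```
{
    "yql": "select * from sources * where table contains \"testing.vespa.search\"",
    "tracelevel": 6,
    "trace": { "timestamps": true },
    "timeout": "120s"
}
```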
Hi Team,
I have created a Vespa multinode stack and performed a load test for search queries.

1) Search with a single field with a 60s timeout - median was consistent, around 150 to 200ms.
endpoint: /search/
HTTP request: POST
Body: { "yql":"select from sources where table contains \"testing.vespa.search\" }

2) Search with more fields and conditions - median was 100+ secs, which is very high, and a few requests still timed out even when the timeout was increased to 120s.
endpoint: /search/
HTTP request: POST
Body: { "yql":"select from sources where table contains \"testing.vespa.search\" AND map_field_s contains sameElement(key contains \"month_s\",value contains \"may\") OR map_field_s contains sameElement(key contains \"user_s\",value contains \"Vespa\") OR map_field_s contains sameElement(key contains \"year_s\",value contains \"2014\") AND languages matches \"Eng*\";","timeout": "120s" }
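Note also that this yql mixes AND and OR without parentheses, so operator precedence decides the grouping. If the intent is for the three sameElement clauses to be alternatives, an explicitly parenthesized sketch (assuming that grouping) would be:

```
select * from sources * where table contains "testing.vespa.search"
    and (map_field_s contains sameElement(key contains "month_s", value contains "may")
      or map_field_s contains sameElement(key contains "user_s", value contains "Vespa")
      or map_field_s contains sameElement(key contains "year_s", value contains "2014"))
    and languages matches "Eng*"
```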
Can you suggest what could be the reason for such behaviour?
document fields are