pinterest / querybook

Querybook is a Big Data Querying UI, combining collocated table metadata and a simple notebook interface.
https://www.querybook.org
Apache License 2.0
1.88k stars 228 forks source link

Text fields are not optimised for operations that require per-document field data like aggregations and sorting #1349

Open gzhukov opened 11 months ago

gzhukov commented 11 months ago

Hi, after upgrade querybook from 2.4.0 to 3.28.0 we faced with missing table list from our metastore. All databases were empty in UI, but I see that update_metastore job works fine in worker's logs. I checked webserver logs and found such warning: [2023-10-13 Fri 19:30:02] - /opt/querybook/querybook/server/lib/elasticsearch/search_utils.py - WARNING "Got ElasticSearch exception: RequestError(400, 'search_phase_execution_exception', 'Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead. Alternatively, set fielddata=true on [name] in order to load field data by uninverting the inverted index. Note that this can use significant memory.')

After some investigation I added query to log https://github.com/pinterest/querybook/blob/master/querybook/server/lib/elasticsearch/search_utils.py#L64 and found that querybook use sorting function for 'name' field in search_tables_v1 index {'query': {'bool': {'must': [{'function_score': {'query': {'match_all': {}}, 'boost_mode': 'sum', 'script_score': {'script': {'source': "doc['importance_score'].value * 10 + (doc['golden'].value ? 50 : 0)"}}}}], 'filter': {'bool': {'must': [{'match': {'schema': 'abasharin'}}, {'match': {'metastore_id': 3}}]}}}}, 'size': 100, 'from': 0, '_source': ['id', 'schema', 'name'], 'sort': [{'name': {'order': 'asc'}}], 'highlight': {'pre_tags': ['<mark>'], 'post_tags': ['</mark>'], 'type': 'plain', 'fields': {'columns': {'fragment_size': 20, 'number_of_fragments': 5}, 'data_elements': {'fragment_size': 20, 'number_of_fragments': 5}, 'description': {'fragment_size': 60, 'number_of_fragments': 3}}}}"

I downloaded current index mapping: curl https://<es-server>/search_tables_v1/_mapping | jq > update_indeces.json After that I added "fielddata": true, to name field and uploaded new mapping back: curl -XPUT https://<es-server>/search_tables_v1/_mapping -H "Content-Type: application/json" -d @update_indeces.json In helps and solved this issue. update_indeces.json

I tested it with opensearch:2.9.0 and elasticsearch:7.16.2 Also I tried to add fielddata: true to querybook/config/elasticsearch.yaml and recreate indexes but looks like config parser ignores this key.

adamstruck commented 11 months ago

Seems like your indices are out-of-date. The name is set to a keyword here: https://github.com/pinterest/querybook/blob/master/querybook/config/elasticsearch.yaml#L207