iDigBio / idigbio-search-api

Server-side code driving iDigBio's search functionality.
GNU General Public License v3.0

[circuit_breaking_exception] [fielddata] Data too large, data for [indexData.dwc:countryCode] #25

Closed: kevinlove closed this issue 7 years ago

kevinlove commented 7 years ago

Getting an error I haven't seen before. It doesn't seem related to the number of records expected back. Queries and expected counts below:

http://search.idigbio.org/v2/summary/top/records/?rq={%22recordset%22:%2211d3ad3b-38de-4709-8544-ec3c26d96607%22,%22collectioncode%22:%22fish%22,%22institutioncode%22:%22ansp%22}&top_fields=%22indexData.dwc:countryCode%22 Expected: 146747

http://search.idigbio.org/v2/summary/top/records/?rq={%22recordset%22:%2255d60f69-eee9-4386-952a-805dfb71830a%22,%22collectioncode%22:%22fish%22,%22institutioncode%22:%22uafmc%22}&top_fields=%22indexData.dwc:countryCode%22 Expected: 5850

http://search.idigbio.org/v2/summary/top/records/?rq={%22recordset%22:%228660ce9a-31c9-48ee-b5bc-9e6ba248ec0f%22,%22collectioncode%22:%22fish%22,%22institutioncode%22:%22csuc%22}&top_fields=%22indexData.dwc:countryCode%22 Expected: 408
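For reference, the same query can be reproduced outside the browser. A minimal sketch in Python with the requests library, using the parameters from the first URL above (not the code that produced the expected counts):

import json
import requests

rq = {
    "recordset": "11d3ad3b-38de-4709-8544-ec3c26d96607",
    "collectioncode": "fish",
    "institutioncode": "ansp",
}

# rq and top_fields are passed as JSON-encoded query-string parameters,
# exactly as in the URLs above.
resp = requests.get(
    "http://search.idigbio.org/v2/summary/top/records/",
    params={
        "rq": json.dumps(rq),
        "top_fields": json.dumps("indexData.dwc:countryCode"),
    },
)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))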

godfoder commented 7 years ago

This error just popped up and is affecting a number of different queries in iDigBio (including sorting on some values). Unfortunately, fixing it requires both purchasing new memory for the servers and doing some system-level trickery to allow Elasticsearch to make use of that memory effectively. Both of these will take some time, so this error may persist for a month or more.

danstoner commented 7 years ago

Q - What changed?

A - We have grown (number of records, etc.)

danstoner commented 7 years ago

@kevinlove Please try your queries again, we believe we have dropped enough "stuff" from the indexes to restore functionality pending hardware upgrades.

UnwashedMeme commented 7 years ago

There's a note on https://www.elastic.co/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html#circuit-breaker saying that the fielddata cache size (unbounded by default) needs to be less than the circuit breaker limit (60% of the heap by default); if it isn't, data will never be evicted.
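For reference, a sketch of how those two settings could be checked and the circuit breaker limit adjusted over the node HTTP API. The node URL is assumed to be the same one as in the stats link below, and the setting names follow the guide linked above; indices.fielddata.cache.size itself is a static setting, so it has to go into each node's elasticsearch.yml instead.

import json
import requests

NODE = "http://c18node12:9200"  # assumption: any reachable node in the cluster works

# Current non-default cluster settings (persistent and transient).
print(json.dumps(requests.get(NODE + "/_cluster/settings").json(), indent=2))

# indices.breaker.fielddata.limit is dynamic and can be set cluster-wide;
# indices.fielddata.cache.size is static and must be set in elasticsearch.yml
# on each node (with a restart) so that it sits below the breaker limit.
requests.put(
    NODE + "/_cluster/settings",
    json={"persistent": {"indices.breaker.fielddata.limit": "60%"}},
)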

I think the options are:

See current fielddata usage: http://c18node12:9200/_nodes/stats/indices/fielddata?fields=*

Summed across nodes and ordered descending (a sketch of how this can be computed follows the list):

[(u'uuid', 9884795056),
 (u'flags', 4318329904),
 (u'catalognumber', 3658211184),
 (u'locality', 3640173320),
 (u'scientificname', 3135377328),
 (u'_parent', 1172680792),
 (u'geopoint.lon', 978252900),
 (u'geopoint.lat', 972890532),
 (u'specificepithet', 931970312),
 (u'genus', 806178180),
 (u'taxonid', 724221196),
 (u'stateprovince', 608553956),
 (u'recordset', 600256260),
 (u'country', 559978628),
 (u'institutioncode', 489520148),
 (u'collectioncode', 464279156),
 (u'collector', 459670272),
 (u'order', 449519812),
 (u'family', 406682004),
 (u'kingdom', 314230044),
 (u'typestatus', 281170772),
 (u'latestepochorhighestseries', 256534632),
 (u'earliestperiodorlowestsystem', 240419668),
 (u'indexData.dwc:countryCode', 209014320),
 (u'basisofrecord', 182129484),
 (u'data.dwc:kingdom', 175494924),
 (u'commonname', 172482040),
 (u'class', 160399924),
 (u'formation', 124937484),
 (u'occurrenceid', 55524744),
 (u'recordset_id', 50245604),
 (u'publisher', 7224448),
 (u'data.contacts.email', 217384),
 (u'_type', 107264)]
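The summary above can be produced along these lines (a sketch, not necessarily the exact script used; the field names and byte counts come straight from the node stats response):

import operator
import requests

stats = requests.get(
    "http://c18node12:9200/_nodes/stats/indices/fielddata",
    params={"fields": "*"},
).json()

# Sum per-field fielddata memory across every node, then order descending.
totals = {}
for node in stats["nodes"].values():
    fields = node["indices"]["fielddata"].get("fields", {})
    for name, data in fields.items():
        totals[name] = totals.get(name, 0) + data["memory_size_in_bytes"]

for name, size in sorted(totals.items(), key=operator.itemgetter(1), reverse=True):
    print(name, size)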

godfoder commented 7 years ago

The current plan is MOAR RAM, plus bits of the others, probably. We'll be doubling the RAM on the nodes and doubling the number of ES instances per node, which should leave enough overhead for the disk cache. We need to do hardware work on the nodes anyway to give us more SSD headroom for building multiple indexes.

godfoder commented 7 years ago

This should be fixed for now.