Closed kevinlove closed 7 years ago
This error just popped up and is affecting a number of different queries in iDigBio (including sorting on some values). Unfortunately, fixing it requires both purchasing new memory for the servers and doing some system-level trickery to allow Elasticsearch to make effective use of that memory. Both of these will take some time, so this error may persist for a month or more.
Q - What changed?
A - We have grown (number of records, etc.)
@kevinlove Please try your queries again; we believe we have dropped enough "stuff" from the indexes to restore functionality pending hardware upgrades.
There's a note on https://www.elastic.co/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html#circuit-breaker saying that the fielddata cache limit (unbounded by default) needs to be lower than the circuit breaker limit (60% of heap by default); if it isn't, no data will ever be evicted.
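To make the relationship in that note concrete, here is a minimal sketch of the check it implies. The function names are made up for illustration, and the 40% and 75% values are hypothetical; the settings being modelled are `indices.fielddata.cache.size` and `indices.breaker.fielddata.limit`, and this is not a description of how our cluster is actually configured.

```python
def parse_percent(value):
    """Convert a setting such as '60%' to a fraction of heap (0.6)."""
    return float(value.rstrip("%")) / 100.0


def fielddata_can_evict(cache_size=None, breaker_limit="60%"):
    """Return True if the cache limit sits below the breaker limit.

    cache_size=None models the unbounded default: the breaker trips
    before the cache ever reaches a size at which it would evict.
    """
    if cache_size is None:
        return False
    return parse_percent(cache_size) < parse_percent(breaker_limit)


print(fielddata_can_evict())              # unbounded cache
print(fielddata_can_evict("40%"))         # hypothetical cache limit below the breaker
print(fielddata_can_evict("75%", "60%"))  # hypothetical cache limit above the breaker
```

Only the middle case leaves room for old fielddata to be evicted before the breaker rejects a query.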
I think the options are:
uuid is currently the field with the most fielddata associated with it; if it is redundant to _id, can we remove this field and do some rewriting in the search API to fake it better?
See current fielddata usage: http://c18node12:9200/_nodes/stats/indices/fielddata?fields=*
summed across nodes and ordered descending:
[(u'uuid', 9884795056),
(u'flags', 4318329904),
(u'catalognumber', 3658211184),
(u'locality', 3640173320),
(u'scientificname', 3135377328),
(u'_parent', 1172680792),
(u'geopoint.lon', 978252900),
(u'geopoint.lat', 972890532),
(u'specificepithet', 931970312),
(u'genus', 806178180),
(u'taxonid', 724221196),
(u'stateprovince', 608553956),
(u'recordset', 600256260),
(u'country', 559978628),
(u'institutioncode', 489520148),
(u'collectioncode', 464279156),
(u'collector', 459670272),
(u'order', 449519812),
(u'family', 406682004),
(u'kingdom', 314230044),
(u'typestatus', 281170772),
(u'latestepochorhighestseries', 256534632),
(u'earliestperiodorlowestsystem', 240419668),
(u'indexData.dwc:countryCode', 209014320),
(u'basisofrecord', 182129484),
(u'data.dwc:kingdom', 175494924),
(u'commonname', 172482040),
(u'class', 160399924),
(u'formation', 124937484),
(u'occurrenceid', 55524744),
(u'recordset_id', 50245604),
(u'publisher', 7224448),
(u'data.contacts.email', 217384),
(u'_type', 107264)]
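For anyone wanting to reproduce a list like the one above, this is roughly how the per-node numbers can be summed and sorted. It is a sketch, not the script that was actually used: the function name is hypothetical, the sample response is made up, and the nesting (`nodes` → `indices` → `fielddata` → `fields` → `memory_size_in_bytes`) is my understanding of the node stats response and should be checked against the real output.

```python
from collections import defaultdict


def sum_fielddata_by_field(stats):
    """Sum fielddata bytes per field across all nodes, largest first."""
    totals = defaultdict(int)
    for node in stats.get("nodes", {}).values():
        fields = node.get("indices", {}).get("fielddata", {}).get("fields", {})
        for name, info in fields.items():
            totals[name] += info.get("memory_size_in_bytes", 0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Tiny made-up response with two nodes (numbers are illustrative only).
sample = {
    "nodes": {
        "node-a": {"indices": {"fielddata": {"fields": {
            "uuid": {"memory_size_in_bytes": 300},
            "flags": {"memory_size_in_bytes": 100},
        }}}},
        "node-b": {"indices": {"fielddata": {"fields": {
            "uuid": {"memory_size_in_bytes": 200},
            "genus": {"memory_size_in_bytes": 150},
        }}}},
    }
}

print(sum_fielddata_by_field(sample))
# [('uuid', 500), ('genus', 150), ('flags', 100)]
```

In practice `stats` would be the parsed JSON from the fielddata stats URL above.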
The current plan is MOAR RAM, plus bits of the others probably. We'll be doubling the RAM on the nodes and doubling the number of ES instances per node, which should leave enough overhead for disk cache. We need to do hardware work on the nodes anyways to give us more SSD overhead for building multiple indexes.
This should be fixed for now.
Getting an error I haven't seen before. It doesn't seem related to the number of records expected back. Queries and expected counts below:
http://search.idigbio.org/v2/summary/top/records/?rq={%22recordset%22:%2211d3ad3b-38de-4709-8544-ec3c26d96607%22,%22collectioncode%22:%22fish%22,%22institutioncode%22:%22ansp%22}&top_fields=%22indexData.dwc:countryCode%22 Expected: 146747
http://search.idigbio.org/v2/summary/top/records/?rq={%22recordset%22:%2255d60f69-eee9-4386-952a-805dfb71830a%22,%22collectioncode%22:%22fish%22,%22institutioncode%22:%22uafmc%22}&top_fields=%22indexData.dwc:countryCode%22 Expected: 5850
http://search.idigbio.org/v2/summary/top/records/?rq={%22recordset%22:%228660ce9a-31c9-48ee-b5bc-9e6ba248ec0f%22,%22collectioncode%22:%22fish%22,%22institutioncode%22:%22csuc%22}&top_fields=%22indexData.dwc:countryCode%22 Expected: 408
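In case it helps with retesting, the queries above can be generated rather than hand-edited. This is only a sketch: `top_records_url` is a hypothetical helper, and the choice of which characters to leave unescaped is picked purely so the output matches the URLs as pasted here.

```python
import json
from urllib.parse import quote

BASE = "http://search.idigbio.org/v2/summary/top/records/"


def top_records_url(rq, top_field):
    """Build a summary/top/records URL from an rq dict and one top field."""
    # Compact JSON, then percent-encode only the quotes (braces, colons
    # and commas are left as-is, matching the URLs in this thread).
    rq_part = quote(json.dumps(rq, separators=(",", ":")), safe="{}:,")
    field_part = quote(json.dumps(top_field), safe=":")
    return BASE + "?rq=" + rq_part + "&top_fields=" + field_part


url = top_records_url(
    {
        "recordset": "8660ce9a-31c9-48ee-b5bc-9e6ba248ec0f",
        "collectioncode": "fish",
        "institutioncode": "csuc",
    },
    "indexData.dwc:countryCode",
)
print(url)
```

This relies on Python 3.7+ dicts keeping insertion order so that the keys appear in the same order as in the original URLs.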