openfoodfacts / openfoodfacts-monitoring

Kibana is spamming us because ES has a circuit_breaking_exception #55

Closed alexgarel closed 2 years ago

alexgarel commented 2 years ago

Kibana is down and it's spamming infrastructure-alerts (and we may also be losing logs…)

$ curl -XGET http://elasticsearch:9200/license?pretty=true
{
  "error" : {
    "root_cause" : [
      {
        "type" : "circuit_breaking_exception",
        "reason" : "[parent] Data too large, data for [<http_request>] would be [1022127656/974.7mb], which is larger than the limit of [1020054732/972.7mb], real usage: [1022127656/974.7mb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=0/0b, model_inference=0/0b, eql_sequence=0/0b, accounting=76897616/73.3mb]",
        "bytes_wanted" : 1022127656,
        "bytes_limit" : 1020054732,
        "durability" : "PERMANENT"
      }
    ],
    "type" : "circuit_breaking_exception",
    "reason" : "[parent] Data too large, data for [<http_request>] would be [1022127656/974.7mb], which is larger than the limit of [1020054732/972.7mb], real usage: [1022127656/974.7mb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=0/0b, in_flight_requests=0/0b, model_inference=0/0b, eql_sequence=0/0b, accounting=76897616/73.3mb]",
    "bytes_wanted" : 1022127656,
    "bytes_limit" : 1020054732,
    "durability" : "PERMANENT"
  },
  "status" : 429
}
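
To make the numbers in that response easier to read, here is a minimal sketch that pulls the byte counts out of the `reason` string. It assumes the parent breaker limit is at its default of 95% of the JVM heap (`indices.breaker.total.limit` when real-memory accounting is on); the function name `parse_breaker_reason` is only illustrative.

```python
import re

# "reason" string copied from the error response above (usages list omitted).
REASON = (
    "[parent] Data too large, data for [<http_request>] would be "
    "[1022127656/974.7mb], which is larger than the limit of "
    "[1020054732/972.7mb], real usage: [1022127656/974.7mb], "
    "new bytes reserved: [0/0b]"
)


def parse_breaker_reason(reason: str) -> dict:
    """Extract the byte counts from a circuit_breaking_exception reason."""
    patterns = {
        "wanted": r"would be \[(\d+)/",
        "limit": r"limit of \[(\d+)/",
        "real_usage": r"real usage: \[(\d+)/",
    }
    return {key: int(re.search(p, reason).group(1)) for key, p in patterns.items()}


stats = parse_breaker_reason(REASON)
over_by = stats["wanted"] - stats["limit"]

# Assumption: the limit is the default 95% of the heap, so heap = limit / 0.95.
implied_heap = stats["limit"] / 0.95

print(f"over the limit by {over_by} bytes ({over_by / 2**20:.1f} MiB)")
print(f"implied heap: {implied_heap / 2**30:.2f} GiB")
```

If that assumption holds, the heap was about 1 GiB and real usage was only about 2 MiB over the limit, with no new bytes reserved by the request itself. In other words the heap was already full before the request arrived.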

alexgarel commented 2 years ago

I gave more memory to ES yesterday to see whether that was the problem, but it's not (see #54).

I have run into this problem before, but I don't remember the solution well… I'll have to dig!

alexgarel commented 2 years ago

I tried editing docker-compose.yml in prod to add new settings (report to come):

```yaml
  elasticsearch:
  ...
    environment:
    ...
      - "indices.breaker.total.use_real_memory=true"
      - "indices.breaker.request.limit=95%"
```

Let's see if it helps!

alexgarel commented 2 years ago

report (wip) here: https://github.com/openfoodfacts/openfoodfacts-infrastructure/pull/131

teolemon commented 2 years ago

elasticsearch

alexgarel commented 2 years ago

From where I have got to so far, it seems to be a problem with ILM (Index Lifecycle Management).

Either the policy is too lax, or it is not applied… but we have far too many indices!
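
One way to check the "too many indices" suspicion is to count indices per family from the `_cat/indices` API. The sketch below is only an illustration: it assumes the same host as the curl command earlier in this issue, and the naming scheme it strips (a trailing date or sequence number) is hypothetical, not taken from the actual cluster.

```python
import json
import re
from collections import Counter
from urllib.request import urlopen

# Assumption: same host and port as in the curl command earlier in this issue.
ES_URL = "http://elasticsearch:9200"


def fetch_indices(base_url: str = ES_URL) -> list:
    """Return the rows of GET _cat/indices?format=json as a list of dicts."""
    with urlopen(f"{base_url}/_cat/indices?format=json") as resp:
        return json.load(resp)


def count_by_family(rows: list) -> Counter:
    """Count indices per family, dropping a trailing date or sequence number."""
    families = Counter()
    for row in rows:
        # Hypothetical naming scheme: "<family>-2022.05.01" or "<family>-000042".
        family = re.sub(r"-[\d.]+$", "", row["index"])
        families[family] += 1
    return families


# Usage against a live cluster (not executed here):
#   for family, n in count_by_family(fetch_indices()).most_common():
#       print(f"{n:5d}  {family}")
```

A family with hundreds of entries would point at a policy that never deletes. To see whether a policy is applied to a given index at all, `GET <index>/_ilm/explain` reports the policy name and current phase.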

trnl commented 2 years ago

@alexgarel, is this still a problem?

alexgarel commented 2 years ago

It's fixed, yes!