grafana / helm-charts


server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded (limit: 4194304 bytes/sec) #610

Open · dmitry-mightydevops opened this issue 3 years ago

dmitry-mightydevops commented 3 years ago

I used the loki chart https://github.com/grafana/helm-charts/blob/main/charts/loki/values.yaml

and I get the following in my promtail pods:

level=warn ts=2021-08-09T01:02:45.604013738Z caller=client.go:344 component=client host=loki.monitoring:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded (limit: 4194304 bytes/sec) while attempting to ingest '4963' lines totaling '705285' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"
level=warn ts=2021-08-09T01:02:46.235952629Z caller=client.go:344 component=client host=loki.monitoring:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded (limit: 4194304 bytes/sec) while attempting to ingest '4963' lines totaling '705285' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"
level=warn ts=2021-08-09T01:02:47.618849145Z caller=client.go:344 component=client host=loki.monitoring:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded (limit: 4194304 bytes/sec) while attempting to ingest '4963' lines totaling '705285' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"
level=warn ts=2021-08-09T01:02:51.138024849Z caller=client.go:344 component=client host=loki.monitoring:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded (limit: 4194304 bytes/sec) while attempting to ingest '4963' lines totaling '705285' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"
level=warn ts=2021-08-09T01:02:56.281304488Z caller=client.go:344 component=client host=loki.monitoring:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded (limit: 4194304 bytes/sec) while attempting to ingest '4963' lines totaling '705285' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"
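
(For reference, 4194304 bytes/sec = 4 * 1024 * 1024 bytes, which matches Loki's default ingestion_rate_mb: 4 together with the default ingestion_burst_size_mb: 6. So the instance producing these errors is still enforcing the stock limit rather than a raised one.)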

values.promtail.yaml

resources:
  limits:
    cpu: 200m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

config:
  lokiAddress: http://loki.monitoring:3100/loki/api/v1/push
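
As a side note on the retry behaviour visible in the log above: Promtail treats the 429 as retryable and backs off, so batches are only dropped once retries are exhausted. The relevant knobs live in the client's backoff_config; how (or whether) a given chart version exposes them through values varies, so the raw Promtail client config below is only an illustrative sketch with the documented defaults spelled out:

clients:
  - url: http://loki.monitoring:3100/loki/api/v1/push
    backoff_config:
      min_period: 500ms   # wait before the first retry
      max_period: 5m      # cap on the backoff interval
      max_retries: 10     # the batch is dropped once retries are exhausted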

values.loki.yaml

nodeSelector:
  ops: "true"

rbac:
  create: true
  pspEnabled: true

config:
  limits_config:
    enforce_metric_name: false
    reject_old_samples: true
    reject_old_samples_max_age: 168h
    ingestion_rate_mb: 10
    ingestion_burst_size_mb: 20

So adding the following two lines to limits_config had no effect:

ingestion_rate_mb: 10
ingestion_burst_size_mb: 20
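
One way to check whether those two lines actually reach the running Loki is to inspect the rendered config in the cluster and then restart the pod, since the config file is only read at startup. A rough sketch, assuming the release is named loki, lives in the monitoring namespace, and the chart stores the config in a Secret under the key loki.yaml (as the single-binary loki chart does by default):

# Show the limits_config section Loki was rendered with
kubectl -n monitoring get secret loki -o jsonpath='{.data.loki\.yaml}' | base64 -d | grep -A 6 limits_config

# Restart so the new file is picked up, in case the pod was not rolled when the Secret changed
kubectl -n monitoring rollout restart statefulset/loki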

I'm using the latest charts:

✗  helm search repo loki                            
NAME                        CHART VERSION   APP VERSION DESCRIPTION                                       
grafana/loki                2.6.0           v2.3.0      Loki: like Prometheus, but for logs.              
grafana/loki-canary         0.4.0           2.3.0       Helm chart for Grafana Loki Canary                
grafana/loki-distributed    0.36.0          2.3.0       Helm chart for Grafana Loki in microservices mode 
grafana/loki-stack          2.4.1           v2.1.0      Loki: like Prometheus, but for logs.              
loki/loki                   2.1.1           v2.0.0      DEPRECATED Loki: like Prometheus, but for logs.   
loki/loki-stack             2.1.2           v2.0.0      DEPRECATED Loki: like Prometheus, but for logs.   
loki/fluent-bit             2.0.2           v2.0.0      DEPRECATED Uses fluent-bit Loki go plugin for g...
loki/promtail               2.0.2           v2.0.0      DEPRECATED Responsible for gathering logs and s...
grafana/fluent-bit          2.3.0           v2.1.0      Uses fluent-bit Loki go plugin for gathering lo...
grafana/promtail            3.7.0           2.3.0       Promtail is an agent which ships the contents o...

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm show values grafana/promtail

helm upgrade --install promtail grafana/promtail \
    --create-namespace \
    --namespace monitoring \
    --values cluster/production/charts/loki/values.promtail.yaml 

helm upgrade --install loki grafana/loki \
    --create-namespace \
    --namespace monitoring \
    --values cluster/production/charts/loki/values.loki.yaml 
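
To double-check what Helm actually applied for each release (rather than what is in the local files), something like this should work:

# Only the overrides that were supplied for the release
helm -n monitoring get values loki

# The full set of computed values the chart was rendered with
helm -n monitoring get values loki --all
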
dmitry-mightydevops commented 3 years ago

I even went onto the node running Loki, into the Loki container, and this is the config it is running with:

PID   USER     TIME  COMMAND
    1 loki      0:03 /usr/bin/loki -config.file=/etc/loki/loki.yaml
   46 loki      0:00 ash
   60 loki      0:00 ps aufx
/ $ cat /etc/loki/loki.yaml 
auth_enabled: false
chunk_store_config:
  max_look_back_period: 0s
compactor:
  shared_store: filesystem
  working_directory: /data/loki/boltdb-shipper-compactor
ingester:
  chunk_block_size: 262144
  chunk_idle_period: 3m
  chunk_retain_period: 1m
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
  max_transfer_retries: 0
limits_config:
  enforce_metric_name: false
  ingestion_burst_size_mb: 20
  ingestion_rate_mb: 10
  reject_old_samples: true
  reject_old_samples_max_age: 168h
schema_config:
  configs:
  - from: "2020-10-24"
    index:
      period: 24h
      prefix: index_
    object_store: filesystem
    schema: v11
    store: boltdb-shipper
server:
  http_listen_port: 3100
storage_config:
  boltdb_shipper:
    active_index_directory: /data/loki/boltdb-shipper-active
    cache_location: /data/loki/boltdb-shipper-cache
    cache_ttl: 24h
    shared_store: filesystem
  filesystem:
    directory: /data/loki/chunks
table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

So these values were applied (from the Helm chart), but I still get the original error I reported. I saw this discussion https://community.grafana.com/t/discarding-promtail-log-entries-en-masse/41128 but apparently there is something else I have missed.
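
One more way to see whether the running instance is still rejecting writes (and why) is to look at Loki's own discard counters; the exact metric names vary between versions, so treat this as a sketch only:

kubectl -n monitoring port-forward svc/loki 3100:3100 &
# The discard counters carry a "reason" label such as rate_limited
curl -s http://localhost:3100/metrics | grep discarded_samples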

alexandre1984rj commented 2 years ago

@dmitry-mightydevops try the per_stream_rate_limit https://grafana.com/docs/loki/latest/configuration/#limits_config
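
For reference, a minimal sketch of what that could look like under limits_config, if your Loki version supports per-stream limits (the numbers are illustrative; the documented defaults are per_stream_rate_limit: 3MB and per_stream_rate_limit_burst: 15MB):

limits_config:
  # Raise the per-stream cap if a few very chatty streams trigger the 429s
  per_stream_rate_limit: 8MB
  per_stream_rate_limit_burst: 16MB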

rmn-lux commented 2 years ago

Here's my experimental config, which seemed to help get rid of the 429 errors. It might come in handy.

limits_config:
  retention_period: 72h
  enforce_metric_name: false
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  max_cache_freshness_per_query: 10m
  split_queries_by_interval: 15m
  # tune these for big / noisy logs
  per_stream_rate_limit: 512M
  per_stream_rate_limit_burst: 1024M
  cardinality_limit: 200000
  ingestion_burst_size_mb: 1000
  ingestion_rate_mb: 10000
  max_entries_limit_per_query: 1000000
  max_label_value_length: 20480
  max_label_name_length: 10240
  max_label_names_per_series: 300
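
(Note that these values are very generous: 512M per stream is more than a hundred times Loki's default per-stream limit, and ingestion_rate_mb: 10000 is effectively unlimited, so this configuration trades away most of the ingester's rate protection in exchange for never seeing 429s.)
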
zzswang commented 1 year ago

@dmitry-mightydevops

I ran into the same issue. Have you fixed it? Could you share your config? Thanks.