timescale / prometheus-postgresql-adapter

Use PostgreSQL as a remote storage database for Prometheus
Apache License 2.0

Adapter High memory usage - 20GB+ #108

Closed cyakimov closed 4 years ago

cyakimov commented 4 years ago

Hi, I'm having trouble with the adapter's memory usage (heap) going through the roof, and I'm not sure why.

I've been using the TimescaleDB timescaledb-single Kubernetes chart along with this adapter to store metrics from a federated Prometheus deployment. Pods are getting OOM-killed and evicted.

I'm using the timescaledev/timescaledb-ha:pg11-ts1.6 Docker image. All of this is deployed on GKE.

Adapter memory usage over time: [screenshot: Screen Shot 2020-03-15 at 3.38.45 PM]

Prometheus remote config:

```
remote_write:
- url: http://prometheus-adapter.monitoring.svc.cluster.local:9201/write
  remote_timeout: 30s
  queue_config:
    capacity: 500
    max_shards: 1000
    min_shards: 1
    max_samples_per_send: 100
    batch_send_deadline: 5s
    min_backoff: 30ms
    max_backoff: 100ms
remote_read:
- url: http://prometheus-adapter.monitoring.svc.cluster.local:9201/read
  remote_timeout: 1m
```
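For scale, a rough back-of-envelope on what that `queue_config` allows in flight (numbers taken straight from the config above; the per-sample byte size is an assumption, since label sets dominate and vary widely):

```python
# Back-of-envelope for the remote_write queue_config above.
capacity = 500             # samples buffered per shard
max_shards = 1000          # upper bound on concurrent shards
max_samples_per_send = 100 # samples per write request

# Worst case, up to max_shards concurrent senders, each with a full buffer:
max_buffered_samples = capacity * max_shards
print(max_buffered_samples)  # 500000

# At an assumed ~100 bytes per sample (hypothetical average),
# that is roughly this many megabytes queued on the Prometheus side:
print(max_buffered_samples * 100 / 1e6)  # 50.0
```

Note that `max_shards: 1000` also bounds how many write requests can hit the adapter concurrently; if each slow request holds a goroutine and a connection on the adapter side, memory there can grow far beyond this estimate.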
Adapter Metrics

```
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 2.1723e-05
go_gc_duration_seconds{quantile="0.25"} 7.0525e-05
go_gc_duration_seconds{quantile="0.5"} 9.7253e-05
go_gc_duration_seconds{quantile="0.75"} 0.000198735
go_gc_duration_seconds{quantile="1"} 0.014655399
go_gc_duration_seconds_sum 0.042773711
go_gc_duration_seconds_count 103
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 83658
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.12.9"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.3915895056e+10
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 1.07005120944e+11
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.543185e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 7.72830921e+08
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 0.00522351603037884
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 6.593024e+08
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 1.3915895056e+10
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 7.97515776e+08
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.4423482368e+10
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 2.26815044e+08
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 0
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 1.5220998144e+10
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.5842969487112806e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 9.99645965e+08
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 13888
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 2.63439936e+08
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 2.68140544e+08
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 2.7781525376e+10
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 5.4581959e+07
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 6.83802624e+08
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 6.83802624e+08
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 1.688838524e+10
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 19
# HELP http_request_duration_ms Duration of HTTP request in milliseconds
# TYPE http_request_duration_ms histogram
http_request_duration_ms_bucket{path="write",le="0.005"} 0
http_request_duration_ms_bucket{path="write",le="0.01"} 0
http_request_duration_ms_bucket{path="write",le="0.025"} 0
http_request_duration_ms_bucket{path="write",le="0.05"} 0
http_request_duration_ms_bucket{path="write",le="0.1"} 0
http_request_duration_ms_bucket{path="write",le="0.25"} 0
http_request_duration_ms_bucket{path="write",le="0.5"} 0
http_request_duration_ms_bucket{path="write",le="1"} 0
http_request_duration_ms_bucket{path="write",le="2.5"} 0
http_request_duration_ms_bucket{path="write",le="5"} 0
http_request_duration_ms_bucket{path="write",le="10"} 0
http_request_duration_ms_bucket{path="write",le="+Inf"} 67697
http_request_duration_ms_sum{path="write"} 1.2433095041e+11
http_request_duration_ms_count{path="write"} 67697
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1942.52
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 83247
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.6766275584e+10
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.58428825608e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.693741056e+10
# HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
# TYPE process_virtual_memory_max_bytes gauge
process_virtual_memory_max_bytes -1
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 0
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
# HELP received_samples_total Total number of received samples.
# TYPE received_samples_total counter
received_samples_total 1.0762885e+07
# HELP sent_batch_duration_seconds Duration of sample batch send calls to the remote storage.
# TYPE sent_batch_duration_seconds histogram
sent_batch_duration_seconds_bucket{remote="PostgreSQL",le="0.005"} 0
sent_batch_duration_seconds_bucket{remote="PostgreSQL",le="0.01"} 0
sent_batch_duration_seconds_bucket{remote="PostgreSQL",le="0.025"} 0
sent_batch_duration_seconds_bucket{remote="PostgreSQL",le="0.05"} 0
sent_batch_duration_seconds_bucket{remote="PostgreSQL",le="0.1"} 0
sent_batch_duration_seconds_bucket{remote="PostgreSQL",le="0.25"} 0
sent_batch_duration_seconds_bucket{remote="PostgreSQL",le="0.5"} 0
sent_batch_duration_seconds_bucket{remote="PostgreSQL",le="1"} 0
sent_batch_duration_seconds_bucket{remote="PostgreSQL",le="2.5"} 1
sent_batch_duration_seconds_bucket{remote="PostgreSQL",le="5"} 13
sent_batch_duration_seconds_bucket{remote="PostgreSQL",le="10"} 278
sent_batch_duration_seconds_bucket{remote="PostgreSQL",le="+Inf"} 67697
sent_batch_duration_seconds_sum{remote="PostgreSQL"} 1.2433085211046816e+08
sent_batch_duration_seconds_count{remote="PostgreSQL"} 67697
# HELP sent_samples_total Total number of processed samples sent to remote storage.
# TYPE sent_samples_total counter
sent_samples_total{remote="PostgreSQL"} 4.843248e+06
```
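The dump is large, so here is a small stdlib-only helper (hypothetical, not part of the adapter) that pulls out the un-labeled gauges worth staring at. The suggestive pattern in this dump: ~84k goroutines alongside ~83k open file descriptors, which is consistent with one goroutine held per slow or stuck write request.

```python
def parse_gauges(text, names):
    """Return {metric_name: value} for un-labeled samples in a
    Prometheus text-exposition dump whose name is in `names`."""
    values = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        parts = line.split()
        if len(parts) == 2 and parts[0] in names:
            values[parts[0]] = float(parts[1])
    return values

# A few lines copied from the dump above:
dump = """\
go_goroutines 83658
process_open_fds 83247
go_memstats_heap_inuse_bytes 1.4423482368e+10
"""

g = parse_gauges(dump, {"go_goroutines", "process_open_fds",
                        "go_memstats_heap_inuse_bytes"})
print(g["go_goroutines"], g["process_open_fds"])  # 83658.0 83247.0
print(g["go_memstats_heap_inuse_bytes"] / 1e9)    # ~14.4 GB heap in use
```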

On the other hand, TimescaleDB itself looks rock solid from an ops perspective:

[screenshot: Screen Shot 2020-03-15 at 3.47.07 PM]

I'd appreciate any help at this point!

cyakimov commented 4 years ago

Increasing the number of pod replicas relieved the memory pressure; usage is now around 50 MB per pod. Closing the issue.
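For anyone hitting the same wall, the fix boils down to raising `replicas` on the adapter's Deployment so the write load is spread across pods. A minimal sketch, assuming the adapter runs as a plain Deployment named `prometheus-adapter` in the `monitoring` namespace (names and replica count are illustrative, not from the issue):

```yaml
# Sketch only: Deployment name/namespace are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-adapter
  namespace: monitoring
spec:
  replicas: 3   # scaled up from 1; per-pod memory dropped to ~50 MB
```

A Service in front of the pods (as in the `remote_write` URL above) will spread the write requests across the replicas.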