influxdata / telegraf

Agent for collecting, processing, aggregating, and writing metrics, logs, and other arbitrary data.
https://influxdata.com/telegraf
MIT License

Segfault on armv7l #7275

Closed: staticfloat closed this issue 4 years ago

staticfloat commented 4 years ago

I have telegraf v1.10.4 running inside of a docker container on an armv7l board, and it reliably segfaults after a few minutes of collecting stats.

Relevant telegraf.conf:

[global_tags]
  project = "julia"
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "60s"
  flush_jitter = "10s"
  precision = ""
  hostname = "firefly1"
  omit_hostname = false
[[outputs.influxdb]]
  urls = ["http://[fd37:5040::dc82:d3f5:c8b7:c381]:8086"]
  content_encoding = "gzip"
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = true
  report_active = true
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
  fielddrop = ["uptime_format"]
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  total = true
[[inputs.net]]
[[inputs.sensors]]
[[inputs.temp]]

System info:

Telegraf version: v1.10.4
OS: Debian 9
Hardware: Firefly RK3328-CC
Kernel: v4.4.114

Docker

The docker-compose setup is available here: https://github.com/staticfloat/julia-docker/tree/master/telegraf

Steps to reproduce:

I just run the docker container, and it eventually segfaults. Here is an example log file: https://gist.github.com/staticfloat/68ccde0c51274ff01bae3c3535992f07

I note that I have two other identical machines (firefly2 and firefly3) that do not exhibit this behavior, and all three are running identical software configurations. The strconv.bigFtoa frame in the stack trace leads me to believe that some kind of malformed float is being converted, so this may be data- or sensor-dependent.

danielnelson commented 4 years ago

Can you check with the 1.14.0 release to see if it still occurs?

staticfloat commented 4 years ago

Ah, I mixed up the version numbers in my head and thought I already had the latest release! Confirmed that the latest release doesn't have this segfault. Thanks!