sni / lmd

Livestatus Multitool Daemon - Create livestatus federation from multiple sources
https://labs.consol.de/omd/packages/lmd/
GNU General Public License v3.0

Panic: runtime error: index out of range #105

Closed jgbuenaventura closed 3 years ago

jgbuenaventura commented 3 years ago

Hi Sven,

Several backends get disconnected from time to time, and there is a delay when submitting passive checks or adding comments. Any thoughts on this?

Environment
OS: RHEL 6
Thruk: 2.32-3
LMD: lmd - version 1.8.2 (Build: 1a9f3e3)
Backend: Nagios and Icinga

lmd.ini
StaleBackendTimeout = 3600
NetTimeout = 240
ConnectTimeout = 30
BackendKeepAlive = false
FullUpdateInterval = 600

lmd.log
[2020-11-26 13:10:04.146][Error][peer.go:2718] [Master] Panic: runtime error: index out of range [1] with length 1
[2020-11-26 13:10:04.146][Error][peer.go:2719] [Master] Version: 1.8.2 (Build: 1a9f3e3)
[2020-11-26 13:10:04.146][Error][peer.go:2720] [Master] goroutine 210285 [running]:
runtime/debug.Stack(0x95aac0, 0xc0132543a0, 0x2)
    /usr/local/go/src/runtime/debug/stack.go:24 +0x9d
main.logPanicExitPeer(0xc0001c4e00)
    /root/lmd/lmd/peer.go:2720 +0x39c
panic(0x9e07e0, 0xc1015ab040)
    /usr/local/go/src/runtime/panic.go:969 +0x166
main.VirtColCustomVariables(0xc00d5613b0, 0xc000261db0, 0x985c00, 0xc03a7916e0)
    /root/lmd/lmd/datarow.go:570 +0x24a
main.(*DataRow).getVirtRowValue(0xc00d5613b0, 0xc000261db0, 0xc03a7916e0, 0xc00d56af78)
    /root/lmd/lmd/datarow.go:407 +0x21b
main.(*DataRow).GetHashMap(0xc00d5613b0, 0xc000261db0, 0xc01c014b00)
    /root/lmd/lmd/datarow.go:317 +0x4b
main.(*Filter).Match(0xc17d6ffc00, 0xc00d5613b0, 0xc046818100)
    /root/lmd/lmd/filter.go:485 +0x31e
main.(*DataRow).MatchFilter(0xc00d5613b0, 0xc17d6ffc00, 0x0)
    /root/lmd/lmd/datarow.go:657 +0x189
main.(*DataRow).MatchFilter(0xc00d5613b0, 0xc17d6ffc80, 0xc054893d01)
    /root/lmd/lmd/datarow.go:618 +0xa5
main.(*DataRow).MatchFilter(0xc00d5613b0, 0xc17d6ffd00, 0x1)
    /root/lmd/lmd/datarow.go:618 +0xa5
main.(*Peer).gatherResultRows(0xc0001c4e00, 0xc127fa2a80, 0xc078d1e000, 0xc127fa2ae0)
    /root/lmd/lmd/peer.go:2576 +0x105
main.(*Peer).BuildLocalResponseData(0xc0001c4e00, 0xc127fa2a80, 0xc078d1e000, 0xc127fa2ae0)
    /root/lmd/lmd/peer.go:2476 +0x251
main.(*Response).BuildLocalResponse.func2(0xc127fa2a80, 0xc078d1e000, 0xc127fa2ae0, 0xc0001c4e00, 0xc0b94a84d0)
    /root/lmd/lmd/response.go:505 +0x19e
created by main.(*Response).BuildLocalResponse
    /root/lmd/lmd/response.go:498 +0x20b
[2020-11-26 13:10:04.146][Error][peer.go:2722] [Master] LastQuery:
[2020-11-26 13:10:04.147][Error][peer.go:2723] [Master] GET services
ResponseHeader: fixed16
OutputFormat: json
Columns: host_name description accept_passive_checks acknowledged acknowledgement_type active_checks_enabled check_freshness check_options check_type checks_enabled current_attempt current_notification_number custom_variable_values event_handler_enabled execution_time first_notification_delay flap_detection_enabled has_been_checked in_check_period in_notification_period is_executing is_flapping last_check last_hard_state last_hard_state_change last_notification last_state last_state_change last_time_critical last_time_warning last_time_ok last_time_unknown latency long_plugin_output low_flap_threshold modified_attributes modified_attributes_list next_check next_notification notifications_enabled obsess_over_service percent_state_change perf_data plugin_output process_performance_data scheduled_downtime_depth state state_type staleness pnpgraph_present check_source
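
For readers unfamiliar with this class of panic: the trace ends in VirtColCustomVariables with "index out of range [1] with length 1", which is what Go reports when a slice of length 1 is read at index 1. One way this can happen is when two parallel lists (for example custom variable names and values) are assumed to have the same length but do not. Whether that is the actual cause here is an assumption; the sketch below is not LMD's code, and zipCustomVars is a hypothetical helper that only illustrates a bounds-checked pairing.

```go
package main

import "fmt"

// zipCustomVars pairs names with values without assuming that both
// slices have the same length. Hypothetical helper for illustration,
// not taken from the LMD source.
func zipCustomVars(names, values []string) map[string]string {
	// Only iterate over the indexes that exist in both slices.
	n := len(names)
	if len(values) < n {
		n = len(values)
	}
	res := make(map[string]string, n)
	for i := 0; i < n; i++ {
		res[names[i]] = values[i]
	}
	return res
}

func main() {
	// Two names but only one value: an unchecked values[1] would panic here.
	fmt.Println(zipCustomVars([]string{"TYPE", "OWNER"}, []string{"linux"}))
}
```

Names without a matching value are dropped silently in this sketch; a real implementation might prefer to log the mismatch.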

sni commented 3 years ago

Please update to the latest HEAD and see if the panic still occurs. Regarding the delay, there is probably not much LMD can do about it. It's a tradeoff between constantly synchronizing everything and syncing only the objects which have changed. So far there is no easy way to query all hosts/services which have changed in Nagios and Icinga. That's why Naemon has a last_update attribute, which can be queried to get all objects which have changed since the last sync. However, even with Nagios or Icinga it should not take longer than a minute until a passive result is synced.
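
To illustrate the last_update idea, here is a minimal sketch of how a delta query against a Naemon backend could be built. The function name buildDeltaQuery, the column choice, and the exact filter form are assumptions for illustration, not LMD's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// buildDeltaQuery returns a Livestatus query that asks only for rows
// whose last_update is at or after lastSync (Unix seconds).
// Hypothetical helper; it assumes the backend exposes last_update,
// which the comment above attributes to Naemon.
func buildDeltaQuery(table string, columns []string, lastSync int64) string {
	var b strings.Builder
	fmt.Fprintf(&b, "GET %s\n", table)
	fmt.Fprintf(&b, "Columns: %s\n", strings.Join(columns, " "))
	fmt.Fprintf(&b, "Filter: last_update >= %d\n", lastSync)
	b.WriteString("OutputFormat: json\n")
	// A Livestatus query is terminated by an empty line.
	b.WriteString("\n")
	return b.String()
}

func main() {
	q := buildDeltaQuery("services", []string{"host_name", "description", "state"}, 1606396204)
	fmt.Print(q)
}
```

Without such an attribute, as with Nagios and Icinga, a client has to fall back to periodic full or heuristic updates, which is where the sync delay comes from.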