Open tjyang opened 6 years ago
I can confirm this issue still exists and seems to be on the Adagios side:
CentOS 7.4 (and CentOS 7.5), adagios-1.6.3-1, nagios-4.3.4-5, check-mk-livestatus-1.2.8p26-1
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/adagios/views.py", line 43, in wrapper
result = view_func(request, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/adagios/status/views.py", line 971, in downtime_list
c['downtimes'] = l.query('GET downtimes', *args)
File "/usr/lib/python2.7/site-packages/pynag/Parsers/multisite.py", line 80, in query
query_result = backend_instance.query(query, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/pynag/Parsers/livestatus.py", line 996, in query
raise InvalidResponseFromLivestatus(query=livestatus_query, response=response_data)
InvalidResponseFromLivestatus: Could not parse response from livestatus.
Query:GET downtimes
ResponseHeader: fixed16
OutputFormat: python
ColumnHeaders: on
livestatus is indeed loaded and working, I can verify it with the following:
echo 'GET hosts' | unixcat /var/spool/nagios/cmd/livestatus
and also via the following in the logs:
livestatus: Livestatus 1.2.8p26 by Mathias Kettner. Socket: '/var/spool/nagios/cmd/livestatus'
livestatus: Please visit us at http://mathias-kettner.de/
livestatus: Hint: please try out OMD - the Open Monitoring Distribution
livestatus: Please visit OMD at http://omdistro.org
livestatus: Finished initialization. Further log messages go to /var/log/nagios/livestatus.log
Event broker module '/usr/lib64/check_mk/livestatus.o' initialized successfully.
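For anyone who wants to reproduce the failing request outside Adagios, here is a minimal Python sketch that sends the same headers that appear in the error message (`ResponseHeader: fixed16`, `OutputFormat: python`, `ColumnHeaders: on`) straight to the socket. The function names are made up for illustration and are not part of pynag or Adagios. The socket path is the one from this thread. The 16-byte header layout (3-digit status, a space, the body length, a newline) follows the Livestatus documentation as far as I understand it, so treat it as an assumption.

```python
import socket

SOCKET_PATH = "/var/spool/nagios/cmd/livestatus"  # path used in this thread


def build_query(table):
    """Build an LQL request with the headers seen in the Adagios error."""
    lines = [
        "GET %s" % table,
        "ResponseHeader: fixed16",
        "OutputFormat: python",
        "ColumnHeaders: on",
    ]
    # A blank line terminates the request.
    return "\n".join(lines) + "\n\n"


def parse_fixed16(header):
    """Split the 16-byte status header into (status_code, body_length)."""
    if len(header) != 16:
        raise ValueError("expected 16 header bytes, got %d" % len(header))
    text = header.decode("ascii")
    return int(text[0:3]), int(text[4:15])


def read_exactly(sock, count):
    """Read exactly `count` bytes or raise if the peer closes early."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise IOError("connection closed with %d bytes missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def raw_query(table, path=SOCKET_PATH):
    """Return (status_code, raw_body) for one request, without parsing the body."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        sock.sendall(build_query(table).encode("ascii"))
        status, length = parse_fixed16(read_exactly(sock, 16))
        return status, read_exactly(sock, length)
    finally:
        sock.close()
```

Calling `raw_query("downtimes")` as a user with access to the socket returns the body exactly as livestatus sent it, which makes it easier to see whether the payload itself is malformed or whether the client is at fault.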
Thank you @tjyang and @Mjolinir. What version of pynag are you using?
Hello gardart! I hope this is something that can be fixed relatively easily. It has been broken for some time.
For me it looks to be: pynag-0.9.1-1
Please let me know if there is anything else I can do to help.
@Mjolinir and @tjyang, could you try updating to the latest pynag and adagios (released last week) and let me know if this solves the issue?
yum --enablerepo=ok-testing update pynag adagios
@Mjolinir, I installed the new rpms on my test nagios instance; it didn't help. Can you confirm?
After "yum --enablerepo=ok-testing update pynag adagios"
[me@nagios03 ~]$ rpm -qa |egrep 'adagio|pynag'
pynag-0.9.1-1.git.187.9bcf9ed.el7.noarch
adagios-1.6.3-2.git.0.4290a53.el7.noarch
[me@ilclnagios03 ~]$
Oh no, something went wrong ☹
InvalidResponseFromLivestatus: Could not parse response from livestatus. Query:GET downtimes ResponseHeader: fixed16 OutputFormat: python ColumnHeaders: on Response:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/adagios/views.py", line 43, in wrapper
result = view_func(request, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/adagios/status/views.py", line 971, in downtime_list
c['downtimes'] = l.query('GET downtimes', *args)
File "/usr/lib/python2.7/site-packages/pynag/Parsers/multisite.py", line 80, in query
query_result = backend_instance.query(query, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/pynag/Parsers/livestatus.py", line 996, in query
raise InvalidResponseFromLivestatus(query=livestatus_query, response=response_data)
InvalidResponseFromLivestatus: Could not parse response from livestatus.
Query:GET downtimes
ResponseHeader: fixed16
OutputFormat: python
ColumnHeaders: on
@tjyang could you try adding these options to your livestatus broker_module line in /etc/nagios/nagios.cfg: debug=1 query_timeout=0
Comments and Downtime under the REPORTS section both have the same issue.
Config changed.
[root@nagios03 nagios]# egrep ^broker_module=/usr/lib64/check_mk/livestatus.o /etc/nagios/nagios.cfg
broker_module=/usr/lib64/check_mk/livestatus.o /var/spool/nagios/cmd/livestatus idle_timeout=12000 num_client_threads=20 debug=1 query_timeout=0
[root@nagios03 nagios]#
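To double-check that the options actually landed on the livestatus line, a small sketch like the following can split a `broker_module` entry into module path, socket path and `key=value` options. `parse_broker_module` is a hypothetical helper, not part of nagios or pynag; the layout it assumes (module path, then socket path, then options) is simply taken from the line above.

```python
def parse_broker_module(line):
    """Split a nagios.cfg broker_module line into (module, socket, options)."""
    key, _, value = line.strip().partition("=")
    if key != "broker_module":
        raise ValueError("not a broker_module line: %r" % line)
    parts = value.split()
    if not parts:
        raise ValueError("broker_module has no module path")
    module = parts[0]
    socket_path = None
    options = {}
    for part in parts[1:]:
        if "=" in part:
            # key=value arguments passed to the module
            name, _, setting = part.partition("=")
            options[name] = setting
        elif socket_path is None:
            # first bare argument is treated as the socket path
            socket_path = part
    return module, socket_path, options


line = ("broker_module=/usr/lib64/check_mk/livestatus.o "
        "/var/spool/nagios/cmd/livestatus idle_timeout=12000 "
        "num_client_threads=20 debug=1 query_timeout=0")
module, socket_path, options = parse_broker_module(line)
print(options.get("debug"), options.get("query_timeout"))
```

This only inspects the config text; nagios still has to be restarted before livestatus picks up the changed options.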
* /var/log/nagios/livestatus.log
[root@inagios03 nagios]# tail -40 /var/log/nagios/livestatus.log
2018-05-23 10:20:55 Query: ResponseHeader: fixed16
2018-05-23 10:20:55 Time to process request: 12 us. Size of answer: 36 bytes
2018-05-23 10:20:56 Query: GET hosts
2018-05-23 10:20:56 Query: Stats: state >= 0
2018-05-23 10:20:56 Query: Stats: state > 0
2018-05-23 10:20:56 Query: Stats: scheduled_downtime_depth = 0
2018-05-23 10:20:56 Query: Stats: hard_state >= 1
2018-05-23 10:20:56 Query: StatsAnd: 3
2018-05-23 10:20:56 Query: Stats: state > 0
2018-05-23 10:20:56 Query: Stats: scheduled_downtime_depth = 0
2018-05-23 10:20:56 Query: Stats: acknowledged = 0
2018-05-23 10:20:56 Query: Stats: hard_state >= 1
2018-05-23 10:20:56 Query: StatsAnd: 4
2018-05-23 10:20:56 Query: Filter: custom_variable_names < _REALNAME
2018-05-23 10:20:56 Query: Localtime: 1527085256
2018-05-23 10:20:56 Query: OutputFormat: python
2018-05-23 10:20:56 Query: KeepAlive: on
2018-05-23 10:20:56 Query: ResponseHeader: fixed16
2018-05-23 10:20:56 Time to process request: 856 us. Size of answer: 13 bytes
2018-05-23 10:20:56 Query: GET services
2018-05-23 10:20:56 Query: Stats: state >= 0
2018-05-23 10:20:56 Query: Stats: state > 0
2018-05-23 10:20:56 Query: Stats: scheduled_downtime_depth = 0
2018-05-23 10:20:56 Query: Stats: host_scheduled_downtime_depth = 0
2018-05-23 10:20:56 Query: Stats: host_state = 0
2018-05-23 10:20:56 Query: Stats: last_hard_state >= 1
2018-05-23 10:20:56 Query: StatsAnd: 5
2018-05-23 10:20:56 Query: Stats: state > 0
2018-05-23 10:20:56 Query: Stats: scheduled_downtime_depth = 0
2018-05-23 10:20:56 Query: Stats: host_scheduled_downtime_depth = 0
2018-05-23 10:20:56 Query: Stats: acknowledged = 0
2018-05-23 10:20:56 Query: Stats: host_state = 0
2018-05-23 10:20:56 Query: Stats: last_hard_state >= 1
2018-05-23 10:20:56 Query: StatsAnd: 6
2018-05-23 10:20:56 Query: Filter: host_custom_variable_names < _REALNAME
2018-05-23 10:20:56 Query: Localtime: 1527085256
2018-05-23 10:20:56 Query: OutputFormat: python
2018-05-23 10:20:56 Query: KeepAlive: on
2018-05-23 10:20:56 Query: ResponseHeader: fixed16
2018-05-23 10:20:56 Time to process request: 7114 us. Size of answer: 18 bytes
[root@nagios03 nagios]#
Looks very similar for me:
Updated to the new packages from ok-testing. Problem still exists, unfortunately.
Debug:
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/adagios/views.py", line 43, in wrapper
result = view_func(request, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/adagios/status/views.py", line 959, in comment_list
c['comments'] = l.query('GET comments', *args)
File "/usr/lib/python2.7/site-packages/pynag/Parsers/multisite.py", line 80, in query
query_result = backend_instance.query(query, *args, **kwargs)
File "/usr/lib/python2.7/site-packages/pynag/Parsers/livestatus.py", line 996, in query
raise InvalidResponseFromLivestatus(query=livestatus_query, response=response_data)
InvalidResponseFromLivestatus: Could not parse response from livestatus.
Query:GET comments
ResponseHeader: fixed16
OutputFormat: python
ColumnHeaders: on
Error msg:
`InvalidResponseFromLivestatus: Could not parse response from livestatus. Query:GET downtimes ResponseHeader: fixed16 OutputFormat: python ColumnHeaders: on Response: [[u"author",u"comment",u"duration",u"end_time",u"entry_time",u"fixed",u"host_accept_passive_checks",u"host_acknowledged",u"host_acknowledgement_type",u"host_action_url",u"host_action_url_expanded",u"host_active_checks_enabled",u"host_address",u"host_alias",u"host_check_command",u"host_check_command_expanded",u"host_check_flapping_recovery_notification",u"host_check_freshness",u"host_check_interval",u"host_check_options",u"host_check_period",u"host_check_type",u"host_checks_enabled",u"host_childs",u"host_comments",u"host_comments_with_extra_info",u"host_comments_with_info",u"host_contact_groups",u"host_contacts",u"host_current_attempt",u"host_current_notification_number",u"host_custom_variable_names",u"host_custom_variable_values",u"host_custom_variables",u"host_display_name",u"host_downtimes",u"host_downtimes_with_info",u"host_event_handler",u"host_event_handler_enabled",u"host_execution_time",u"host_filename",u"host_first_notification_delay",u"host_flap_detection_enabled",u"host_groups",u"host_hard_state",u"host_has_been_checked",u"host_high_flap_threshold",u"host_icon_image",u"host_icon_image_alt",u"host_icon_image_expanded",u"host_in_check_period",u"host_in_notification_period",u"host_in_service_period",u"host_initial_state",u"host_is_executing",u"host_is_flapping",u"host_last_check",u"host_last_hard_state",u"host_last_hard_state_change",u"host_last_notification",u"host_last_state",u"host_last_state_change",u"host_last_time_down",u"host_last_time_unreachable",u"host_last_time_up",u"host_latency",u"host_long_plugin_output",u"host_low_flap_threshold",u"host_max_check_attempts",u"host_metrics",u"host_mk_inventory",u"host_mk_inventory_gz",u"host_mk_inventory_last",u"host_modified_attributes",u"host_modified_attributes_list",u"host_name",u"host_next_check",u"host_next_notification",u"host_no_more_notific
ations",u"host_notes",u"host_notes_expanded",u"host_notes_url",u"host_notes_url_expanded",u"host_notification_interval",u"host_notification_period",u"host_notifications_enabled",u"host_num_services",u"host_num_services_crit",u"host_num_services_hard_crit",u"host_num_services_hard_ok",u"host_num_services_hard_unknown",u"host_num_services_hard_warn",u"host_num_services_ok",u"host_num_services_pending",u"host_num_services_unknown",u"host_num_services_warn",u"host_obsess_over_host",u"host_parents",u"host_pending_flex_downtime",u"host_percent_state_change",u"host_perf_data",u"host_plugin_output",u"host_pnpgraph_present",u"host_process_performance_data",u"host_retry_interval",u"host_scheduled_downtime_depth",u"host_service_period",u"host_services",u"host_services_with_fullstate",u"host_services_with_info",u"host_services_with_state",u"host_staleness",u"host_state",u"host_state_type",u"host_statusmap_image",u"host_total_services",u"host_worst_service_hard_state",u"host_worst_service_state",u"host_x_3d",u"host_y_3d",u"host_z_3d",u"id",u"is_service",u"service_accept_passive_checks",u"service_acknowledged",u"service_acknowledgement_type",u"service_action_url",u"service_action_url_expanded",u"service_active_checks_enabled",u"service_cache_interval",u"service_cached_at",u"service_check_command",u"service_check_command_expanded",u"service_check_freshness",u"service_check_interval",u"service_check_options",u"service_check_period",u"service_check_type",u"service_checks_enabled",u"service_comments",u"service_comments_with_extra_info",u"service_comments_with_info",u"service_contact_groups",u"service_contacts",u"service_current_attempt",u"service_current_notification_number",u"service_custom_variable_names",u"service_custom_variable_values",u"service_custom_variables",u"service_description",u"service_display_name",u"service_downtimes",u"service_downtimes_with_info",u"service_event_handler",u"service_event_handler_enabled",u"service_execution_time",u"service_first_notification_delay",
u"service_flap_detection_enabled",u"service_groups",u"service_has_been_checked",u"service_high_flap_threshold",u"service_icon_image",u"service_icon_image_alt",u"service_icon_image_expanded",u"service_in_check_period",u"service_in_notification_period",u"service_in_service_period",u"service_initial_state",u"service_is_executing",u"service_is_flapping",u"service_last_check",u"service_last_hard_state",u"service_last_hard_state_change",u"service_last_notification",u"service_last_state",u"service_last_state_change",u"service_last_time_critical",u"service_last_time_ok",u"service_last_time_unknown",u"service_last_time_warning",u"service_latency",u"service_long_plugin_output",u"service_low_flap_threshold",u"service_max_check_attempts",u"service_metrics",u"service_modified_attributes",u"service_modified_attributes_list",u"service_next_check",u"service_next_notification",u"service_no_more_notifications",u"service_notes",u"service_notes_expanded",u"service_notes_url",u"service_notes_url_expanded",u"service_notification_interval",u"service_notification_period",u"service_notifications_enabled",u"service_obsess_over_service",u"service_percent_state_change",u"service_perf_data",u"service_plugin_output",u"service_pnpgraph_present",u"service_process_performance_data",u"service_retry_interval",u"service_scheduled_downtime_depth",u"service_service_period",u"service_staleness",u"service_state",u"service_state_type",u"start_time",u"triggered_by",u"type"]
....
1527163154,0,0,u"",u"",u"",u"",6.0000000000e+01,u"24x7_except_maintenance",1,0,0,0,0,0,0,0,0,0,0,1,[],0,0.0000000000e+00,u"",u"(Host check timed out after 30.10 seconds)",-1,1,1.0000000000e+00,1,u"",[],[],[],[],1.1466666667e+00,1,1,u"",0,0,0,0.0000000000e+00,0.0000000000e+00,0.0000000000e+00,177,0,0,0,0,u"",u"",0,0,0,u"",u"",0,0.0000000000e+00,0,u"",0,0,[],[],[],[],[],0,0,[],[],{},u"",u"",[],[],u"",0,0.0000000000e+00,0.0000000000e+00,0,[],0,0.0000000000e+00,u"",u"",u"",0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0000000000e+00,u"",0.0000000000e+00,0,,0,[],0,0,0,u"",u"",u"",u"",0.0000000000e+00,u"",0,0,0.0000000000e+00,u"",u"",0,0,0.0000000000e+00,0,u"",0.0000000000e+00,0,0,1516214153,0,2]] `
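One detail that stands out in the pasted data row is an empty field (`...,0,,0,[],...`). Assuming the client turns the `OutputFormat: python` body into objects by evaluating it as a Python literal (an assumption about pynag internals, not something confirmed in this thread), a row like that cannot be parsed. A minimal sketch with shortened, invented payloads; only the `,,` pattern is taken from the response above:

```python
import ast
import re


def parse_python_payload(payload):
    """Evaluate an 'OutputFormat: python' body; return None if it is not valid."""
    try:
        return ast.literal_eval(payload)
    except (SyntaxError, ValueError):
        return None


def empty_field_offsets(payload):
    """Return offsets where two commas follow each other with nothing between.

    Naive on purpose: commas inside quoted strings are not skipped.
    """
    return [match.start() for match in re.finditer(r",\s*,", payload)]


# Hypothetical, shortened payloads for illustration only.
good = '[[u"author",u"id",u"type"],[u"admin",0,2]]'
bad = '[[u"author",u"id",u"type"],[u"admin",,2]]'

print(parse_python_payload(good))
print(parse_python_payload(bad))
print(empty_field_offsets(bad))
```

If this is what happens here, it would point at the livestatus output rather than at the Adagios views, which would fit the later reports in this thread that changing the livestatus version makes the views work. That remains a guess until someone checks the raw response against the parser.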
tail -50 /var/log/nagios/livestatus.log
2018-05-24 07:59:25 Query: GET hosts
2018-05-24 07:59:25 Query: ResponseHeader: fixed16
2018-05-24 07:59:25 Query: OutputFormat: python
2018-05-24 07:59:25 Query: ColumnHeaders: on
2018-05-24 07:59:25 Time to process request: 6587 us. Size of answer: 164445 bytes
2018-05-24 07:59:25 Time to process request: 5982 us. Size of answer: 164445 bytes
2018-05-24 07:59:25 Query: GET services
2018-05-24 07:59:25 Query: Filter: state != 0
2018-05-24 07:59:25 Query: Filter: acknowledged = 0
2018-05-24 07:59:25 Query: Filter: host_acknowledged = 0
2018-05-24 07:59:25 Query: Filter: scheduled_downtime_depth = 0
2018-05-24 07:59:25 Query: Filter: host_scheduled_downtime_depth = 0
2018-05-24 07:59:25 Query: Stats: state != 0
2018-05-24 07:59:25 Query: Stats: host_state != 0
2018-05-24 07:59:25 Query: ResponseHeader: fixed16
2018-05-24 07:59:25 Query: OutputFormat: python
2018-05-24 07:59:25 Query: ColumnHeaders: off
2018-05-24 07:59:25 Time to process request: 37 us. Size of answer: 8 bytes
2018-05-24 07:59:25 Query: GET services
2018-05-24 07:59:25 Query: Stats: state != 0
2018-05-24 07:59:25 Query: Stats: state != 0
2018-05-24 07:59:25 Query: Stats: acknowledged = 0
2018-05-24 07:59:25 Query: Stats: scheduled_downtime_depth = 0
2018-05-24 07:59:25 Query: Stats: host_state = 0
2018-05-24 07:59:25 Query: StatsAnd: 4
2018-05-24 07:59:25 Query: ResponseHeader: fixed16
2018-05-24 07:59:25 Query: OutputFormat: python
2018-05-24 07:59:25 Query: ColumnHeaders: off
2018-05-24 07:59:25 Time to process request: 57 us. Size of answer: 8 bytes
2018-05-24 07:59:25 Query: GET hosts
2018-05-24 07:59:25 Query: Stats: state != 0
2018-05-24 07:59:25 Query: Stats: state != 0
2018-05-24 07:59:25 Query: Stats: acknowledged = 0
2018-05-24 07:59:25 Query: Stats: scheduled_downtime_depth = 0
2018-05-24 07:59:25 Query: Stats: host_state = 1
2018-05-24 07:59:25 Query: StatsAnd: 4
2018-05-24 07:59:25 Query: ResponseHeader: fixed16
2018-05-24 07:59:25 Query: OutputFormat: python
2018-05-24 07:59:25 Query: ColumnHeaders: off
2018-05-24 07:59:25 Time to process request: 56 us. Size of answer: 8 bytes
2018-05-24 07:59:25 Query: GET hosts
2018-05-24 07:59:25 Query: ResponseHeader: fixed16
2018-05-24 07:59:25 Query: OutputFormat: python
2018-05-24 07:59:25 Query: ColumnHeaders: on
2018-05-24 07:59:25 Time to process request: 6385 us. Size of answer: 164445 bytes
2018-05-24 07:59:25 Query: GET hosts
2018-05-24 07:59:25 Query: ResponseHeader: fixed16
2018-05-24 07:59:25 Query: OutputFormat: python
2018-05-24 07:59:25 Query: ColumnHeaders: on
2018-05-24 07:59:25 Time to process request: 6306 us. Size of answer: 164445 bytes
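The log tails pasted so far only show `GET hosts` and `GET services` requests, so the failing `GET downtimes` / `GET comments` queries are not visible in them. A rough sketch for pulling one summary per request out of a debug log follows. The regular expressions are written against the line format visible in this thread and may not match other livestatus versions; `summarize` is a made-up name.

```python
import re

# "<date> <time> <message>" as seen in the pasted log lines
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$")
DONE_RE = re.compile(
    r"Time to process request: (\d+) us\. Size of answer: (\d+) bytes")


def summarize(lines):
    """Yield (table, microseconds, answer_bytes) per completed request.

    `table` is None when no preceding 'Query: GET' line was seen, which can
    happen when lines from several client threads interleave.
    """
    table = None
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if not match:
            continue
        message = match.group(2)
        if message.startswith("Query: GET "):
            table = message[len("Query: GET "):].strip()
            continue
        done = DONE_RE.search(message)
        if done:
            yield table, int(done.group(1)), int(done.group(2))
            table = None


sample = [
    "2018-05-24 07:59:25 Query: GET hosts",
    "2018-05-24 07:59:25 Query: ResponseHeader: fixed16",
    "2018-05-24 07:59:25 Time to process request: 6587 us. Size of answer: 164445 bytes",
]
print(list(summarize(sample)))
```

Run against the real file (for example `summarize(open("/var/log/nagios/livestatus.log"))`) and filtered for the `downtimes` and `comments` tables, this should show whether livestatus logged an answer for the failing views at all.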
One thing I noticed, not sure if it is relevant:
I'm using check-mk-livestatus-1.2.8p26-1.el7 from EPEL. If I use mk-livestatus-1.2.2-3.git.2.27fc0fd.el7.centos.x86_64 from ok-testing, then livestatus does not work at all.
[root@nagios03 ~]# rpm -qi check-mk-livestatus-1.2.8p26-1.el7.x86_64
Name : check-mk-livestatus
Version : 1.2.8p26
Release : 1.el7
Architecture: x86_64
Install Date: Sat 27 Jan 2018 03:38:45 PM EST
Group : Applications/Internet
Size : 762663
License : GPLv2 and GPLv3
Signature : RSA/SHA256, Fri 06 Oct 2017 11:47:35 AM EDT, Key ID 6a2faea2352c64e5
Source RPM : check-mk-1.2.8p26-1.el7.src.rpm
Build Date : Fri 06 Oct 2017 11:27:00 AM EDT
Build Host : buildhw-09.phx2.fedoraproject.org
<snipped>
[root@nagios03 ~]#
check-mk-livestatus-1.2.8p26-1.el7 from EPEL is the correct one...
@gardart I am using check-mk-livestatus-1.2.8p26-1.el7 from EPEL, and Comments and Downtime still have the issue. It looks like the parser code on the adagios side needs to be adjusted.
Does your nagios server crash when this happens? Do you need to restart the nagios service every time?
No, neither the nagios nor the livestatus daemon crashed when this issue happened.
[root@nagios03 nagios]# tail -20f /var/log/nagios/livestatus.log
2018-07-11 16:32:19 Idle timeout of 12000 ms exceeded. Going to close connection.
2018-07-11 16:32:19 error: Client connection terminated while request still incomplete
2018-07-11 16:32:21 Idle timeout of 12000 ms exceeded. Going to close connection.
2018-07-11 16:32:21 error: Client connection terminated while request still incomplete
2018-07-11 16:32:41 Idle timeout of 12000 ms exceeded. Going to close connection.
2018-07-11 16:32:41 error: Client connection terminated while request still incomplete
2018-07-11 16:32:48 Idle timeout of 12000 ms exceeded. Going to close connection.
2018-07-11 16:32:48 error: Client connection terminated while request still incomplete
2018-07-11 20:01:04 deinitializing
2018-07-11 20:01:04 Waiting for main to terminate...
2018-07-11 20:01:04 Waiting for client threads to terminate...
2018-07-11 20:01:04 Logfile cache: flushing complete cache.
2018-07-12 00:01:03 deinitializing
2018-07-12 00:01:03 Waiting for main to terminate...
2018-07-12 00:01:05 Waiting for client threads to terminate...
2018-07-12 00:01:05 Logfile cache: flushing complete cache.
2018-07-12 04:01:04 deinitializing
2018-07-12 04:01:04 Waiting for main to terminate...
2018-07-12 04:01:06 Waiting for client threads to terminate...
2018-07-12 04:01:06 Logfile cache: flushing complete cache.
^C
[root@nagios03 nagios]# date
Thu Jul 12 07:01:13 EDT 2018
[root@nagios03 nagios]#
Same applies to me. No crashes.
I noticed today that both Comments and Downtime are working! Unfortunately I am not sure which update fixed it. Here are the current versions of related packages:
check-mk-livestatus-1.4.0p31-2.el7.x86_64 (last updated June 21)
pynag-0.9.1-1.git.187.9bcf9ed.el7.noarch (last updated May 24)
adagios-1.6.3-2.git.0.4290a53.el7.noarch (last updated May 24)
nagios-4.3.4-5.el7.x86_64 (last updated Apr 16)
It seems likely it was the check-mk-livestatus update in June and I just didn't notice - the updates are automated with Ansible
@tjyang can you confirm on your end?
[me@nagios03 ~]$ rpm -qa |egrep 'check-mk-livestatus-1|pynag-0|adagios-1|nagios-4'
pynag-0.9.1-1.git.187.9bcf9ed.el7.noarch
adagios-1.6.3-2.git.0.4290a53.el7.noarch
nagios-4.3.4-3.el7.x86_64
check-mk-livestatus-1.2.8p26-1.el7.x86_64
[me@nagios03 ~]$
sudo yum update -y check-mk-livestatus
sudo systemctl restart nagios
Thanks to @Mjolinir's pointer and @gardart's help.
[me@nagios01 servers]$ rpm -qa |egrep 'check-mk-livestatus-1|pynag-0|adagios-1|nagios-4'
pynag-0.9.1-1.git.172.66b2afa.el7.centos.noarch
adagios-1.6.3-1.git.0.fe59eeb.el7.centos.noarch
nagios-4.1.1-2.el7.centos.x86_64
check-mk-livestatus-1.4.0p31-2.el7.x86_64
[me@nagios01 servers]$ cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
[me@nagios01 servers]$
I tried two different versions of mk-livestatus, 1.2.6 and 1.2.8. 1.2.6 still works with Nagios4 but 1.2.8 gives parse errors in downtime and comments view. mk-livestatus works best when using Naemon as the Nagios server. You can install Adagios on top of Naemon as well.
Here is the current workaround for Nagios 4. You can build 1.2.6 with Nagios 4 support like this:
yum remove check-mk
wget http://www.mathias-kettner.de/download/mk-livestatus-1.2.6.tar.gz
yum install -y make gcc-c++
tar -zxvf mk-livestatus-1.2.6.tar.gz
cd mk-livestatus-1.2.6
./configure --with-nagios4
make
make install
Then use this in your broker_module settings:
broker_module=/usr/local/lib/mk-livestatus/livestatus.o /var/spool/nagios/cmd/livestatus
[root@nagios03 ~]# cat /etc/redhat-release; rpm -qa |egrep 'check-mk-livestatus-1|pynag-0|adagios-1|nagios-4';date
CentOS Linux release 7.6.1810 (Core)
pynag-0.9.1-1.git.187.9bcf9ed.el7.noarch
adagios-1.6.3-2.git.0.4290a53.el7.noarch
check-mk-livestatus-1.4.0p31-2.el7.x86_64
nagios-4.4.3-1.el7.x86_64
Wed Sep 4 15:17:30 EDT 2019
[root@nagios03 ~]#
@gardart check-mk-livestatus-1.4.0p31-2.el7.x86_64 fixed my comment/downtime display issue, but it crashes my nagios server because livestatus aborts when running the LQL 'GET hosts' command. I tried compiling versions from 1.2.8 up to the latest 1.6; they all crashed the nagios server on GET hosts. So I followed your tip above and used version 1.2.6, and now both 'GET hosts' and comment/downtime work. Thanks again for your pointer.
WHAT :
HOW: query on Adagios downtime.
Environment info
The screenshot on my test nagios server
Traceback log