lausser / check_nwc_health

nwc = network component. This plugin checks lots of aspects of routers, switches, wlan controllers, firewalls,.....
http://labs.consol.de/nagios/check_nwc_health
GNU General Public License v2.0

Is check_nwc_health ePN broken? #339

Open c-kr opened 1 day ago

c-kr commented 1 day ago

Background:

Problem:

Steps to Reproduce:

  1. Build versions 2cd3c8d33adec16f19c01d0e1223d3a31408caa6 and 399dc41f652440fbee17683d0d8450c3acaa802a.

  2. Use a simple test environment (e.g., start an Ubuntu 20.04 Docker container) and run the following commands:

    apt install mod-gearman-tools mod-gearman-worker
    mkdir /usr/share/mod_gearman/ && ln -s /usr/share/mod-gearman/mod_gearman_p1.pl /usr/share/mod_gearman/mod_gearman_p1.pl

    This fixes the mod_gearman_p1.pl path.

  3. Use mod_gearman_mini_epn to test ePN for both built versions:

    • Error with 2cd3c8d33adec16f19c01d0e1223d3a31408caa6:

      root@aba4d9c17e58:~# mod_gearman_mini_epn /data/check_nwc_health_2cd3c8d33adec16f19c01d0e1223d3a31408caa6/plugins-scripts/check_nwc_health
      plugin return code: 3
      perl plugin output: '**ePN /data/check_nwc_health_2cd3c8d33adec16f19c01d0e1223d3a31408caa6/plugins-scripts/check_nwc_health: plugin did not call exit()
      **ePN /data/check_nwc_health_2cd3c8d33adec16f19c01d0e1223d3a31408caa6/plugins-scripts/check_nwc_health: "Can't locate object method "run_plugin" via package "ModGearmanP1.pl" (perhaps you forgot to load "ModGearmanP1.pl"?) at (eval 1) line 98870,".
    • Works with 399dc41f652440fbee17683d0d8450c3acaa802a:

      root@aba4d9c17e58:~# mod_gearman_mini_epn /data/check_nwc_health_399dc41f652440fbee17683d0d8450c3acaa802a/plugins-scripts/check_nwc_health
      plugin return code: 3
      perl plugin output: 'Usage: check_nwc_health [ -v|--verbose ] [ -t <timeout> ] --mode <what-to-do> --hostname <network-component> --community <snmp-community>  ...]
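The comparison in step 3 can also be scripted so that both builds are exercised the same way. This is only a sketch: the `BASE` variable and the `plugin_path` helper are illustrative names, and the directory layout is assumed to match the `/data/check_nwc_health_<commit>/...` paths shown above.

```shell
#!/bin/sh
# Run the same ePN smoke test against each build.
# BASE and plugin_path are illustrative; the layout mirrors the paths in this report.
BASE="${BASE:-/data}"

plugin_path() {
    # $1 = commit hash; prints the path of the built plugin for that commit
    printf '%s/check_nwc_health_%s/plugins-scripts/check_nwc_health\n' "$BASE" "$1"
}

for commit in 2cd3c8d33adec16f19c01d0e1223d3a31408caa6 399dc41f652440fbee17683d0d8450c3acaa802a; do
    plugin="$(plugin_path "$commit")"
    echo "== $commit"
    if command -v mod_gearman_mini_epn >/dev/null 2>&1 && [ -f "$plugin" ]; then
        # Same invocation as above: no plugin arguments, so a usage message is the "good" result
        mod_gearman_mini_epn "$plugin"
    else
        echo "skipped: mod_gearman_mini_epn or $plugin not available"
    fi
done
```

Both runs are expected to report return code 3; the difference to look for is whether the output is the usage text or the `run_plugin` error.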

Question:

Is ePN usage via mod-gearman still supported or is our environment too outdated? Alternatively, could there be a bug introduced by the changes in commit 2cd3c8d33adec16f19c01d0e1223d3a31408caa6 that prevents ePN from functioning correctly?

codeautopilot[bot] commented 1 day ago

Potential solution

The plan to solve the bug involves addressing the dynamic method call issue and ensuring that all necessary modules are correctly loaded in the embedded Perl (ePN) environment. The error message indicates that the run_plugin method cannot be located, which suggests a problem with how the method is being called or how the modules are being loaded. By ensuring that the modules are available and the dynamic method call is correctly constructed, we can restore ePN functionality.

What is causing this bug?

The bug is likely caused by the dynamic method call to run_plugin in the check_nwc_health.pl script. The error message indicates that the method cannot be located, which suggests that the package or method is not correctly defined or accessible in the ePN environment. This issue may have been introduced by changes in commit 2cd3c8d33adec16f19c01d0e1223d3a31408caa6, which could have affected how modules are loaded or how dynamic method calls are handled.

Code

To address the issue, we need to ensure that the run_plugin method is correctly defined and accessible. We can modify the check_nwc_health.pl script to explicitly require the necessary modules and ensure that the dynamic method call is correctly constructed. Here is a potential code snippet to address the issue:

# Ensure required modules are loaded
use Monitoring::GLPlugin;
use Monitoring::GLPlugin::SNMP;
use Monitoring::GLPlugin::UPNP;

# Initialize the plugin object
my $plugin_class = 'Monitoring::GLPlugin::SNMP::CheckNwcHealth';
my $plugin = $plugin_class->new();

# Set modes and arguments
$plugin->set_modes(...);
$plugin->set_arguments(...);

# Execute the plugin
$plugin->run_plugin();

By explicitly requiring the necessary modules and directly calling the run_plugin method on the plugin object, we can ensure that the method is correctly defined and accessible.

How to replicate the bug

To replicate the bug, follow these steps:

  1. Build versions 2cd3c8d33adec16f19c01d0e1223d3a31408caa6 and 399dc41f652440fbee17683d0d8450c3acaa802a.

  2. Set up a test environment, such as an Ubuntu 20.04 Docker container.

  3. Install the necessary packages:

    apt install mod-gearman-tools mod-gearman-worker
    mkdir /usr/share/mod_gearman/ && ln -s /usr/share/mod-gearman/mod_gearman_p1.pl /usr/share/mod_gearman/mod_gearman_p1.pl

  4. Use mod_gearman_mini_epn to test ePN for both built versions:

    • For version 2cd3c8d33adec16f19c01d0e1223d3a31408caa6, you should see the error indicating that the run_plugin method cannot be located.
    • For version 399dc41f652440fbee17683d0d8450c3acaa802a, the plugin should work as expected.

By following these steps, you can replicate the bug and verify the solution.


Files used for this task:

Changes on contrib/CheckNwcHealthStack.pm

## Analysis of `contrib/CheckNwcHealthStack.pm`

### Overview

The file `CheckNwcHealthStack.pm` is a Perl module that appears to be part of a monitoring plugin for network devices, specifically dealing with SNMP (Simple Network Management Protocol) data related to network stacks. It defines several packages and classes that extend functionality from a base monitoring plugin framework.

### Key Components

1. **Inheritance and Structure**:
   - The module defines several packages (`MyStack`, `MyStack::StackSubsystem`, `MyStack::StackSubsystem::Entity`, and `MyStack::StackSubsystem::Module`).
   - These packages inherit from classes in the `Monitoring::GLPlugin::SNMP` framework, suggesting they are designed to work within this framework's structure.

2. **Methods and Functionality**:
   - The `init` methods in `MyStack` and `MyStack::StackSubsystem` set up SNMP objects and tables, indicating that these classes are responsible for initializing and managing SNMP data.
   - The `check` methods perform various checks on the SNMP data, adding status messages based on the results.
   - The `finish` methods in `MyStack::StackSubsystem::Entity` and `MyStack::StackSubsystem::Module` perform final processing on the data.

3. **SNMP Data Handling**:
   - The module interacts with SNMP MIBs (Management Information Bases), specifically `CISCO-STACK-MIB` and `ENTITY-MIB`, to gather and process data about network device components.

### Potential Issues Related to ePN

- **Method Calls and Object Instantiation**: The error message in the issue description mentions a problem with locating the `run_plugin` method. This file does not directly reference such a method, suggesting the issue might not originate here.
- **Module Imports and Dependencies**: The module relies on `Monitoring::GLPlugin::SNMP` and related classes. If there were changes in these dependencies that are incompatible with ePN, it could cause issues.
- **Perl Syntax and Compatibility**: The module uses standard Perl syntax and constructs. However, if there are any constructs or modules that are not compatible with ePN, it could lead to execution problems.

### Conclusion

The file `CheckNwcHealthStack.pm` does not directly show any changes or issues that would obviously break ePN functionality. The problem might be related to changes in the base framework (`Monitoring::GLPlugin::SNMP`) or other parts of the codebase not visible in this file. Further investigation into the changes made in commit `2cd3c8d33adec16f19c01d0e1223d3a31408caa6` and other related files is necessary to pinpoint the exact cause of the ePN failure.

Changes on plugins-scripts/check_nwc_health.pl

## Analysis Report

### Overview

The file `plugins-scripts/check_nwc_health.pl` is a Perl script that defines a package `CheckNwcHealth` and contains a `run_plugin` subroutine. This script is part of a Nagios plugin used to monitor various parameters of network components. The issue reported involves the embedded Perl (ePN) functionality breaking after a specific commit, while it works with a previous commit.

### Key Observations

1. **Module Imports**: The script attempts to load several modules (`Monitoring::GLPlugin`, `Monitoring::GLPlugin::SNMP`, `Monitoring::GLPlugin::UPNP`) using `eval`. If these modules are not found, it prints an error message and exits with code 3.
2. **Plugin Initialization**: The script initializes a plugin object using a dynamically determined class name (`$plugin_class`) and sets various modes and arguments for the plugin.
3. **Execution Flow**: The script processes command-line options, validates arguments, and performs checks using the plugin object. It then exits with a Nagios-compatible status code and message.
4. **Dynamic Execution**: The script ends with a dynamic call to `run_plugin` using a constructed package name, which might be problematic if the package or method is not correctly defined or accessible.

### Potential Causes of the Bug

- **Module Loading**: If the required modules (`Monitoring::GLPlugin`, etc.) are not correctly loaded or available in the environment, it could cause the script to fail, especially under ePN, which has stricter requirements for module availability.
- **Dynamic Method Call**: The final line dynamically constructs a package name and calls `run_plugin`. If this package or method is not correctly defined or accessible in the ePN environment, it could lead to the reported error.
- **Changes in Commit**: The commit `2cd3c8d33adec16f19c01d0e1223d3a31408caa6` might have introduced changes that affect how modules are loaded or how dynamic method calls are handled, which could break ePN compatibility.

### Recommendations

1. **Verify Module Availability**: Ensure that all required modules are available and correctly loaded in the ePN environment. This might involve setting the `PERL5LIB` environment variable or including modules in the script.
2. **Review Dynamic Execution**: Check the logic for constructing and calling the `run_plugin` method dynamically. Ensure that the package and method are correctly defined and accessible in the ePN context.
3. **Compare Commits**: Review the changes introduced in the problematic commit to identify any modifications related to module loading or dynamic execution that could affect ePN functionality.
4. **Testing**: Test the script in a controlled environment with ePN enabled to reproduce and diagnose the issue further.

By addressing these areas, it may be possible to restore ePN compatibility and resolve the reported issue.

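The "Verify Module Availability" recommendation can be checked quickly from a shell. The sketch below is an illustration only: `can_load` is a hypothetical helper name, and it tests the Perl interpreter found on `PATH`, which may not be the one the worker embeds. A "missing" result is also not necessarily a problem, since a built plugin may carry these packages inside the script itself.

```shell
# can_load: hypothetical helper; succeeds if the perl on PATH can load the named module.
can_load() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Module names are the ones listed in the analysis above.
for module in Monitoring::GLPlugin Monitoring::GLPlugin::SNMP Monitoring::GLPlugin::UPNP; do
    if can_load "$module"; then
        echo "ok      $module"
    else
        echo "missing $module"
    fi
done
```

If the modules live outside the default search path, prefixing the call with `PERL5LIB=/some/dir` (the directory is a placeholder) makes the same check use that location.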
lausser commented 20 hours ago

Hi, check_nwc_health is supposed to run in an OMD environment. At our customer sites we have thousands of checks which are running without problems. Actually the commit you mentioned was made in such a high-volume installation and ePN-errors did only occur on rare occasions then. Did you rename the plugin? It has to be started from a file named check_nwc_health.

Gerhard
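The file-name requirement suggests one possible reading of the error, stated here as an assumption and not confirmed anywhere in this thread: if the package name is derived from the name of the file being executed, `check_nwc_health` would map to `CheckNwcHealth`, while a wrapper script such as `mod_gearman_p1.pl` would map to `ModGearmanP1.pl`, which is the package named in the error message. The hypothetical `camelize` helper below only illustrates that mapping; it is not the plugin's code.

```shell
# camelize: hypothetical helper, NOT taken from check_nwc_health.
# Splits the name on underscores, upper-cases the first letter of each part, and joins the parts.
camelize() {
    printf '%s\n' "$1" | awk -F_ '{
        out = ""
        for (i = 1; i <= NF; i++)
            out = out toupper(substr($i, 1, 1)) substr($i, 2)
        print out
    }'
}

camelize check_nwc_health     # prints CheckNwcHealth
camelize mod_gearman_p1.pl    # prints ModGearmanP1.pl
```

Under this reading, the plugin file itself does not need to be renamed for the lookup to fail; it would be enough for the ePN wrapper to report its own file name instead of the plugin's.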


From: Christopher Kreft @.>
Sent: Sunday, October 13, 2024 9:55 AM
To: lausser/check_nwc_health @.>
Cc: Subscribed @.***>
Subject: [lausser/check_nwc_health] Is check_nwc_health ePN broken? (Issue #339)




c-kr commented 14 hours ago

Hi Gerhard,

First of all, thanks for your feedback.

Actually the commit you mentioned was made in such a high-volume installation and ePN-errors did only occur on rare occasions then

That's really strange. We can reproduce it every time with a single call without any arguments, as described in my post above (see the exact steps to reproduce). If you can't reproduce it, maybe your environment differs. Do you use OMD with mod_gearman, or do you use embedded Perl directly inside the monitoring core? Or maybe you use the Go worker instead of the deprecated one? Could this be the issue? Maybe the ePN implementation differs between the mod_gearman C worker, the mod_gearman Go worker, and the monitoring core itself, since they use different p1.pl files?

Did you rename the plugin?

No, I only renamed the folders for reference; I did not rename the plugin itself (... /plugins-scripts/check_nwc_health).
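For the question about differing p1.pl files, it can help to list which ones are actually present on a host. The sketch below uses a hypothetical `list_p1_files` helper; apart from `/usr/share/mod-gearman` and `/usr/share/mod_gearman`, which appear in the steps above, the directories are guesses (`/opt/omd` is assumed as the OMD location).

```shell
# list_p1_files: hypothetical helper that prints every *p1.pl entry below the given directories.
# Directories that do not exist are skipped silently.
list_p1_files() {
    for dir in "$@"; do
        [ -d "$dir" ] && find "$dir" -name '*p1.pl' 2>/dev/null
    done
    return 0
}

# /opt/omd is an assumption; adjust the list to the local installation.
list_p1_files /usr/share/mod-gearman /usr/share/mod_gearman /opt/omd
```

Comparing the files found this way would show whether the C worker, the Go worker, and the monitoring core ship different ePN wrappers on a given system.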

lausser commented 9 hours ago

We use Mod-Gearman with the Go worker. The old C worker is completely outdated, and yes, the ePN implementations are different. The moment the Go worker was released, we deprecated all the old workers in all our installations.