nokia / ansible-networking-collections

BSD 3-Clause "New" or "Revised" License
39 stars 17 forks source link

cli_config reply message doesn't contain INFO messages #4

Open cyrilleo opened 4 years ago

cyrilleo commented 4 years ago

Hi, Cli_config is better than the previous sros_config module as now, we can have a feedback message for the router. It’s very useful to manage error messages: For instance, trying to configure a service customer’s MSS with an wrong lag number: The register variable of the cli_config task contains the 2 following important variable:

After some tests, we observed that MINOR, MAJOR and CRITICAL error messages were correctly send back from the router. But unfortunately, that is not the case for INFO messages ! :( For instance, trying to add a bgp neighbor in a bgp group when this bgp neighbor is already configured in another bgp group: 1/ We have an INFO message in direct CLI: *A:LABROUTER# configure router bgp group "GRP-IBGPV6-RR" neighbor 10.10.127.1 INFO: BGP #1001 Configuration failed because of inconsistent values - Peer 10.10.127.1 already belongs to group "GRP-IBGP-RR" (VR 1): cannot be changed!

Even this is an INFO message, from our point of view, it a real config error.

2/ Doing the same thing with Ansible and cli_config:

The output2 register variable does not contains any “msg” content, and the “failed” is set to false :( It’s not good for error management. Rgds,

hellt commented 4 years ago

Hi @cyrilleo , thanks for reporting this, indeed the INFO level messages are not considered to be an error in our code. We will look into ways to solve this shortly.

@wisotzky I propose we add a new element to the list of stderr markers (example):

[\r\n]INFO.*Configuration failed .*[\r\n]+

to handle cases like that. What is your opinion? I can send a PR if you are ok with that proposal.

wisotzky commented 4 years ago

Hi @hellt, From my understanding "INFO" or "WARNING" should not be used if something is failing. It might be used as an indication that a change/operation was accepted, but does likely not have the expected result for some reason. I would ask @jgcumming to dump his opinion on the case above. Filtering not just by the severity level but also for the content of the message to me is not a good approach at all. This easily leads to a never-ending debate where everybody has an opinion on what patterns need to be treated like errors. Also not sure, how big the risk would be, that error texts get updated release over release.

hellt commented 4 years ago

I fail to see any other viable solution to this problem. And this IS a problem, since we report a failed configuration action as a non-failed task. It's not a false positive, it's a negative behavior of the plugin.

We need to consider the fact that even if this gets fixed to MINOR level in the future releases it will still behave wrongly for the thousands of devices running the current and older sw versions. Today we discovered a single case for such a behavior, there could be more.

I would disagree that the proposed match expression can lead to a potential dispute, it is a mitigation of a specific situation where an info level message contains a clean indication of a failed configuration. I'd even claim that it is safer to raise an error in a task that has succeeded, rather than declare something that failed as passed. With this being said I don't think the potential risk of a changed wording for the cases like that defeats the proposed fix. We can look into the issues that might come once and if they come.

wisotzky commented 4 years ago

@hellt, I do not disagree with what you are saying - but would avoid to open a never-ending discussion to have more exceptions added. A viable solution might be, to consider all WARNING and INFO responses as fault. But for this it would be good to confirm with @jgcumming if this would not cause any bigger issues - aka is there regular situations where INFO is raised that can not be avoided.

Side note: In Nokia NSP WFM the problem does not exists, as we've decided allow overriding the error regexp in the workflow if needed. That way, one can decide to ignore or fail on errors as needed.

cyrilleo commented 4 years ago

Hi, Considering an ansible module, as a network operator, the main important behaviour to fix is the missing message (= no "msg" variable" in the registered output) in case of INFO messages. Then, choosing the setting value of "failed" variable is another discussion. First of all, we could ask why a config error such as the given BGP group example, is raised as an INFO message and not a MINOR one...

hellt commented 4 years ago

@cyrilleo check what you have in stdout of

debug:
var: output2

I think you will find the INFO message there.

The reason there is no msg I guess is because it was not captured as an error, therefore msg is empty

cyrilleo commented 4 years ago

@hellt, no, I confirm we can't find the INFO message in the output2: ok: [LABROUTER] => { "output2": { "ansible_facts": { "discovered_interpreter_python": "/usr/bin/python" }, "changed": false, "failed": false, "warnings": [ "Platform linux on host LABROUTER is using the discovered Python interpreter at /usr/bin/python, but future installation of another Python interpreter could change this. See https://docs.ansible.com/ansible/2.9/reference_appendices/interpreter_discovery.html for more information." ] } }

wisotzky commented 2 years ago

@hellt, @cyrilleo, it would be great to have your view on how to take this forward... In the case of cli_command messages of severity INFO are treated like every other output and those become consumable by the playbook. It looks the netcommon cli_config however works differently. There might be ways to pass through the message, but this requires deeper investigation. Treating INFO like an error is clearly not recommended, because typically it is not seen as error message but rather informational Information to the operator.

hellt commented 2 years ago

in my opinion this INFO message doesn't bear the informational meaning, since it encloses an actual config error I would still use a regext for that particular INFO with error string to be marked as an error

wisotzky commented 2 years ago

@hellt, I would not mind doing it the way you've described... However, it would be great to have some proof-points that a regular expression built around an INFO message would capture the majority when not all of similar cases:

To remind, the error was looking like this: INFO: BGP #1001 Configuration failed because of inconsistent values - Peer 10.10.127.1 already belongs to group "GRP-IBGP-RR" (VR 1): cannot be changed!

I think your proposal was to check for "INFO[...]Configuration failed"... Maybe let's check if we find someone in the SROS team, that can grep for all potential INFO messages that can be raised by the router in one of the latest releases. That should help us to find a reasonable regular expression that covers those cases.

As I've said before, I would try to avoid getting those cases being added per individual occurrence, because this would not be generic with never ending changes required even for future releases.

I still believe we should work with SROS team to ensure that real errors are never reported with INFO severity.

wisotzky commented 2 years ago

@hellt, I've checked with Nokia SROS team... It looks the catch above is an exception. My initial readout was correct, aka errors should never raised with "INFO" severity level. If people find issues like that, they should raise bug reports to get it fixed. While the error mentioned here is about classic CLI, and we don't have full visibility who are all the customers who are impacted by this (either today or in the future) even with older releases deployed, it seems to be reasonable to add the specific condition (regular expression) to the code, to pass this as an error.