networktocode / ntc-templates

TextFSM templates for parsing show commands of network devices
https://ntc-templates.readthedocs.io/

fortinet_get_system_ha_status.textfsm parse error for non "OK" HA Health Status #1690

Closed pnpestov closed 2 months ago

pnpestov commented 6 months ago
ISSUE TYPE
TEMPLATE USING

#
# FG Version: 5.6, 6.0, 6.2, 6.4
# HW : varied
#
Value HA_HEALTH (\S+)
Value MODEL (\S+)
Value HA_MODE ([\S\s]+)
Value HA_GROUP (\S+)
Value CLUSTER_UPTIME ([\S\s]+)
Value CLUSTER_STATE_CHANGED_TIME ([\S\s]+)
Value HA_SESSION_PICKUP_STATUS (\S+)
Value HA_SESSION_PICKUP_DELAY (\S+)
Value HA_OVERRIDE_STATUS (\S+)
Value HA_MASTER_UNIT_NAME (\S+)
Value HA_SLAVE_UNIT_NAME (\S+)
Value HA_MASTER_UNIT_SERIAL (\S+)
Value HA_SLAVE_UNIT_SERIAL (\S+)
Value HA_MASTER_UNIT_INDEX (\S+)
Value HA_SLAVE_UNIT_INDEX (\S+)

Start
  ^HA\s+Health\s+Status:\s+${HA_HEALTH}
  ^Model:\s+${MODEL}
  ^Mode:\s+${HA_MODE}
  ^Group:\s+${HA_GROUP}
  ^Debug:\s+\d+
  ^Cluster\s+Uptime:\s+${CLUSTER_UPTIME}
  ^Cluster\s+state\s+change\s+time:\s+${CLUSTER_STATE_CHANGED_TIME}
  ^(Master|Primary)\s+selected\s+using:
  ^\s*\<\S+
  ^ses_pickup:\s+${HA_SESSION_PICKUP_STATUS},\s+ses_pickup_delay=${HA_SESSION_PICKUP_DELAY}
  ^override:\s+${HA_OVERRIDE_STATUS}
  ^Configuration\s+Status: -> Configuration_Status
  # Catch old 6.0_noha with no "Configuraton Status"
  ^System\s+Usage\s+stats: -> System_Usage_stats
  ^. -> Error "in-Start"

Configuration_Status
  ^System\s+Usage\s+stats: -> System_Usage_stats
  ^\s*\S+([\S\s]+):\s\S+$$
  ^. -> Error "in-Configuration_Status"

System_Usage_stats
  ^HBDEV\s+stats: -> HBDEV_MONDEV_stats
  ^\s*\S+([\S\s]+):$$
  # ^\s*\S+:\s+
  ^\s*sessions=
  ^. -> Error "in-System_Usage_stats"

HBDEV_MONDEV_stats
  # Combine stats, no MONDEV in older FW's
  ^\s\S+([\S\s]+):$$
  ^\s\S+:\s.+rx.+tx.+$$
  ^MONDEV\s+stats:
  ^(Master|Primary)\s:\s+${HA_MASTER_UNIT_NAME}\s,\s+${HA_MASTER_UNIT_SERIAL},\s+(HA\s+cluster\s+index|cluster\s+index)\s+=\s+${HA_MASTER_UNIT_INDEX}
  ^(Slave|Secondary)\s:\s+${HA_SLAVE_UNIT_NAME}\s,\s+${HA_SLAVE_UNIT_SERIAL},\s+(|HA)\scluster\s+index\s+=\s+${HA_SLAVE_UNIT_INDEX}
  ^number\s+of\s+vcluster:\s+\d+
  ^vcluster\s+\d+:
  ^(Master|Slave|Primary|Secondary)\s:\s+\S+,\s+(operating\s+cluster\s+index|HA\s+operating\s+index)\s+=\s+\d+ -> Record
  ^\s*$$
  ^. -> Error "in-HBDEV_MONDEV_stats"

SAMPLE COMMAND OUTPUT

HA Health Status: WARNING: FGT40FYYYYYYYYYY has mondev down; Model: FortiGate-40F Mode: HA A-P Group: 172 Debug: 0 Cluster Uptime: 63 days 22:15:42 Cluster state change time: 2024-02-11 15:25:27 Primary selected using: <2024/02/11 15:25:27> FGT40FXXXXXXXXXX is selected as the primary because the value 0 of link-failure + pingsvr-failure is less than peer member FGT40FYYYYYYYYYY. ses_pickup: enable, ses_pickup_delay=disable override: enable Configuration Status: FGT40FXXXXXXXXXX(updated 0 seconds ago): in-sync FGT40FYYYYYYYYYY(updated 0 seconds ago): in-sync System Usage stats: FGT40FXXXXXXXXXX(updated 0 seconds ago): sessions=768, average-cpu-user/nice/system/idle=0%/0%/0%/99%, memory=35% FGT40FYYYYYYYYYY(updated 0 seconds ago): sessions=634, average-cpu-user/nice/system/idle=0%/0%/0%/100%, memory=31% HBDEV stats: FGT40FXXXXXXXXXX(updated 0 seconds ago): lan2: physical/1000auto, up, rx-bytes/packets/dropped/errors=9997131732/27616386/0/0, tx=10080077920/27616652/0/0 lan3: physical/1000auto, up, rx-bytes/packets/dropped/errors=11772621099/36693306/0/0, tx=26151306122/60128423/0/0 FGT40FYYYYYYYYYY(updated 0 seconds ago): lan2: physical/1000auto, up, rx-bytes/packets/dropped/errors=10080077920/27616652/0/0, tx=9997131732/27616386/0/0 lan3: physical/1000auto, up, rx-bytes/packets/dropped/errors=26151777728/60128423/0/0, tx=11771044717/36693306/0/0 MONDEV stats: FGT40FXXXXXXXXXX(updated 0 seconds ago): lan1: physical/100auto, up, rx-bytes/packets/dropped/errors=535463275509/3388288017/0/0, tx=3023591767050/4114831127/0/0 wan: physical/100auto, up, rx-bytes/packets/dropped/errors=3314385262333/4439875482/0/0, tx=768352772861/3445252569/0/0 FGT40FYYYYYYYYYY(updated 0 seconds ago): lan1: physical/00, down, rx-bytes/packets/dropped/errors=0/0/0/0, tx=0/0/0/0 wan: physical/100auto, up, rx-bytes/packets/dropped/errors=15792718293/245544650/0/0, tx=0/0/0/0 Primary : FGT-fw-a, FGT40FXXXXXXXXXX, HA cluster index = 1 Secondary : FGT-fw-b, FGT40FYYYYYYYYYY, HA cluster index = 0 
number of vcluster: 1 vcluster 1: work 169.254.0.2 Primary: FGT40FXXXXXXXXXX, HA operating index = 0 Secondary: FGT40FYYYYYYYYYY, HA operating index = 1

SUMMARY

Parsing fails with textfsm.parser.TextFSMError: Error: "in-Start". Rule Line: 36. Input Line: HA Health Status: . The full traceback is under ACTUAL RESULTS.

STEPS TO REPRODUCE

Put the cluster into a non-"OK" state so that HA Health Status is printed across two lines. For example, take down the lan1 link (an HA monitor interface) on the slave node.
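
To see why the two-line form trips the Start state, the first rule can be tried by hand. The sketch below uses Python's `re` with the HA_HEALTH Value pattern (`\S+`) substituted into the rule, roughly as TextFSM does; the helper name `start_rule_matches` is made up for illustration.

```python
import re

# First Start rule from the template, with ${HA_HEALTH} expanded to its
# Value pattern (\S+). TextFSM performs a similar substitution internally.
START_RULE = re.compile(r"^HA\s+Health\s+Status:\s+(?P<HA_HEALTH>\S+)")


def start_rule_matches(line: str) -> bool:
    """Return True if the HA Health Status rule matches the given line."""
    return START_RULE.match(line) is not None


# Healthy cluster: the status sits on the same line as the label.
print(start_rule_matches("HA Health Status: OK"))  # True
# Degraded cluster: the label line carries no value, so \S+ has nothing to match.
print(start_rule_matches("HA Health Status: "))    # False
```

When this rule does not match, no other Start rule matches the line either, so it falls through to `^. -> Error "in-Start"`, which is the error reported in the traceback.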

EXPECTED RESULTS

The current value of HA Health Status is returned in parsed_sample.

ACTUAL RESULTS
Traceback (most recent call last):
  File "C:\Users\Admin\Scripts_py\Netmiko\fortinet.py", line 16, in <module>
    command_parsed = parse_output(platform = "fortinet", command = "get system ha status", data = output)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\ntc_templates\parse.py", line 61, in parse_output
    cli_table.ParseCmd(data, attrs)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\clitable.py", line 282, in ParseCmd
    self.table = self._ParseCmdItem(self.raw, template_file=template_files[0])
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\clitable.py", line 315, in _ParseCmdItem
    for record in fsm.ParseText(cmd_input):
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 895, in ParseText
    self._CheckLine(line)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 944, in _CheckLine
    if self._Operations(rule, line):
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 1021, in _Operations
    raise TextFSMError('Error: %s. Rule Line: %s. Input Line: %s.'
textfsm.parser.TextFSMError: Error: "in-Start". Rule Line: 36. Input Line: HA Health Status: .
pnpestov commented 6 months ago

Sorry, I forgot to specify the firmware version. Version: FortiGate-40F v7.0.13,build0566,231024 (GA.M)

mjbear commented 2 months ago

@pnpestov If I'm not mistaken, code blocks preserve trailing whitespace and whatnot. Plus, in general, raw CLI output in a code block is just easier to read.

If you're open to building up changes and submitting a pull request (PR) that's the best way to see to it these bugfixes get merged in.

mjbear commented 2 months ago

@pnpestov Once I looked at the template I realized why the first post on this thread had the markdown it did. A code block would have been extremely helpful to prevent that. :grinning:

I used the raw output (I put it in a code block below) that was in the first post to work against (in hopes no whitespace/formatting was lost). :man_shrugging:

Ultimately for my solution: I state transitioned, captured the "warning" line using the trailing semicolon ; as a regex anchor (required or things get weird), captured Model and used that line to state transition back to Start. :sweat_smile:

:tada: I end up with the following structured output:

---
parsed_sample:
  - cluster_state_changed_time: "2024-02-11 15:25:27"
    cluster_uptime: "63 days 22:15:42"
    ha_group: "172"
    ha_health: "WARNING: FGT40FYYYYYYYYYY has mondev down"
    ha_master_unit_index: "1" 
    ha_master_unit_name: "FGT-fw-a"
    ha_master_unit_serial: "FGT40FXXXXXXXXXX"
    ha_mode: "HA A-P"
    ha_override_status: "enable"
    ha_session_pickup_delay: "disable"
    ha_session_pickup_status: "enable"
    ha_slave_unit_index: "0" 
    ha_slave_unit_name: "FGT-fw-b"
    ha_slave_unit_serial: "FGT40FYYYYYYYYYY"
    model: "FortiGate-40F"

Raw output from first post:

HA Health Status:
WARNING: FGT40FYYYYYYYYYY has mondev down;
Model: FortiGate-40F
Mode: HA A-P
Group: 172
Debug: 0
Cluster Uptime: 63 days 22:15:42
Cluster state change time: 2024-02-11 15:25:27
Primary selected using:
<2024/02/11 15:25:27> FGT40FXXXXXXXXXX is selected as the primary because the value 0 of link-failure + pingsvr-failure is less than peer member FGT40FYYYYYYYYYY.
ses_pickup: enable, ses_pickup_delay=disable
override: enable
Configuration Status:
FGT40FXXXXXXXXXX(updated 0 seconds ago): in-sync
FGT40FYYYYYYYYYY(updated 0 seconds ago): in-sync
System Usage stats:
FGT40FXXXXXXXXXX(updated 0 seconds ago):
sessions=768, average-cpu-user/nice/system/idle=0%/0%/0%/99%, memory=35%
FGT40FYYYYYYYYYY(updated 0 seconds ago):
sessions=634, average-cpu-user/nice/system/idle=0%/0%/0%/100%, memory=31%
HBDEV stats:
FGT40FXXXXXXXXXX(updated 0 seconds ago):
lan2: physical/1000auto, up, rx-bytes/packets/dropped/errors=9997131732/27616386/0/0, tx=10080077920/27616652/0/0
lan3: physical/1000auto, up, rx-bytes/packets/dropped/errors=11772621099/36693306/0/0, tx=26151306122/60128423/0/0
FGT40FYYYYYYYYYY(updated 0 seconds ago):
lan2: physical/1000auto, up, rx-bytes/packets/dropped/errors=10080077920/27616652/0/0, tx=9997131732/27616386/0/0
lan3: physical/1000auto, up, rx-bytes/packets/dropped/errors=26151777728/60128423/0/0, tx=11771044717/36693306/0/0
MONDEV stats:
FGT40FXXXXXXXXXX(updated 0 seconds ago):
lan1: physical/100auto, up, rx-bytes/packets/dropped/errors=535463275509/3388288017/0/0, tx=3023591767050/4114831127/0/0
wan: physical/100auto, up, rx-bytes/packets/dropped/errors=3314385262333/4439875482/0/0, tx=768352772861/3445252569/0/0
FGT40FYYYYYYYYYY(updated 0 seconds ago):
lan1: physical/00, down, rx-bytes/packets/dropped/errors=0/0/0/0, tx=0/0/0/0
wan: physical/100auto, up, rx-bytes/packets/dropped/errors=15792718293/245544650/0/0, tx=0/0/0/0
Primary : FGT-fw-a, FGT40FXXXXXXXXXX, HA cluster index = 1
Secondary : FGT-fw-b, FGT40FYYYYYYYYYY, HA cluster index = 0
number of vcluster: 1
vcluster 1: work 169.254.0.2
Primary: FGT40FXXXXXXXXXX, HA operating index = 0
Secondary: FGT40FYYYYYYYYYY, HA operating index = 1
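
The approach described above (leave Start on the bare `HA Health Status:` line, capture the warning up to its trailing semicolon, then return to Start on the `Model:` line) can be imitated in plain Python. This is only an illustrative sketch, not the template from the PR: the state names, the regexes and the `parse_health` helper are hypothetical, and tolerating leading whitespace before the warning is an assumption.

```python
import re

# Hypothetical patterns; they are not copied from the ntc-templates template.
HEADER = re.compile(r"^HA\s+Health\s+Status:\s*(?P<health>\S+)?\s*$")
WARNING = re.compile(r"^\s*(?P<health>\S.*?);\s*$")  # trailing ';' is the anchor
MODEL = re.compile(r"^Model:\s+(?P<model>\S+)")


def parse_health(raw: str) -> dict:
    """Tiny two-state parser mimicking the Start -> Health -> Start transitions."""
    result = {"ha_health": "", "model": ""}
    state = "Start"
    for line in raw.splitlines():
        if state == "Start":
            m = HEADER.match(line)
            if m and m.group("health"):
                result["ha_health"] = m.group("health")  # e.g. "OK" on one line
            elif m:
                state = "Health"  # value continues on the following line(s)
            elif (m := MODEL.match(line)):
                result["model"] = m.group("model")
        elif state == "Health":
            if (m := WARNING.match(line)):
                result["ha_health"] = m.group("health")
            elif (m := MODEL.match(line)):
                result["model"] = m.group("model")
                state = "Start"  # Model line ends the health section
    return result


sample = (
    "HA Health Status:\n"
    "WARNING: FGT40FYYYYYYYYYY has mondev down;\n"
    "Model: FortiGate-40F\n"
    "Mode: HA A-P\n"
)
print(parse_health(sample))
```

Like a single non-List TextFSM Value, this keeps only the last warning line it sees.
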
mjbear commented 2 months ago

@pnpestov Submitted PR #1791

pnpestov commented 3 weeks ago

Good time of day! Thanks for your reply! But such a situation is also possible: ftg-fw-a # get sys ha status HA Health Status: WARNING: FGTXXXXXXXXXXXX has hbdev down; WARNING: FGTYYYYYYYYYYYYY has hbdev down; Model: FortiGate-40F Mode: HA A-P ...

Error:
Traceback (most recent call last):
  File "C:\Users\Admin\Scripts_py\Netmiko\fortinet.py", line 127, in <module>
    command_parsed = parse_output(platform = "fortinet", command = "get system ha status", data = output)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\ntc_templates\parse.py", line 77, in parse_output
    cli_table.ParseCmd(data, attrs)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\clitable.py", line 282, in ParseCmd
    self.table = self._ParseCmdItem(self.raw, template_file=template_files[0])
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\clitable.py", line 315, in _ParseCmdItem
    for record in fsm.ParseText(cmd_input):
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 895, in ParseText
    self._CheckLine(line)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 944, in _CheckLine
    if self._Operations(rule, line):
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 1021, in _Operations
    raise TextFSMError('Error: %s. Rule Line: %s. Input Line: %s.'
textfsm.parser.TextFSMError: Error: "in-Start". Rule Line: 37. Input Line: WARNING: FGTХХХХХХХХХХХХХ has hbdev down; .

mjbear commented 3 weeks ago

> Good time of day! Thanks for your reply! But such a situation is also possible: ftg-fw-a # get sys ha status HA Health Status: WARNING: FGTXXXXXXXXXXXX has hbdev down; WARNING: FGTYYYYYYYYYYYYY has hbdev down; Model: FortiGate-40F Mode: HA A-P ...

Only two months after the PR was merged in, haha. :wink:

@pnpestov Please open up a new issue ticket to cover the bases here.

If you provide the :point_right: full raw output (albeit sanitized of any private details, ex: serial numbers) on that new issue I'll take a look at working up changes for this. Unless you want to work up a PR.

mjbear commented 3 weeks ago

@pnpestov I ran this through textfsm and there doesn't appear to be anything wrong with the template (see below) based on the snippet of raw output you provided above.

Please check that you're using a current version (or git clone) of ntc-templates. Thank you.

[
    {
        "CLUSTER_STATE_CHANGED_TIME": "",
        "CLUSTER_UPTIME": "",
        "HA_GROUP": "",
        "HA_HEALTH": "WARNING: FGTYYYYYYYYYYYYY has hbdev down",
        "HA_MASTER_UNIT_INDEX": "",
        "HA_MASTER_UNIT_NAME": "",
        "HA_MASTER_UNIT_SERIAL": "",
        "HA_MODE": "HA A-P",
        "HA_OVERRIDE_STATUS": "",
        "HA_SESSION_PICKUP_DELAY": "",
        "HA_SESSION_PICKUP_STATUS": "",
        "HA_SLAVE_UNIT_INDEX": "",
        "HA_SLAVE_UNIT_NAME": "",
        "HA_SLAVE_UNIT_SERIAL": "",
        "MODEL": "FortiGate-40F"
    }
]
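
For the version check requested above, one way to see what is actually installed is to ask the packaging metadata. A minimal sketch using only the standard library; the distribution name `ntc_templates` is an assumption taken from the site-packages path in the traceback, and `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed via pip, otherwise None.
print(installed_version("ntc_templates"))
```

A result of None means the package was not installed as a distribution in this interpreter's environment, for example when working from a git clone instead.
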
pnpestov commented 3 weeks ago

@mjbear Thanks for the prompt response! I'm using the current version of ntc-templates. I noticed that the github editor removes the whitespace characters before WARNING. It turns out that you are conducting a test with an incorrect output. In fact, the output in the CLI is as follows:

HA Health Status:     WARNING: FGTXXXXXXXXXXXXX has hbdev down;     WARNING: FGTYYYYYYYYYYYYY has hbdev down; Model: FortiGate-40F Mode: HA A-P

I looked at the template handler, most likely the error occurs just because of the whitespace characters.
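
The whitespace effect is easy to check in isolation. The two patterns below are hypothetical stand-ins, since the rule used in the merged template is not shown in this thread: one anchors `WARNING` at the start of the line, the other allows leading whitespace. The sketch also assumes each warning is printed on its own line and collects all of them in a list.

```python
import re

# Hypothetical rule anchored at column 0, and a variant that tolerates indentation.
STRICT = re.compile(r"^(?P<health>WARNING:.+?);\s*$")
TOLERANT = re.compile(r"^\s*(?P<health>WARNING:.+?);\s*$")

# CLI output with the leading whitespace intact, one warning per line (assumed).
cli_lines = [
    "HA Health Status:",
    "    WARNING: FGTXXXXXXXXXXXXX has hbdev down;",
    "    WARNING: FGTYYYYYYYYYYYYY has hbdev down;",
    "Model: FortiGate-40F",
]


def collect_warnings(pattern, lines):
    """Return the captured warning text for every line the pattern matches."""
    return [m.group("health") for line in lines if (m := pattern.match(line))]


strict_hits = collect_warnings(STRICT, cli_lines)
tolerant_hits = collect_warnings(TOLERANT, cli_lines)
print(strict_hits)    # [] because the indented lines never match
print(tolerant_hits)  # both warnings, without the trailing semicolon
```

With the indentation stripped, as happened to the output pasted in this thread, both patterns match, which would explain why a test against the pasted text passes while the real CLI output fails.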

mjbear commented 3 weeks ago

> @mjbear Thanks for the prompt response! I'm using the current version of ntc-templates. I noticed that the github editor removes the whitespace characters before WARNING. It turns out that you are conducting a test with an incorrect output. In fact, the output in the CLI is as follows:

Most welcome. I can say with complete certainty the development for PR #1791 was not from the GitHub editor, but instead my local OS.

:bulb: Ah it was the output from this thread that had the white space stripped. Should have used code blocks. (Oh well, things happen, it's ok.)

> HA Health Status: WARNING: FGTXXXXXXXXXXXXX has hbdev down; WARNING: FGTYYYYYYYYYYYYY has hbdev down; Model: FortiGate-40F Mode: HA A-P

> I looked at the template handler, most likely the error occurs just because of the whitespace characters.

:dart: Would you mind performing the following steps: :question:

  1. Gather the full output from get system ha status
  2. Open a new issue ticket
  3. Place that (sanitized) raw output in a code block (by using the <> icon or triple backticks ```) within the new issue ticket
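
The sanitizing and code-block steps above could be scripted so that whitespace survives the copy and paste. A rough sketch only: the serial-number shape (`FG` plus 14 uppercase letters or digits) is an assumption based on the serials in this thread, and `sanitize_for_issue` is a hypothetical helper.

```python
import re

# Assumed serial shape: "FG" followed by 14 uppercase letters or digits.
SERIAL = re.compile(r"\bFG[A-Z0-9]{14}\b")


def sanitize_for_issue(raw: str) -> str:
    """Mask serial numbers and wrap the output in a fenced code block."""
    # Keep the 6-character model prefix, replace the remaining 10 characters.
    masked = SERIAL.sub(lambda m: m.group(0)[:6] + "X" * 10, raw)
    fence = "`" * 3  # triple backticks, built here to keep this example readable
    return f"{fence}\n{masked}\n{fence}"


print(sanitize_for_issue("    WARNING: FGT40FAK12345678 has hbdev down;"))
```

Only the serial is rewritten; leading and trailing whitespace on each line is left untouched, which is the detail that mattered in this issue.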

I'd be glad to complete this fix once and for all provided I have full output and everything requested.

pnpestov commented 3 weeks ago

Yes, of course! New issue ticket - https://github.com/networktocode/ntc-templates/issues/1859