netenglabs / suzieq

Using network observability to operate and design healthier networks
https://www.stardustsystems.net/
Apache License 2.0

[Bug]: KeyError: 'version' when docker poller runs for JUNOS #803

Closed dd1245 closed 1 year ago

dd1245 commented 1 year ago

Suzieq version

0.19.1

Install Type

container

Python version

whatever is bundled with your container

Impacted component

sq-poller

Steps to Reproduce

Deployed the Docker container and attempted to poll a single Juniper MX device. When the poller runs, I get the output below.

sq-poller -I inventory.yml
WORKER 0: 2022-09-22 20:29:03,858 - suzieq.poller.worker - WARNING - log level WARNING
WORKER 0: 2022-09-22 20:29:04,746 - suzieq.poller.worker.nodes.node - WARNING - Detected junos-mx for 172.16.1.2:22, demo.rtr.01
WORKER 0: 2022-09-22 20:29:05,951 - asyncio - ERROR - Task exception was never retrieved
WORKER 0: future: <Task finished name='Task-35' coro=<Node._exec_service() done, defined at /usr/local/lib/python3.8/site-packages/suzieq/poller/worker/nodes/node.py:837> exception=KeyError('version')>
WORKER 0: Traceback (most recent call last):
WORKER 0:   File "/usr/local/lib/python3.8/site-packages/suzieq/poller/worker/nodes/node.py", line 895, in _exec_service
WORKER 0:     os_version = item['version']

I kept the inventory as basic as possible, as shown below.

sources:
- name: juniper-router
  hosts:
    - url: ssh://172.16.1.2 username=user1@management password=xxxxxxxxx

namespaces:
- name: transit
  source: juniper-router

Expected Behavior

Poller to run successfully

Observed Behavior

WORKER 0: 2022-09-22 20:29:04,746 - suzieq.poller.worker.nodes.node - WARNING - Detected junos-mx for 172.16.1.2:22, demo.rtr.01
WORKER 0: 2022-09-22 20:29:05,951 - asyncio - ERROR - Task exception was never retrieved
WORKER 0: future: <Task finished name='Task-35' coro=<Node._exec_service() done, defined at /usr/local/lib/python3.8/site-packages/suzieq/poller/worker/nodes/node.py:837> exception=KeyError('version')>
WORKER 0: Traceback (most recent call last):
WORKER 0:   File "/usr/local/lib/python3.8/site-packages/suzieq/poller/worker/nodes/node.py", line 895, in _exec_service
WORKER 0:     os_version = item['version']

Screenshots

Additional Context

ddutt commented 1 year ago

Hi, could you please attach the output of show system information | display json | no-more?

dd1245 commented 1 year ago

Sure.

{
    "system-information" : [
    {
        "hardware-model" : [
        {
            "data" : "mx204"
        }
        ],
        "os-name" : [
        {
            "data" : "junos"
        }
        ],
        "os-version" : [
        {
            "data" : "20.4R3-S2.6"
        }
        ],
        "serial-number" : [
        {
            "data" : "FLXXX"
        }
        ],
        "host-name" : [
        {
            "data" : "demo.rtr.01"
        }
        ]
    }
    ]
}
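
For readers following along: the output above has no top-level "version" key. The version string sits under "os-version", wrapped in a one-element list of {"data": ...} objects. Whether that is the actual cause of the KeyError is not stated in the thread, so the following is only a minimal sketch of reading this structure defensively. The names first_data and parse_system_information are hypothetical and are not SuzieQ functions.

```python
import json

# Sample copied from the output posted above (serial number as redacted in the thread).
RAW = """
{
  "system-information": [
    {
      "hardware-model": [{"data": "mx204"}],
      "os-name": [{"data": "junos"}],
      "os-version": [{"data": "20.4R3-S2.6"}],
      "serial-number": [{"data": "FLXXX"}],
      "host-name": [{"data": "demo.rtr.01"}]
    }
  ]
}
"""


def first_data(record, key, default=""):
    """Return record[key][0]["data"], or default if the key is missing or the list is empty."""
    entries = record.get(key) or [{}]
    return entries[0].get("data", default)


def parse_system_information(raw):
    """Flatten the Junos-style JSON into a plain dict (hypothetical helper)."""
    info = json.loads(raw)["system-information"][0]
    return {
        "model": first_data(info, "hardware-model"),
        "os": first_data(info, "os-name"),
        "version": first_data(info, "os-version"),
        "serial": first_data(info, "serial-number"),
        "hostname": first_data(info, "host-name"),
    }


parsed = parse_system_information(RAW)
print(parsed["version"], parsed["hostname"])
```

Using .get with a default means a missing field yields an empty value instead of raising, which keeps one unexpected device output from aborting the whole task.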

ddutt commented 1 year ago

I see the bug. It's a silly one. If I put out a temporary container with the fix, would you be willing to try it?

dd1245 commented 1 year ago

I see the bug. It's a silly one. If I put out a temporary container with the fix, would you be willing to try it?

Yes, absolutely.

ddutt commented 1 year ago

OK, working on it. It should be uploaded shortly.

ddutt commented 1 year ago

Sorry, one more little thing. Would you be able to share the output of the command show route protocol direct | display json | no-more? I just want to see the info up to the first level of the JSON data. I don't need to see the routes if that's sensitive info. If you don't mind sharing, I'd like to see the whole thing.

dd1245 commented 1 year ago

Sure, attached showrouteprotocol.txt

ddutt commented 1 year ago

Thanks, building container now

ddutt commented 1 year ago

Can you pull the container ddutt/suzieq:0.19.2 and check whether it works?

dd1245 commented 1 year ago

The version error is gone, but now you get

WORKER 0: 2022-09-23 00:12:27,484 - suzieq.poller.worker.services.service - ERROR - Processing data failed for service routes on node 172.16.1.2
WORKER 0: Traceback (most recent call last):
WORKER 0:   File "/usr/local/lib/python3.8/site-packages/suzieq/poller/worker/services/service.py", line 819, in run
WORKER 0:     result = self.process_data(output)
WORKER 0:   File "/usr/local/lib/python3.8/site-packages/suzieq/poller/worker/services/service.py", line 478, in process_data
WORKER 0:     return self.clean_data(result, data)
WORKER 0:   File "/usr/local/lib/python3.8/site-packages/suzieq/poller/worker/services/service.py", line 537, in clean_data
WORKER 0:     processed_data = dev_clean_fn(processed_data, raw_data)
WORKER 0:   File "/usr/local/lib/python3.8/site-packages/suzieq/poller/worker/services/routes.py", line 118, in _clean_junos_data
WORKER 0:     vrf = entry.pop("vrf")[0]['data']

ddutt commented 1 year ago

Ugh, that's the one I was trying to fix when I asked you for the show route output. Let me get back to you on this. I'll try to get another image out tonight, if possible.

ddutt commented 1 year ago

BTW, the error is happening because I changed the routes parser for MX in 0.19.0. If you don't have internet-scale routing tables, you could try using 0.18.0, or just run the current version of the poller for now with the additional option -x routes to exclude pulling routes.

dd1245 commented 1 year ago

Is there any way to exclude routes only for that namespace? To exclude route collection for MX only?

ddutt commented 1 year ago

Not without spinning up a different poller.

ddutt commented 1 year ago

Can you try pulling the same container name again? The image hash is fd100700b75e.

dd1245 commented 1 year ago

That seems to have solved that issue. Strangely, though, two other issues are now present that I don't believe I had before. Are there other changes in this version?

  1. Interfaces on SRX device types are no longer processed, but MX and EX switches are fine.

    [WORKER 0]: 2022-09-23 08:24:15,675 - suzieq.poller.worker.services.service - ERROR - Processing data failed for service interfaces on node 172.16.1.10
    [WORKER 0]: Traceback (most recent call last):
    [WORKER 0]:   File "/usr/local/lib/python3.8/site-packages/suzieq/poller/worker/services/service.py", line 819, in run
    [WORKER 0]:     result = self.process_data(output)
    [WORKER 0]:   File "/usr/local/lib/python3.8/site-packages/suzieq/poller/worker/services/service.py", line 478, in process_data
    [WORKER 0]:     return self.clean_data(result, data)
    [WORKER 0]:   File "/usr/local/lib/python3.8/site-packages/suzieq/poller/worker/services/service.py", line 537, in clean_data
    [WORKER 0]:     processed_data = dev_clean_fn(processed_data, raw_data)
    [WORKER 0]:   File "/usr/local/lib/python3.8/site-packages/suzieq/poller/worker/services/interfaces.py", line 504, in _clean_junos_data
    [WORKER 0]:     plen = (elem.get("ifa-destination",
    [WORKER 0]: IndexError: list index out of range
  2. SRX and MX devices now show up with IP addresses as their hostnames, but EX switches still show their actual hostnames.

               hostname       model      version   vendor  status
    0        172.16.1.1       mx204  20.4R3-S2.6  Juniper   alive
    14      172.16.39.2      srx345     20.4R3.8  Juniper   alive
    15  cpe1.test.dc.01  ex3400-24t     21.2R3.8  Juniper   alive
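
For context on the IndexError in item 1: the traceback is cut off at the elem.get("ifa-destination", ... call, so the exact failing expression is unknown. One common way to hit "list index out of range" with this kind of JSON is indexing [0] on a key that is present but holds an empty list. The sketch below assumes that shape. The function prefix_length and the sample values are hypothetical and are not taken from SuzieQ or from a real device.

```python
def prefix_length(elem, default=None):
    """Return the prefix length from elem["ifa-destination"][0]["data"], or default.

    Handles a missing key, an empty list, and a value without a "/" separator.
    """
    entries = elem.get("ifa-destination") or []
    if not entries:
        return default
    dest = entries[0].get("data", "")
    _, sep, plen = dest.partition("/")
    return int(plen) if sep and plen.isdigit() else default


# Hypothetical sample elements, covering the shapes this sketch guards against.
samples = [
    {"ifa-destination": [{"data": "10.0.0.0/30"}]},  # normal case
    {"ifa-destination": []},                         # key present, list empty
    {},                                              # key absent
    {"ifa-destination": [{"data": "10.0.0.1"}]},     # no prefix length given
]
for sample in samples:
    print(prefix_length(sample))
```

Returning a default instead of indexing blindly lets the caller decide how to treat addresses without a destination prefix.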

ddutt commented 1 year ago

When you say you didn't have them before, do you mean before 0.19.1? Unfortunately, I don't have live SRX/MX devices to test the parsers with. If you're willing to help, I can get this fixed for you. I can fix the detection of SRX easily, but I will need help to confirm the fixes and to track down the parser error. Are you on the Slack? It's easier for us to communicate that way.