napalm-automation / napalm-junos

Apache License 2.0

get_bgp_neighbors_detail() returning a value of list for ['messages_queued_out'] when peer ASN has multiple route table instances #138

Closed jeffgallagher closed 7 years ago

jeffgallagher commented 7 years ago

Description of Issue/Question

Did you follow the steps from https://github.com/napalm-automation/napalm#faq

[X] Yes [ ] No

Setup

napalm-junos version

(Paste verbatim output from pip freeze | grep napalm-junos between quotes below)

napalm-junos==0.6.3

JunOS version

(Paste verbatim output from show version and haiku between quotes below)

Model: mx960
JUNOS Base OS boot [12.3R9.4]
JUNOS Base OS Software Suite [12.3R9.4]
JUNOS Kernel Software Suite [12.3R9.4]
JUNOS Crypto Software Suite [12.3R9.4]
JUNOS Packet Forwarding Engine Support (M/T/EX Common) [12.3R9.4]
JUNOS Packet Forwarding Engine Support (MX Common) [12.3R9.4]
JUNOS Online Documentation [12.3R9.4]
JUNOS Services AACL Container package [12.3R9.4]
JUNOS Services Application Level Gateways [12.3R9.4]
JUNOS AppId Services [12.3R9.4]
JUNOS Border Gateway Function package [12.3R9.4]
JUNOS Services Captive Portal and Content Delivery Container package [12.3R9.4]
JUNOS Services HTTP Content Management package [12.3R9.4]
JUNOS IDP Services [12.3R9.4]
JUNOS Services LL-PDF Container package [12.3R9.4]
JUNOS Services NAT [12.3R9.4]
JUNOS Services PTSP Container package [12.3R9.4]
JUNOS Services RPM [12.3R9.4]
JUNOS Services Stateful Firewall [12.3R9.4]
JUNOS Voice Services Container package [12.3R9.4]
JUNOS Services Example Container package [12.3R9.4]
JUNOS Services Crypto [12.3R9.4]
JUNOS Services SSL [12.3R9.4]
JUNOS Services IPSec [12.3R9.4]
JUNOS Runtime Software Suite [12.3R9.4]
JUNOS platform Software Suite [12.3R9.4]
JUNOS Routing Software Suite [12.3R9.4]

        3am; darkness;
        Maintenance window closing.
        Safety net: rollback.

Steps to Reproduce the Issue

Error Traceback

(Paste the complete traceback of the exception between quotes below)

Traceback (most recent call last):
  File "HighSpeedPoller.py", line 616, in <module>
    main()
  File "HighSpeedPoller.py", line 597, in main
    get_BGPinfo(device, system_info['hostname'], MyInfluxClient, MySQLClient, "bgp_state")
  File "HighSpeedPoller.py", line 304, in get_BGPinfo
    print(int(neighborDetailInfo[vrf][asn][0]['messages_queued_out']))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

Details:

When a BGP peer has a session that spans multiple routing tables (inet.0, inet6.0 and, in this example, inetflow.0, because the peer also happens to be a netflow sink), the value of ['messages_queued_out'] is a list rather than an int. The documentation suggests the value will always be an int. This appears to be less a NAPALM bug than a corner case that follows from the way JUNOS reports the peer in show bgp neighbor; see below:

Code for test:

neighborDetailInfo = device.get_bgp_neighbors_detail()

# Walk every routing table and every remote AS, printing the first neighbor's details.
for vrf in neighborDetailInfo:
    for asn in neighborDetailInfo[vrf]:
        print("-----")
        print(vrf)
        print(neighborDetailInfo[vrf][asn][0]['remote_address'])
        print(neighborDetailInfo[vrf][asn][0]['local_address'])
        print(neighborDetailInfo[vrf][asn][0]['messages_queued_out'])
        print("-----")

Note: The traceback can be reproduced by explicitly casting ['messages_queued_out'] to int. I have left the cast out here so that the raw values print for debugging.

The results iterating through the peers:

-----
inet.0
207.231.227.42
207.231.227.41
0
----
.
.  ## other peers omitted for clarity
.
inetflow.0
142.166.13.12
207.231.227.2
[0, 0, 0]

-----
inet6.0
142.166.13.12
207.231.227.2
[0, 0, 0]
-----

Notice the last two entries: two different routing tables, both reporting a list rather than an int. These are iBGP sessions internal to the network.
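Until the driver returns a plain int, a caller can normalize the value before using it. The following is a minimal sketch of such a workaround, not part of napalm-junos; the helper name `queued_out_total` is hypothetical, and summing the per-queue values is an assumption about what a caller wants.

```python
def queued_out_total(value):
    """Return messages_queued_out as a single int.

    Accepts the documented case (a number or numeric string) as well as
    the corner case from this report (a list of per-queue values), which
    is summed. Summing is an assumption, not documented NAPALM behavior.
    """
    if isinstance(value, (list, tuple)):
        return sum(int(v) for v in value)
    return int(value)


# Values copied from the output above.
print(queued_out_total(0))          # peer with a single output queue
print(queued_out_total([0, 0, 0]))  # peer with three output queues
```

In the polling script this would replace the bare `int(...)` call that raises the TypeError in the traceback.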

show bgp neighbor on the box reveals more detail. First, a peer that correctly returns an int value (an external peer with IPv4 only to the remote AS):

   Peer: 207.231.227.42+57724 AS 32934 Local: 207.231.227.41 +179 AS 
  Description: 
  Type: External    State: Established    Flags: 
  Last State: EstabSync     Last Event: RecvKeepAlive
  Last Error: Cease

   ## Some detail here omitted for privacy

  Last traffic (seconds): Received 25   Sent 26   Checked 52
  Input messages:  Total 223552 Updates 10      Refreshes 0     Octets 4247976
  Output messages: Total 230840 Updates 1178    Refreshes 0     Octets 4478126
  Output Queue[0]: 0

Note the "Output Queue" value: a single entry.

Second, the peer that returns a list (an internal iBGP peer in this case). The session carries both IPv4 and IPv6 iBGP with the remote IP and, in this particular corner case, the remote node also happens to sink netflow.

Peer: 142.166.13.12+21246 Local: 207.231.227.2+179 
  Description: Internal Peer 
  Type: Internal    State: Established  (route reflector client)Flags: 
  Last State: EstabSync     Last Event: RecvKeepAlive
  Last Error: None

  ### some details omitted here for privacy and clarity ### 

 NLRI for restart configured on peer: inet-unicast inet6-unicast inet-flow
  NLRI advertised by peer: inet-unicast inet6-unicast inet-flow
  NLRI for this session: inet-unicast inet6-unicast inet-flow
  Peer does not support Refresh capability
  Stale routes from peer are kept for: 300
  Peer does not support Restarter functionality
  Peer does not support Receiver functionality
  Peer does not support Addpath
  Table inet.0 Bit: 10009
    RIB State: BGP restart is complete
    Send state: in sync
    Active prefixes:              0
    Received prefixes:            1
    Accepted prefixes:            1
    Suppressed due to damping:    0
    Advertised prefixes:          631937
  Table inetflow.0 Bit: 20000
    RIB State: BGP restart is complete
    Send state: in sync
    Active prefixes:              0
    Received prefixes:            0
    Accepted prefixes:            0
    Suppressed due to damping:    0
    Advertised prefixes:          0
  Table inet6.0 Bit: 30003
    RIB State: BGP restart is complete
    Send state: in sync
    Active prefixes:              0
    Received prefixes:            0
    Accepted prefixes:            0
    Suppressed due to damping:    0
    Advertised prefixes:          36762
  Last traffic (seconds): Received 2    Sent 0    Checked 2
  Input messages:  Total 40898  Updates 32      Refreshes 0     Octets 778410
  Output messages: Total 7774021        Updates 7728428 Refreshes 0     Octets 862558005
  Output Queue[0]: 0
  Output Queue[1]: 0
  Output Queue[2]: 0

Note the bottom of the output: three queues. This is the same peer that returns a list for ['messages_queued_out'], presumably with one queue per routing table (inet.0, inet6.0 and inetflow.0).
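To make the relationship between the CLI output and the list concrete, here is a small illustration that pulls the "Output Queue[n]" counters out of the text shown above. It is only a sketch of the observation: it parses CLI text with a regular expression, which is an assumption made for illustration and is not how napalm-junos retrieves the data. The names `OUTPUT_QUEUE_RE` and `output_queues` are hypothetical.

```python
import re

# Matches lines such as "  Output Queue[1]: 0" from `show bgp neighbor`.
OUTPUT_QUEUE_RE = re.compile(
    r"^\s*Output Queue\[(\d+)\]:\s*(\d+)\s*$", re.MULTILINE
)


def output_queues(cli_text):
    """Return the per-queue counters as a list, ordered by queue index."""
    found = {int(idx): int(val) for idx, val in OUTPUT_QUEUE_RE.findall(cli_text)}
    return [found[i] for i in sorted(found)]


# Tail of the second peer's output, copied from above.
sample = """\
  Output messages: Total 7774021        Updates 7728428 Refreshes 0     Octets 862558005
  Output Queue[0]: 0
  Output Queue[1]: 0
  Output Queue[2]: 0
"""

queues = output_queues(sample)
print(queues)       # one entry per "Output Queue[n]" line
print(sum(queues))  # a single int, if the counters are aggregated
```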

jeffgallagher commented 7 years ago

Just noticed I wasn't at current napalm-junos release - upgraded to 0.6.6 and the issue is still present.

mirceaulinic commented 7 years ago

Hi @jeffgallagher - thanks for reporting. I've been able to reproduce this, but I'm not sure what the best way to fix it is. In certain circumstances we may transform the list to a certain string format. But this field has to be int.

At the same time, looking at this together with https://github.com/napalm-automation/napalm-junos/issues/139, I think we should group inet.0, inet6.0 and inetflow.0 into default and aggregate messages_queued_out as the sum.
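A rough sketch of what that grouping could look like when applied to the structure shown earlier in this issue. This is not the actual napalm-junos change; the function name `group_into_default`, the `GLOBAL_TABLES` set, and the de-duplication by remote address are all assumptions made for illustration, and ASN 65000 in the sample is a placeholder because the real internal ASN was omitted from the report.

```python
# Assumed set of global routing tables to fold into "default".
GLOBAL_TABLES = {"inet.0", "inet6.0", "inetflow.0"}


def group_into_default(detail):
    """Fold global-table entries into 'default' and sum messages_queued_out.

    `detail` is shaped like the output in this issue:
    {table_name: {remote_as: [neighbor_dict, ...]}}.
    """
    grouped = {}
    seen = set()
    for table, by_asn in detail.items():
        instance = "default" if table in GLOBAL_TABLES else table
        for asn, neighbors in by_asn.items():
            for neighbor in neighbors:
                key = (instance, asn, neighbor["remote_address"])
                if key in seen:
                    continue  # same session already reported under another table
                seen.add(key)
                merged = dict(neighbor)  # copy, so the input is left untouched
                queued = merged["messages_queued_out"]
                if isinstance(queued, list):
                    merged["messages_queued_out"] = sum(queued)
                grouped.setdefault(instance, {}).setdefault(asn, []).append(merged)
    return grouped


internal_peer = {"remote_address": "142.166.13.12", "messages_queued_out": [0, 0, 0]}
sample = {
    "inet.0": {32934: [{"remote_address": "207.231.227.42", "messages_queued_out": 0}]},
    "inetflow.0": {65000: [dict(internal_peer)]},  # 65000 is a placeholder ASN
    "inet6.0": {65000: [dict(internal_peer)]},
}

result = group_into_default(sample)
print(sorted(result))  # only the "default" instance remains
print(result["default"][65000][0]["messages_queued_out"])
```

With this shape the internal peer appears once under default, and its messages_queued_out is a single int.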

Does this make sense?

jeffgallagher commented 7 years ago

Agreed. I think that approach makes the most sense.

mirceaulinic commented 7 years ago

I will see if I can solve it this week and release a major version, as this changes many things (but they are changing for the better, ofc).