netdisco / snmp-info


No links between Dell switches #340

Open mileswu123 opened 5 years ago

mileswu123 commented 5 years ago

When testing Netdisco with Dell switches, I found that Netdisco only shows a neighbor when that switch's IP address appears in lldp_rman_addr {...}.

Expected Behavior

There is a link between switch 162.16.0.10/port Gi1/0/30 and switch 162.17.0.206/port Gi1/0/12. I expect Netdisco to derive the link from the LLDP data.

Current Behavior

Netdisco only shows neighbors when the switch's IP is in lldp_rman_addr {...}. When I looked into SNMP/Info.pm and SNMP/Info/LLDP.pm, I couldn't see how they handle this kind of LLDP data.

Possible Solution

I used ./netdisco-do show -d 162.16.0.10 -e all to collect the following info (I removed unrelated data):

stp_i_mac
{ 0 "wõä" }

e_name
{ 45 "Gi1/0/30", }


lldp_rem_id
{ 42849478.42.7183 "‰„Ñπ", }

lldp_rem_id_type
{ 42849478.42.7183 "macAddress", }

lldp_rem_media_cap
{ 42849478.42.7183 "\0", }

lldp_rem_pid
{ 42849478.42.7183 "Gi1/0/12", }

lldp_rem_pid_type
{ 42849478.42.7183 "interfaceName", }

lldp_rem_sys_cap
{ 42849478.42.7183 "\0\0", }

lldp_rem_sysname
{ 42849478.42.7183 "2000_Switch", }

lldp_rem_sysdesc
{ 42849478.42.7183 "", }

And I used ./netdisco-do show -d 162.17.0.206 -e all to collect the following info (I removed unrelated data):

stp_i_mac
{ 0 "‰„Ñπ" }

e_name
{ 4 "Gi1/0/12", }


lldp_rem_cap_spt
{ 21768813.1.2291 "\0\0",}

lldp_rem_desc
{ 21768813.1.2291 "Rm2000", }

lldp_rem_id
{ 21768813.1.2291 "wõä", }

lldp_rem_id_type
{ 21768813.1.2291 "macAddress", }

lldp_rem_media_cap
{ 21768813.1.2291 "\0", }

lldp_rem_pid
{ 21768813.1.2291 "Gi1/0/30", }

lldp_rem_pid_type
{ 21768813.1.2291 "interfaceName", }

lldp_rem_sys_cap
{ 21768813.1.2291 "\0\0", }

lldp_rem_sysdesc
{ 21768813.1.2291 "2ndFloor_Switch", }

lldp_rem_sysname
{ 21768813.1.2291 "", }

Using the LLDP entries whose ID type is macAddress, Netdisco could search for that ID in the stp_i_mac of all switches to locate the neighboring switch, and also the ports linking the two switches.
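A minimal sketch of that matching, assuming each discovered switch's stp_i_mac has already been collected into a hash keyed by its management IP. The hash layout and the helpers format_mac, build_mac_index and match_neighbors are hypothetical, and since the binary values in the dumps above are only partly printable, the MACs below are made up. lldp_rem_id holds the raw 6-byte MAC, so it is converted to colon-separated hex before the lookup.

```perl
use strict;
use warnings;

# Convert a raw 6-byte octet string (as found in lldp_rem_id or stp_i_mac)
# into lowercase colon-separated hex.
sub format_mac {
    my ($raw) = @_;
    return unless defined $raw && length($raw) == 6;
    return join ':', map { sprintf '%02x', $_ } unpack 'C6', $raw;
}

# Hypothetical input: { device_ip => raw stp_i_mac } gathered at discovery.
# Output: { "aa:bb:..." => device_ip }.
sub build_mac_index {
    my ($stp_mac_by_ip) = @_;
    my %ip_by_mac;
    foreach my $ip ( keys %$stp_mac_by_ip ) {
        my $mac = format_mac( $stp_mac_by_ip->{$ip} );
        $ip_by_mac{$mac} = $ip if defined $mac;
    }
    return \%ip_by_mac;
}

# For each LLDP entry whose chassis ID type is macAddress, look the ID up in
# the index and return { lldp_key => { ip => peer_ip, port => remote_port } }.
sub match_neighbors {
    my ( $rem_id, $rem_id_type, $rem_pid, $ip_by_mac ) = @_;
    my %links;
    foreach my $key ( keys %$rem_id ) {
        next unless ( $rem_id_type->{$key} // '' ) eq 'macAddress';
        my $mac = format_mac( $rem_id->{$key} );
        next unless defined $mac;
        my $ip = $ip_by_mac->{$mac};
        next unless defined $ip;
        $links{$key} = { ip => $ip, port => $rem_pid->{$key} };
    }
    return \%links;
}

# Example with made-up MACs, keyed like the 162.16.0.10 dump above.
my $index = build_mac_index({
    '162.16.0.10'  => pack( 'H12', '0011223344aa' ),
    '162.17.0.206' => pack( 'H12', '0011223344bb' ),
});
my $links = match_neighbors(
    { '42849478.42.7183' => pack( 'H12', '0011223344bb' ) },
    { '42849478.42.7183' => 'macAddress' },
    { '42849478.42.7183' => 'Gi1/0/12' },
    $index,
);
# $links->{'42849478.42.7183'} is { ip => '162.17.0.206', port => 'Gi1/0/12' }
```

Combined with e_name on the local side, the LLDP key's local port gives the other end of the link (Gi1/0/30 on 162.16.0.10 in this case).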


inphobia commented 5 years ago

could you provide us with the output of:

netdisco-do show -d 162.16.0.10 -e description -DI
netdisco-do show -d 162.16.0.10 -e has_topo
netdisco-do show -d 162.16.0.10 -e lldp_id
netdisco-do show -d 162.16.0.10 -e lldp_ip
netdisco-do show -d 162.16.0.10 -e lldp_ipv6
netdisco-do show -d 162.16.0.10 -e lldp_mac
netdisco-do show -d 162.16.0.10 -e lldp_addr
netdisco-do show -d 162.16.0.10 -e lldp_rman_addr

and if possible the same for 162.17.0.206?

thx

side note: the code you're looking for is here: https://github.com/netdisco/netdisco/blob/master/lib/App/Netdisco/Worker/Plugin/Discover/Neighbors.pm. at first glance, though, non-ip based detection should be working.

mileswu123 commented 5 years ago

Here are the log files.

Thanks, Miles


inphobia commented 5 years ago

hm, seems i can't find them in this issue...

mileswu123 commented 5 years ago

Do you need more info or log?

inphobia commented 5 years ago

i would like the output of the following commands, if at all possible:

netdisco-do show -d 162.16.0.10 -e description -DI
netdisco-do show -d 162.16.0.10 -e has_topo
netdisco-do show -d 162.16.0.10 -e lldp_id
netdisco-do show -d 162.16.0.10 -e lldp_ip
netdisco-do show -d 162.16.0.10 -e lldp_ipv6
netdisco-do show -d 162.16.0.10 -e lldp_mac
netdisco-do show -d 162.16.0.10 -e lldp_addr
netdisco-do show -d 162.16.0.10 -e lldp_rman_addr

and if you could do the same for 162.17.0.206, that would be even better.

mileswu123 commented 5 years ago

162_16_0_10.log 162_16_0_206.log

Here are the files!

mileswu123 commented 5 years ago

Do you need more log files? I have 3 other ports with the same issue.

inphobia commented 5 years ago

i've been looking into this but can't seem to find an obvious reason why it wouldn't work. if possible i would like the output of netdisco-do show -d XipX -e all -DI for both devices, but that will most likely contain sensitive data. if you mail the output to me or create a private github repo i won't share the data with anyone else.

inphobia commented 5 years ago

oh, 1 more thing you can try. i see you're using snmp::info 3.65. perhaps you can try upgrading to 3.67 or later first. we fixed some issues with handling of unnamed interfaces and other crap some devices send, which at times made some interfaces not report neighbors.

mileswu123 commented 5 years ago

I just replied to Nick's email. Please let me know if you didn't receive it.

inphobia commented 5 years ago

at first glance i haven't gotten anything yet, nor did i see anything blocked in the spamfilter. it's: nick.nauwelaerts @ aquafin.be

mileswu123 commented 5 years ago

Did you receive it? I sent email to nick.nauwelaerts@aquafin.be on 5/28.

inphobia commented 5 years ago

got it, but still figuring out the issue.

while dumping mac addresses into lldp_ip might work for you, i'm not fond of that solution. from my current understanding, netdisco will fail to save relations if it does not know the ip (ipv4 or ipv6) of the remote device:

https://github.com/netdisco/netdisco/blob/0f1c264e410c1ee4de7c44a8f7550356c09225dd/lib/App/Netdisco/Worker/Plugin/Discover/Neighbors.pm#L130-L140

@ollyg : do we support topo stuff when the remote device has no easily determined ip address?

mileswu123 commented 5 years ago

It looks like Dell switches return their stp_i_mac in lldp_rem_id when lldp_rem_id_type is macAddress. I think the following changes could make this work.

  1. Query stp_i_mac during device discovery and store it in the device table in the DB.
  2. Modify sub lldp_ip {} in LLDP.pm as follows:

      sub lldp_ip {
          my $lldp    = shift;
          my $partial = shift;

          my $rman_addr = $lldp->lldp_rman_addr($partial) || {};

          my %lldp_ip;
          foreach my $key ( keys %$rman_addr ) {
              my ( $index, $proto, $addr ) = $lldp->_lldp_addr_index($key);
              next unless defined $index;
              next unless $proto == 1;    # 1 = IPv4
              $lldp_ip{$index} = $addr;
          }

          # Look up switch IP address by stp_i_mac
          my $ch_type = $lldp->lldp_rem_id_type($partial) || {};
          my $ch      = $lldp->lldp_rem_id($partial)      || {};

          foreach my $key ( keys %$ch ) {
              my $id = $ch->{$key};
              next unless $id;
              # keep any management address already found above
              next if defined $lldp_ip{$key};

              my $type = $ch_type->{$key};
              next unless defined $type && $type =~ /mac/;

              # someone needs to implement function lookup_ip_by_stp_i_mac()
              my $ip = SNMP::Info::lookup_ip_by_stp_i_mac($id);
              $lldp_ip{$key} = $ip if $ip;    # store the IP, not the MAC
          }
          return \%lldp_ip;
      }

inphobia commented 5 years ago

tagging @JeroenvIS after i saw him discussing this on irc

inphobia commented 5 years ago
1. Query and store stp_i_mac into DB device table during the device discovery.

sounds like a good idea, but the problem is that this will not always be unique (at least in cisco nexus vpc setups, where it will be shared between vtp members). on the other hand, we already have a mac field in the device table, which comes from the mac() function, but not all snmp::info classes support this, nor is it guaranteed to be the same as stp_i_mac. (stp_i_mac is not advertised in the docs; b_mac from s::i::bridge should take care of that.)
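Because the bridge address can be shared between devices (the vpc case above), an index built from it should probably flag collisions rather than silently keep whichever device came last. A minimal sketch of that check; the function name, input layout and addresses are hypothetical, and MACs claimed by more than one device are returned separately so the caller can log them instead of creating a wrong link.

```perl
use strict;
use warnings;

# Build { mac => ip } from { ip => mac }, leaving out MACs reported by more
# than one device; those are returned as { mac => [ ips... ] } instead.
sub unique_mac_index {
    my ($mac_by_ip) = @_;
    my %ips_for;
    foreach my $ip ( sort keys %$mac_by_ip ) {
        my $mac = lc( $mac_by_ip->{$ip} // '' );
        next unless length $mac;
        push @{ $ips_for{$mac} }, $ip;
    }
    my ( %unique, %ambiguous );
    foreach my $mac ( keys %ips_for ) {
        my @ips = @{ $ips_for{$mac} };
        if   ( @ips == 1 ) { $unique{$mac}    = $ips[0] }
        else               { $ambiguous{$mac} = \@ips }
    }
    return ( \%unique, \%ambiguous );
}

my ( $unique, $ambiguous ) = unique_mac_index({
    '10.0.0.1' => '00:11:22:33:44:55',
    '10.0.0.2' => '00:11:22:33:44:55',    # shared, e.g. a vpc pair
    '10.0.0.3' => '00:11:22:33:44:66',
});
```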

that all being said, i am a fan of adding the stp base address to device info, either as the column "mac" in the device table, or as a new column.

after reviewing your logs & the code again: all the lldp mappings are done either with lldpRemManAddrIfSubtype (the lldp_rman_addr function in s::i::lldp) or with lldpLocPortDesc.

now, the first switch only returns 1 entry here:

[4969] 2019-05-07 00:20:45  info show: [162.16.0.206]/lldp_rman_addr started at Tue May  7 00:20:45 2019
\ {
    15671549.52.1464.1.4.162.16.4.189   "ifIndex"
}

that's most likely the reason why it's missing so many neighbors.
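For reference, the key in that output is the lldpRemManAddrTable index from LLDP-MIB: lldpRemTimeMark, lldpRemLocalPortNum, lldpRemIndex, lldpRemManAddrSubtype, then the address as a length-prefixed octet string. SNMP::Info::LLDP has its own internal helper for this (_lldp_addr_index, used in the lldp_ip code above); the function below is an independent, hypothetical illustration of the same split, not that code.

```perl
use strict;
use warnings;

# Split an lldpRemManAddrTable index such as
#   15671549.52.1464.1.4.162.16.4.189
# into time mark, local port, remote index, address subtype
# (IANA AddressFamilyNumbers: 1 = IPv4, 2 = IPv6) and the address itself.
sub parse_rman_index {
    my ($key) = @_;
    my ( $mark, $port, $idx, $subtype, $len, @octets ) = split /\./, $key;
    return unless defined $len && @octets == $len;
    my $addr
        = ( $subtype == 1 && $len == 4 )  ? join( '.', @octets )
        : ( $subtype == 2 && $len == 16 )
        ? join( ':', map { sprintf '%02x%02x', @octets[ 2 * $_, 2 * $_ + 1 ] } 0 .. 7 )
        : join( '.', @octets );
    return {
        time_mark  => $mark,
        local_port => $port,
        rem_index  => $idx,
        subtype    => $subtype,
        address    => $addr,
    };
}
```

Applied to the single entry above, this gives local port 52, remote index 1464 and an IPv4 address; the other neighbours on that switch have no entry in this table at all, which is consistent with them never getting an IP.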

i went through all the release notes for n1100 & n1500 switches, and noticed 2 bugs:


i'm no fan of having lldp_ip() return anything other than an ip address; that's not its function. lldp_addr() should return an address if it's known for all devices, but as seen in your logs it only returns 1 peer. since your switches are running different os versions, i wonder if that could be part of the issue (wrt those bugs).

of course, none of that fixes your issue :)

perhaps an approach would be to store lldpLocChassisId or anything else from the lldpLoc tree for each device and try to match against that as a last resort.
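A minimal sketch of that last-resort idea, assuming each device's own lldpLocChassisIdSubtype and lldpLocChassisId were stored at discovery time. The storage layout and function names are hypothetical. Keying on the subtype as well as the value means a macAddress ID can only ever match another macAddress ID, and the chassis lookup is only used when no management address was advertised.

```perl
use strict;
use warnings;

# Hypothetical store: { device_ip => [ chassis_id_subtype, raw_chassis_id ] }
# taken from each device's own lldpLocChassisIdSubtype / lldpLocChassisId.
sub index_local_chassis {
    my ($loc_by_ip) = @_;
    my %ip_for;
    foreach my $ip ( keys %$loc_by_ip ) {
        my ( $subtype, $id ) = @{ $loc_by_ip->{$ip} };
        next unless defined $subtype && defined $id && length $id;
        $ip_for{$subtype}{$id} = $ip;
    }
    return \%ip_for;
}

# Prefer the advertised management address; only fall back to matching the
# remote chassis ID (same subtype and same value) when none was advertised.
sub resolve_neighbor_ip {
    my ( $rman_ip, $rem_subtype, $rem_id, $ip_for ) = @_;
    return $rman_ip if defined $rman_ip;
    return unless defined $rem_subtype && defined $rem_id;
    my $by_id = $ip_for->{$rem_subtype} or return;
    return $by_id->{$rem_id};
}

my $ip_for = index_local_chassis({
    '162.16.0.10'  => [ 'macAddress', pack( 'H12', '0011223344aa' ) ],
    '162.17.0.206' => [ 'macAddress', pack( 'H12', '0011223344bb' ) ],
});
```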

mileswu123 commented 5 years ago

As an experiment, I got it working by reusing the mac field in the Device table and modifying Layer2.pm and Layer3.pm to retrieve stp_i_mac and store it in the DB. Then I modified lldp_ip() in LLDP.pm to map lldp_rem_id to an IP when lldp_rem_id_type is a MAC address. Please see the attached file.

After rediscovering all devices (which fetches stp_i_mac for all layer 2/3 devices and finds each neighbor through lldp_ip()), I got all three missing links:

WHERE me.ip = '172.17.0.10' AND me.port = 'Gi1/0/16'
[10577] 2019-06-10 23:13:57 debug [172.17.0.10] neigh - 172.17.1.207 with ID [e4:f0:04:e3:84:b9] on Gi1/0/42

WHERE me.ip = '172.17.0.10' AND me.port = 'Gi2/0/20'
[10577] 2019-06-10 23:13:57 debug [172.17.0.10] neigh - 172.17.1.206 with ID [4c:d9:8f:e0:d6:f5] on Gi1/0/31

WHERE me.ip = '172.17.0.10' AND me.port = 'Gi1/0/2'
[10577] 2019-06-10 23:13:57 debug [172.17.0.10] neigh - 172.17.0.1 with ID [20:04:0f:18:60:f2] on Te1/0/3

These Dell switches are running with 6.2.6.6, 6.4.1.4 and 6.5.2.23 firmware.

Archive.zip

inphobia commented 5 years ago

i haven't forgotten about this but have been wondering what the best approach would be to tackle this problem.

i'm thinking this could be a multistep approach:

  1. netdisco uses the s::i::mac() function to get the input it uses to fill out the mac address field in the device info tab. this function is only available in s::i::layer3.pm and several device-specific modules. netdisco should try harder to get a usable address here, but i'm not sure which is the better way:
    • move mac() from layer3 to info.pm so it becomes a basic function
      • side note: most, but not all, network devices use mac addresses; it's mostly for ethernet & ethernet-like stuff. other protocols have different means for link-local addressing.
    • perhaps netdisco needs some kind of system id field instead of a mac field.
    • or perhaps we need to start saving multiple device ids (mac, stp root, lldp id, etc...)
    • have a mac() function in layer2 like the change you proposed. i don't, however, agree with the layer3.pm change: dot1dBaseBridgeAddress only belongs in layer2 imo. either netdisco or snmp::info main should prefer one over the other.
  2. it seems netdisco currently might have issues with c_*() functions returning multiple matching entries for the same device. my opinion is that snmp::info does the correct thing by returning all permutations it knows about, and then letting netdisco decide in which priority to try & resolve those relations.
  3. finally, it seems those dell devices might return s::i::interfaces() results indexed on user-defined names for those interfaces. while this is perfectly legal from an snmp standpoint, it might confuse netdisco if those user-assigned names change (i have a suspicion this is the case but have yet to dive deeper into it). so:
  4. your lldp.pm changes are a working approach to solving the last piece of the puzzle, but snmp::info is made to query devices. while it's almost exclusively used by netdisco, we regretfully can't use netdisco functions in snmp::info. i'm still of the opinion snmp::info is doing its job in this instance and we need a bit more logic in netdisco to get this working.
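Point 2 could look roughly like this on the netdisco side, sketched under the assumption that snmp::info hands back every candidate identifier for a port, each labelled with how it was obtained. The labels, the ranking and pick_neighbor are hypothetical, not netdisco's current behaviour: candidates are ranked and the first one that resolves to a known device wins.

```perl
use strict;
use warnings;

# Hypothetical ranking: lower number wins. Management IPs come first because
# they can be looked up directly; MAC-based IDs are only a fallback.
my %RANK = ( ipv4 => 0, ipv6 => 1, mac => 2 );

# $candidates: list of { kind => 'ipv4'|'ipv6'|'mac'|..., value => ... }
# $resolve:    coderef mapping (kind, value) to a known device IP or undef.
sub pick_neighbor {
    my ( $candidates, $resolve ) = @_;
    my @ordered = sort { $RANK{ $a->{kind} } <=> $RANK{ $b->{kind} } }
                  grep { exists $RANK{ $_->{kind} } } @$candidates;
    foreach my $c (@ordered) {
        my $ip = $resolve->( $c->{kind}, $c->{value} );
        return $ip if defined $ip;
    }
    return;    # nothing usable: skip this relation rather than guess
}

my %known_ip  = ( '10.0.0.2' => 1 );
my %ip_by_mac = ( '00:11:22:33:44:55' => '10.0.0.2' );
my $resolve   = sub {
    my ( $kind, $value ) = @_;
    return $kind eq 'mac'    ? $ip_by_mac{$value}
         : $known_ip{$value} ? $value
         :                     undef;
};

my $picked = pick_neighbor(
    [ { kind => 'mac',              value => '00:11:22:33:44:55' },
      { kind => 'ipv4',             value => '192.0.2.9' },    # not in netdisco
      { kind => 'chassisComponent', value => 'xyz' } ],        # ignored
    $resolve,
);
# ipv4 is tried first but is unknown, so the mac fallback wins: '10.0.0.2'
```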

now, most of these changes might have a bigger impact than expected, so i'll open issues for the individual steps and request feedback on the preferred way to handle each of them.

bottom line: it's on our radar but will take time to get everything into place to make this work.

tagging @netdisco/snmp-info-developers & @netdisco/snmp-info-admins here for more feedback on best approach.

inphobia commented 5 years ago

https://github.com/netdisco/snmp-info/issues/342#issuecomment-502444168 has a test script to collect all known relations. is there any chance you could run that on both devices?

that would give me a better idea of how to refactor netdisco, since imo snmp::info is doing this correctly, but netdisco insists on having a remote ip (for which there are valid reasons, since looking up ips is a lot quicker than checking each known port mac). perhaps there are ways around this.

thx

edo1 commented 4 years ago

Any news? IMO this is an important thing, some devices announce MAC address only.

inphobia commented 4 years ago

after a hiatus, i see #395 also has the same issues. the reason this issue is still open is that it's still a problem, regretfully also one without an easy fix. we have not forgotten this, nor do we have a solid solution right now. it's taking longer than any of us wanted, but we're working on it.

inphobia commented 4 years ago

netdisco/netdisco#737 also feels very similar.

JeroenvIS commented 4 years ago

Any news? IMO this is an important thing, some devices announce MAC address only.

IMHO it depends on how devices advertise themselves.

First of all, there is the Chassis ID that is advertised: "The string value used to identify the chassis component associated with the remote system." The device will also say what type of ID it is sending, the lldpRemChassisIdSubtype. There are 7 defined types: "{chassisComponent(1), interfaceAlias(2), portComponent(3), macAddress(4), networkAddress(5), interfaceName(6), local(7)}". The chassis ID and type are always present in LLDP neighbor info.
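To make the seven subtypes concrete, a minimal sketch that renders a raw chassis ID according to its subtype without breaking on the non-MAC ones. The numbering follows the LldpChassisIdSubtype values quoted above (SNMP::Info already translates the number to its name, as the dumps earlier in this issue show); render_chassis_id is hypothetical, and for networkAddress it assumes the LLDP-MIB encoding where the first octet is the IANA address family (1 = IPv4).

```perl
use strict;
use warnings;

# LldpChassisIdSubtype values from LLDP-MIB.
my %CHASSIS_SUBTYPE = (
    1 => 'chassisComponent', 2 => 'interfaceAlias', 3 => 'portComponent',
    4 => 'macAddress',       5 => 'networkAddress', 6 => 'interfaceName',
    7 => 'local',
);

# Return ( subtype_name, printable_value ) for a raw chassis ID.
sub render_chassis_id {
    my ( $subtype, $raw ) = @_;
    my $name = $CHASSIS_SUBTYPE{$subtype} // 'unknown';
    if ( $name eq 'macAddress' && length($raw) == 6 ) {
        return ( $name, join ':', map { sprintf '%02x', $_ } unpack 'C6', $raw );
    }
    if ( $name eq 'networkAddress' && length($raw) == 5 && ord($raw) == 1 ) {
        # first octet = IANA address family (1 = IPv4), then the address
        return ( $name, join '.', unpack 'C4', substr( $raw, 1 ) );
    }
    return ( $name, $raw );    # other subtypes are passed through unchanged
}
```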

Then there is the remote management address(es) that can be advertised. A device can advertise multiple management addresses of different types. "The string value used to identify the management address component associated with the remote system. The purpose of this address is to contact the management entity.". The remote management address info is optional to send; what I see in our network is that generally the network components all advertise at least one management address, as do some IP phones, but other IP phones and some servers (eg ESX) and other clients just run LLDP and don't advertise any management info.

Currently, to determine neighbour relations, Netdisco only looks at management addresses that are advertised. IMHO that's a good choice, because we only store relations for devices that are known to Netdisco, and Netdisco needs a management address to connect to anyway.

Personally I'd first try to investigate why some devices in the management domain do not advertise a management address to their neighbours, but only send their chassis identifier.

If it's really not possible to configure these devices to send a management address in their LLDP messages, then we could consider extending Netdisco to also look at Chassis ID to determine potential neighbour relations. Of course that logic should then be able to deal with the different types of chassis identifiers that may be reported, or at least not break if the remote ID is not a MAC address but one of the other 6 types.

JeroenvIS commented 4 years ago

Result of a quick search on the Internet: looks like various Dell models don't send remote management info through LLDP by default, but can be configured to do so. I strongly suggest Dell users to investigate that angle before we try extensive changes to Netdisco and SNMP::Info, which could have unexpected side effects and easily introduce new bugs in determining neighbour relations.

edo1 commented 4 years ago

@JeroenvIS Any reason not to use MAC addresses to determine neighbor relations?

Of course that logic should then be able to deal with the different types of chassis identifiers that may be reported, or at least not break if the remote ID is not a MAC address but one of the other 6 types.

Ignoring all types except a MAC address doesn't seem difficult.

inphobia commented 4 years ago

@JeroenvIS Any reason not to use MAC addresses to determine neighbor relations?

yeah, code complexity foremost. if all we had was lldp, perhaps we could add some kludges to use different identification models. to give you an idea, this is the lldp mib: http://www.circitor.fr/Mibs/Mib/L/LLDP-MIB.mib (oh, and i forgot to mention lldp-ext-med-mib, lldp-ext-dot1-mib & lldp-ext-dot3-mib)

and then we also have cdp, edp, fdp & whatever

all updates to either our code or documentation (https://metacpan.org/pod/SNMP::Info::LLDP) are welcome, but perhaps @JeroenvIS could link the required config lines to make dell switches send their management info so we can add that to our wiki?

imo, discovery fixup is the second most common code in snmp::info after just plain broken snmp "support".