inex / IXP-Manager

Full stack web application powering peering at over 200 Internet Exchange Points (IXPs) globally.
https://www.ixpmanager.org/
GNU General Public License v2.0

update-l2database.pl not compatible with juniper #104

Closed pdxmaverick closed 10 years ago

pdxmaverick commented 10 years ago

cannot read BRIDGE-MIB or Q-BRIDGE-MIB from switch ex4500

I know we have discussed this, but I wanted to have an open ticket so it doesn't fall through the cracks.

Issue with how SNMP reports interface name xe-0/0/0.0

nickhilliard commented 10 years ago

this is part of a much larger problem, namely how to semantically interpret the snmp/sflow output of multiple vendors and handle all of this in a way which is sensible and which won't cause further breakage down the line.

There are a variety of ways of dealing with the issue, but all of them either involve messy vendor-specific hacks or else large-scale changes to how the code interprets what it's seeing from network devices. There's no simple way of handling this because it comes down to semantics, i.e. it's not just an issue of dropping the .0 from the device name. We're working on the issue at the moment, but it is not simple to deal with.

rowanthorpe commented 10 years ago

While hacking at a quick workaround (not a solution) to this for Juniper EX switches, so far I've found almost all of my meaningful Google-search results keep sending me to the same project: SNMP::Info - OO Perl Interface to Network devices and MIBs through SNMP. Looking deeper it seems they have done almost all the heavy-lifting of what IXP-Manager is facing with this issue. At a glance I suspect it might be worth looking into using the following library (and perhaps mining the code to discern the equivalent PHP changes needed), because otherwise there might be a lot of reinventing the wheel. From what I can see they have full support for Juniper EX...

rowanthorpe commented 10 years ago

On Debian it is packaged as libsnmp-info-perl, although:

rowanthorpe commented 10 years ago

I was right about the Juniper EX compatibility. This is the proper homepage for SNMP::Info and the Device Compatibility Matrix linked there has this section for Juniper EX :-)

rowanthorpe commented 10 years ago

The correct location for downloading the latest snapshot of the Netdisco MIBs is here.

nickhilliard commented 10 years ago

even the netdisco-mibs collection isn't enough to handle juniper support. I've scrounged a couple more mibs from around teh internets and have SNMP::Info running at the moment. am currently fighting with the API to see what info it produces.

rowanthorpe commented 10 years ago

I have it (and the mibs file) installed too, and just ran:

use strict;
use warnings;
use SNMP::Info;

# Let SNMP::Info probe the device and pick the most specific subclass.
my $juniper = SNMP::Info->new(
    AutoSpecify => 1,
    Debug       => 1,
    DestHost    => 'xx_URL_xx',
    Community   => 'public',
    Version     => 2
) or die "Can't connect to DestHost.\n";

my $class = $juniper->class();
print "SNMP::Info determined this device to fall under subclass : $class\n";

and got the result:

SNMP::Info::_global layers : SNMPv2-MIB::sysServices.0 : .1.3.6.1.2.1.1.7.0
SNMP::Info::_global description : SNMPv2-MIB::sysDescr.0 : .1.3.6.1.2.1.1.1.0
SNMP::Info::_global id : SNMPv2-MIB::sysObjectID.0 : .1.3.6.1.2.1.1.2.0
SNMP::Info 3.08
SNMP::Info::device_type() layers:00000110 id:2636 sysDescr:"Juniper Networks, Inc. ex4500-40f Ethernet Switch, kernel JUNOS 12.3R4.6, Build date: 2013-09-13 04:11:02 UTC Copyright (c) 1996-2013 Juniper Networks, Inc."
SNMP::Info::specify() - Changed Class to SNMP::Info::Layer3::Juniper.
SNMP::Info determined this device to fall under subclass : SNMP::Info::Layer3::Juniper

...looks promising.

(using libsnmp-info-perl v3.08-1, using the latest-snapshot of the netdisco-mibs)
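For readers wondering why a switch ends up in a Layer3 subclass: the layers:00000110 string in the debug output is the device's sysServices value written in binary, where bit (L - 1) is set when the device offers services at OSI layer L. A minimal Python sketch of that decoding follows; the function name osi_layers is made up for illustration and is not part of SNMP::Info.

```python
def osi_layers(layers):
    """Return the OSI layers flagged in a sysServices bit string such as '00000110'."""
    value = int(layers, 2)
    # Bit (L - 1) of sysServices is set when the device offers layer-L services.
    return [layer for layer in range(1, 8) if value & (1 << (layer - 1))]


print(osi_layers("00000110"))
```

For the EX4500 output above this gives layers 2 and 3, which is consistent with SNMP::Info settling on a Layer3 class.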

nickhilliard commented 10 years ago

I'm a step ahead of you but don't have an ex switch to test out on. currently in lab:

# perl ~nick/foo | grep em0
em0: 00:0c:29:49:eb:37 
em0.0: 00:0c:29:49:eb:37 

this doesn't quite solve the problem because it doesn't handle the semantic difference between em0 and em0.0. What we really need is a canonical reference to point the mac address to a single physical interface. This may involve a walk up the juniIfInvStackTable mib to get it right.
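To make the "walk up the stack" idea concrete, here is a rough Python sketch. It assumes the stack table has already been fetched and flattened into a dict mapping each logical ifIndex to the single interface beneath it. The function name, the dict layout, and the ifIndex values 10 and 20 are all invented for illustration; a real stack table can list several lower interfaces per entry (LAG members, for instance), which this sketch does not handle.

```python
def physical_parent(if_index, lower_of):
    """Follow logical -> lower-layer links until an interface has no lower layer."""
    seen = set()
    while if_index in lower_of and if_index not in seen:
        seen.add(if_index)  # guard against a malformed, cyclic table
        if_index = lower_of[if_index]
    return if_index


# Hypothetical data: ifIndex 20 (em0.0) is stacked on ifIndex 10 (em0).
lower_of = {20: 10}
names = {10: "em0", 20: "em0.0"}

print(names[physical_parent(20, lower_of)])
print(names[physical_parent(10, lower_of)])
```

Both lookups resolve to em0, which is the "single physical interface" a MAC address would be pinned to under this approach.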

I'm currently trying to figure out how snmp::info handles this, or whether it depends on les hacques.

rowanthorpe commented 10 years ago

Ah, ok. Let me know if there is anything I can real-world test for you here, as I have daily access to an ex (but only for the next half-hour today...).

rowanthorpe commented 10 years ago

...and this page about the Juniper module is proving informative.

rowanthorpe commented 10 years ago

Sorry if I am throwing in red herrings here, but the page I just mentioned has this:

$juniper->mac()
  Returns the MAC address used by this bridge when it must be referred to in a unique fashion.
  (dot1dBaseBridgeAddress)

and these:

$juniper->i_trunk()
  (jnxExVlanPortAccessMode)

$juniper->i_vlan()
  Returns a mapping between ifIndex and the PVID or default VLAN.

nickhilliard commented 10 years ago

i'm trawling through the source code for this right now to see how it determines that there is a link between em0 and em0.0 in the FDB

nickhilliard commented 10 years ago

Can you run this code and post the output somewhere if it returns anything?

# Walk the forwarding table: MAC -> bridge port -> ifIndex -> interface name.
my $interfaces = $juniper->interfaces();
my $fw_mac     = $juniper->fw_mac();
my $fw_port    = $juniper->fw_port();
my $bp_index   = $juniper->bp_index();

foreach my $fw_index (keys %$fw_mac) {
    my $mac   = $fw_mac->{$fw_index};
    my $bp_id = $fw_port->{$fw_index};
    my $iid   = $bp_index->{$bp_id};
    my $port  = $interfaces->{$iid};

    print "Port: $port | $mac | $bp_id | $iid\n";
}

rowanthorpe commented 10 years ago

just emailed it to you

nickhilliard commented 10 years ago

ok that output means that SNMP::Info has the same issue as the ixp-m code: the index that is returned points to the virtual interface rather than the physical interface. I think we need to accept that the ixp-m code needs to make a semantic decision and do things slightly differently with J boxes.

rowanthorpe commented 10 years ago

One of the router admins here told me the .0 is the standard/only Juniper syntax for the access ports, and (I am probably quoting this slightly wrong because I don't understand the concepts so well) he said that as far as he understood my question, the right thing is to always address the physical port with the ".0" (it is some leftover from Juniper's router-oriented origins) and that they always require a "dot-something", so dot-zero is unavoidable... Possibly I am not saying anything new, or worse yet, am talking utter nonsense - but I'll throw that out in case it helps.

nickhilliard commented 10 years ago

.0 is a convention, not a rule.

nickhilliard commented 10 years ago

can you check out commit fa302a30bc to see if this fixes the issue? I'm slightly split on this patch. Originally I didn't want to approach the problem from this angle, but the alternative is that the code performs an interface stack trawl to figure out the parent interface of the logical interface that's associated with the mac address. I think in all cases we're guaranteed to end up with an interface with the .\d+ stripped off the end. Thing is the UI code just uses the physical parent interface for everything, so from a semantic point of view maybe this isn't the worst thing in the world.
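As a rough illustration of the "strip the .\d+ off the end" approach described here (the actual patch lives in the Perl script and is not reproduced), a Python sketch might look like the following. The helper name strip_unit is hypothetical.

```python
import re

# Matches a trailing logical-unit suffix such as ".0" or ".12".
_UNIT_SUFFIX = re.compile(r"\.\d+$")


def strip_unit(ifname):
    """Return the interface name with any trailing .<digits> unit removed."""
    return _UNIT_SUFFIX.sub("", ifname)


for name in ("xe-0/0/0.0", "ge-0/2/2.0", "ae23.0", "xe-0/0/0"):
    print(name, "->", strip_unit(name))
```

A name without a unit suffix passes through unchanged. Note that a rule like this would also truncate sub-interface names on other vendors' equipment, so it presumably only makes sense when applied to devices already identified as Juniper.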

rowanthorpe commented 10 years ago

Thanks for putting together that patch. Without doing various VPN/jump-server gymnastics it will be tricky to get access to the switch tonight. I will check it first thing tomorrow morning. I have a suspicion though that this patch will still face the problem that when no explicit vlan=? is specified it will try normal BRIDGE-MIB stuff, which will fail if I'm not mistaken, rather than what the unfinished code I sent you tries to do (i.e. for juniper it always uses juniper's version of qbridge with a non-specified vlan replaced with vlan=0). In all other respects though it looks like it will get much further than my code :-D Anyway, I will look in the morning.

rowanthorpe commented 10 years ago

Unfortunately my suspicion in the previous comment was right (at least for the switch we have here). I think it only requires a tiny extra tweak to work though, so I'll see if I can manage it. If so I will send a PR back to that dev-branch (which I know is not the normal way, but seems to be more sensible in this case).

rowanthorpe commented 10 years ago

I think I've managed to get it working for both cases, i.e.:

@nickhilliard I will email you the output of your commit (run with --vlan 0 and with no --vlan), and the additions I made to get it to generate output for both cases (with that output too)...

rowanthorpe commented 10 years ago

@nickhilliard Have sent a PR against your dev-branch with the mentioned extra tweak, and emailed you the output mentioned in the previous comment. It seems to work...

nickhilliard commented 10 years ago

rowan, can you confirm what you're trying to do here? from my reading of your patch, the issue is that juniper uses pseudo vlan==0 when the port is untagged, but that update-l2database.pl doesn't grok this because it defaults to BRIDGE-MIB if $vlan == 0 or is undefined. Is this all you're trying to do, or is there anything else funny which I've missed? I.e. the normal semantics for untagged is vlan=1, so what on earth does it mean when .1.3.6.1.4.1.2636.3.40.1.5.1.5.1.5 returns an entry which reads "Gauge32: 1"?

rowanthorpe commented 10 years ago

My understanding of the concepts is flaky at best, and I am mostly working off running snmpwalk loads of times, and deducing the pattern from the behaviour - hence why I really need you to check that what I do makes sense, rather than just happens to match a pattern by luck. The short answer to your question is: yes, but with a twist in the syntax's tail. The long answer is:

As for what I found. I'll start with how I understand your code works with the Juniper(s) (please stop me if I'm wrong):

Now, what I found is that with jnxExVlanTag the switch exposes all the vlan-tags and their vlanid mappings, and also a mapping from "pseudo tag 0" to some other vlanid. With the latest code-changes you made it seems the retrieved vlanids all work successfully when used in the qbridge request except for the vlanid which was returned for "pseudo tag 0". qbridge borks on that vlanid, saying it doesn't exist, even though it was returned by jnxExVlanTag as a valid mapping. What I found by experimenting though was that the kind of results I expected to get for that request could be returned by querying the Q-BRIDGE-MIB without any vlanid appended at all i.e. snmpwalk(.1.3.6.1.2.1.17.7.1.2.2.1.2). As for your question about what a query for --vlan 1 should yield on these switches - I don't have a clue. I only know that our switch just tells me it doesn't exist. I just asked one of our router team who said to his knowledge the semantics for untagged on Juniper is vlan=0. He also now did an experiment and tried to set unit 1 as an access port. The switch returned an error message saying "only unit 0 can be used as an access port".
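The behaviour described in this comment can be summarised as a small decision about which OID subtree to walk. The Python sketch below only restates that trial-and-error finding; it is not confirmed against Juniper documentation. The function name fdb_walk_oid and the tag-to-index mapping {998: 2} are illustrative assumptions, the latter standing in for whatever jnxExVlanTag returns on a given switch.

```python
# dot1qTpFdbPort, the Q-BRIDGE-MIB subtree quoted in the comment above.
DOT1Q_TP_FDB_PORT = ".1.3.6.1.2.1.17.7.1.2.2.1.2"


def fdb_walk_oid(vlan_tag, tag_to_index):
    """Pick the Q-BRIDGE-MIB subtree to walk for one 802.1Q tag."""
    if not vlan_tag:
        # None or 0: untagged / "pseudo tag 0". Walk the table with no
        # vlan index appended, since the mapped index was rejected.
        return DOT1Q_TP_FDB_PORT
    return f"{DOT1Q_TP_FDB_PORT}.{tag_to_index[vlan_tag]}"


tag_to_index = {998: 2}  # illustrative mapping only
print(fdb_walk_oid(998, tag_to_index))
print(fdb_walk_oid(0, tag_to_index))
print(fdb_walk_oid(None, tag_to_index))
```

Treating 0 and "not specified" identically mirrors the two cases tested with --vlan 0 and with no --vlan.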

I realise we are knee-deep in "ugly hack" territory now, and I hope that this motley ensemble of hacks can be dressed up to look like a "fix" for the Junipers, rather than just another blip on the radar of my naive optimism...

nickhilliard commented 10 years ago

i can't decide which rage meme applies best: facepalm, FFFUUUU, you've got to be kidding me, etc. Tell you one thing though, I'm glad I started out this process by drawing a state diagram because these vendors are taking the piss in a major way. And this is only for three vendors, not including cisco. What were they smoking?

pdxmaverick commented 10 years ago

Nick, would it be of any value to get JTAC involved? If we can get the Sflow stuff to ignore my Cisco switch I could test your new fix on our switch.

Thanks, Brian

nickhilliard commented 10 years ago

just found some documentation in juniper KB articles KB26533 (non default vlan) and KB20833 (default vlan). It looks to me like the code is correct for both situations.

@rowanthorpe, can you log into switch-new.gr-ix.gr and run through KB20833 to see if it produces anything on the command-line. If it does, could you email me the output. If it doesn't, could you restart snmp, then try again?

FWIW, i'm getting results for .1.3.6.1.2.1.17.4.3.1.2 from a Juniper EX switch that I have access to, and it seems to work ok.

Incidentally, I've just noticed another buglet which is fixed in commit 8cbbbf2.

pdxmaverick commented 10 years ago

Nick,

I just checked out 8cbbbf2, still seeing the same results.

root@portal:/usr/local/ixp# /usr/local/bin/update-l2database.pl --debug --vlan 998
DEBUG: processing NWAX-Inband
cannot read dot1dBasePortIfIndex from NWAX-Inband at /usr/local/bin/update-l2database.pl line 166.

Please let me know what I can do to help.

Thanks, Brian

rowanthorpe commented 10 years ago

@nickhilliard - I don't have login access to our switch, but have forwarded your request to someone who does. Will send you their response when I get it. Based on how this process is going it strikes me that this is fast becoming the kind of coding that should ideally be in an external, reusable library (so that the code doesn't balloon too much within IXP-M itself). I know that you already have OSS_SNMP for the php code. Should there perhaps be an equivalent perl lib...? (I realise you have the intention to migrate as much as possible to php anyway though...).

rowanthorpe commented 10 years ago

@nickhilliard - I just tried your commit (8cbbbf2) and it fails for me too. I know you are hoping to find a way for it to work with BRIDGE-MIB, but sadly it doesn't here (and obviously not for @pdxmaverick either). The thing missing, which is in my Pull Request (#116 - 96e61a6845) and which makes my version seem to work for me, is at line 214 of my version, where it calls the Q-BRIDGE-MIB without a vlanid appended. I am still waiting for someone here to get back to me with the "logged in query" results you asked for.

nickhilliard commented 10 years ago

whoa, hang on here, we're now talking about 3 separate problems :-)

8cbbbf2 fixes a problem with junipers so that when we get the vlan=0 issue sorted out, it will return the physical interface (i.e. xe-0/0/0) instead of the logical interface (i.e. xe-0/0/0.0).

@rowanthorpe, I need to look at the debugging output from your switches to see what to expect for the case of juniper / default vlan. Agreed that this code needs to be libified. I'll do that at some stage, but want to get it working first.

@pdxmaverick, your problem is related to this code not supporting certain types of IOS, including e.g. C4948 and C6500 but not e.g. C3550/C3560/C3750. I've opened up a separate issue for this: #117.

rowanthorpe commented 10 years ago

Well, on the plus-side - this is well on its way to becoming The. Most. Epic. Github-comment-thread. Evar. Let's try for >100 comments.

nickhilliard commented 10 years ago

i could have it sorted in 20 minutes if I had snmp and CLI read/write access to an EX switch, sigh.

rowanthorpe commented 10 years ago

@nickhilliard : Did you get Andreas' email a few days ago? Just checking, in case it got caught in a spam filter or something...

pdxmaverick commented 10 years ago

@nickhilliard I have removed my Cisco 4948 from IXP. Your latest version 8cbbbf2 does run, but it is still not matching interfaces. I have posted the output at https://gist.github.com/pdxmaverick/8126358

Please send me your IP address and I will get you SNMP access to our switch.

Cheers, Merry Christmas, Brian

pdxmaverick commented 10 years ago

@nickhilliard I was stepping through Juniper http://kb.juniper.net/InfoCenter/index?page=content&id=KB26533# from @rowanthorpe's comment above. Here is how it looks on my switch. Following this logic, I think it would be safe to drop the .0, as you have already selected your context of vlan 998 (NWAX-A). I can't think of any scenario where you would ever find a mac address that did not link back to a default logical interface of .0

Show all mac address on vlan

bthompson@NWAX1-EX> show ethernet-switching table vlan 998
Ethernet-switching table: 50 unicast entries
VLAN    MAC address        Type   Age  Interfaces
NWAX-A  *                  Flood    -  All-members
NWAX-A  00:01:63:8e:5c:00  Learn    0  ge-0/0/22.0
NWAX-A  00:03:32:af:4c:19  Learn    0  ge-0/0/28.0
NWAX-A  00:0b:45:0a:48:00  Learn    0  ge-0/0/8.0
NWAX-A  00:0c:29:16:17:c3  Learn   34  ge-0/2/2.0
NWAX-A  00:0c:29:62:c8:67  Learn   33  ge-0/2/2.0
NWAX-A  00:0d:66:ed:ca:66  Learn    0  ge-0/0/12.0
NWAX-A  00:12:1e:c4:10:db  Learn    0  ge-0/0/21.0
NWAX-A  00:12:43:64:04:19  Learn    0  ge-0/0/13.0
NWAX-A  00:12:f2:f4:a3:00  Learn    0  ge-0/0/31.0
NWAX-A  00:14:f6:8d:30:1f  Learn    0  ge-0/0/5.0
NWAX-A  00:14:f6:f2:2c:00  Learn    0  ge-0/0/32.0
NWAX-A  00:16:9c:6c:7d:00  Learn    0  ge-0/0/27.0
NWAX-A  00:17:cb:a4:15:fc  Learn    0  ge-0/0/10.0
NWAX-A  00:19:07:aa:9c:80  Learn    0  ge-0/1/3.0
NWAX-A  00:1a:a2:ec:88:40  Learn    0  xe-0/1/0.0
NWAX-A  00:1b:21:16:b1:30  Learn    0  ge-0/2/0.0
NWAX-A  00:1b:2a:f0:fc:00  Learn    0  ge-0/2/2.0
NWAX-A  00:1b:ed:b1:ce:00  Learn    0  ge-0/0/7.0
NWAX-A  00:1b:ed:e5:c9:60  Learn    0  xe-0/0/16.0
NWAX-A  00:1c:0f:5c:98:40  Learn    0  ge-0/0/29.0
NWAX-A  00:1c:57:d2:b8:84  Learn    0  ge-0/0/31.0
NWAX-A  00:1d:b5:a0:8f:f0  Learn    0  xe-0/0/0.0
NWAX-A  00:1d:e5:aa:bc:19  Learn    0  ge-0/0/38.0
NWAX-A  00:1e:13:e4:f4:40  Learn    0  ge-0/0/20.0
NWAX-A  00:1f:12:da:fb:f0  Learn    0  ae23.0
NWAX-A  00:25:64:2a:cb:16  Learn    0  ge-0/0/13.0
NWAX-A  00:25:90:35:48:f0  Learn    0  ge-0/2/2.0
NWAX-A  00:27:0c:ed:fb:81  Learn    0  ge-0/0/21.0
NWAX-A  00:27:0d:fd:b6:00  Learn    0  xe-0/0/35.0
NWAX-A  00:50:0b:38:b4:19  Learn    0  ge-0/0/19.0
NWAX-A  00:d0:2b:19:41:00  Learn    0  ge-0/0/39.0
NWAX-A  10:8c:cf:56:93:40  Learn    0  ge-0/2/3.0
NWAX-A  10:f3:11:51:62:e5  Learn    0  xe-0/0/15.0
NWAX-A  30:f7:0d:93:ba:b1  Learn    0  ge-0/2/1.0
NWAX-A  40:55:39:1c:e9:bb  Learn    0  xe-0/0/26.0
NWAX-A  5c:5e:ab:36:33:0f  Learn    0  xe-0/0/4.0
NWAX-A  5c:5e:ab:d1:d8:65  Learn    0  ge-0/0/6.0
NWAX-A  5c:5e:ab:d2:42:78  Learn    0  ge-0/1/1.0
NWAX-A  5c:5e:ab:d6:d8:78  Learn    0  ge-0/0/17.0
NWAX-A  5c:5e:ab:dc:7e:79  Learn    0  ge-0/0/2.0
NWAX-A  6c:9c:ed:29:cc:cd  Learn    0  ge-0/0/18.0
NWAX-A  78:fe:3d:0f:70:a4  Learn    0  ge-0/0/11.0
NWAX-A  7c:20:64:e6:ec:cb  Learn    0  ge-0/0/14.0
NWAX-A  88:e0:f3:28:1e:01  Learn    0  ge-0/0/3.0
NWAX-A  88:e0:f3:7a:c4:64  Learn    0  ge-0/0/30.0
NWAX-A  88:e0:f3:7d:79:c1  Learn    0  ge-0/0/34.0
NWAX-A  ac:4b:c8:41:37:cd  Learn    0  ae1.0
NWAX-A  c4:64:13:c9:03:20  Learn    0  ge-0/0/25.0
NWAX-A  c4:64:13:ce:8d:30  Learn    0  xe-0/1/2.0
NWAX-A  f8:c0:01:d8:94:88  Learn    0  ge-0/0/36.0

{master:0}

I am selecting 00:0c:29:16:17:c3 as an example, as I know it is a peer on my cisco 4948 and is learned from a port that is an 802.1q trunk, so it would have been learned from a packet tagged 998.

bthompson@NWAX1-EX> show configuration interfaces ge-0/2/2
description "Connection to Cisco Management Switch";
unit 0 {
    family ethernet-switching {
        port-mode trunk;
        vlan {
            members [ NWAX-INBAND NWAX-A ];
        }
    }
}

bthompson@NWAX1-EX> show ethernet-switching table vlan 998 | match c3
NWAX-A 00:0c:29:16:17:c3 Learn 0 ge-0/2/2.0

{master:0}

  1. Use dot1qVlanStaticName to obtain the VLAN internal index for NWAX-A. The following output indicates that the value is 2:

bthompson@NWAX1-EX> ... snmp mib walk dot1qVlanStaticName | match NWAX-A
dot1qVlanStaticName.2 = NWAX-A

{master:0}

  2. Use dot1qTpFdbPort to obtain all of the entries of the right VLAN (add .2 for the required VLAN):

bthompson@NWAX1-EX> show snmp mib walk dot1qTpFdbPort.2
dot1qTpFdbPort.2.0.1.99.142.92.0 = 535
dot1qTpFdbPort.2.0.3.50.175.76.25 = 541
dot1qTpFdbPort.2.0.11.69.10.72.0 = 521
dot1qTpFdbPort.2.0.12.41.22.23.195 = 567
dot1qTpFdbPort.2.0.12.41.98.200.103 = 567
dot1qTpFdbPort.2.0.12.133.209.66.16 = 544
dot1qTpFdbPort.2.0.13.102.237.202.102 = 525
dot1qTpFdbPort.2.0.18.30.196.16.219 = 534
dot1qTpFdbPort.2.0.18.67.100.4.25 = 526
dot1qTpFdbPort.2.0.18.242.244.163.0 = 544
dot1qTpFdbPort.2.0.20.246.141.48.31 = 518
dot1qTpFdbPort.2.0.20.246.242.44.0 = 545
dot1qTpFdbPort.2.0.22.156.108.125.0 = 540
dot1qTpFdbPort.2.0.23.203.164.21.252 = 523
dot1qTpFdbPort.2.0.25.7.170.156.128 = 564
dot1qTpFdbPort.2.0.26.162.236.136.64 = 561
dot1qTpFdbPort.2.0.27.33.22.177.48 = 565
dot1qTpFdbPort.2.0.27.42.240.252.0 = 567
dot1qTpFdbPort.2.0.27.237.177.206.0 = 520
dot1qTpFdbPort.2.0.27.237.229.201.96 = 529
dot1qTpFdbPort.2.0.28.15.92.152.64 = 542
dot1qTpFdbPort.2.0.28.87.210.184.132 = 544
dot1qTpFdbPort.2.0.29.181.160.143.240 = 513
dot1qTpFdbPort.2.0.29.229.170.188.25 = 551
dot1qTpFdbPort.2.0.30.19.228.244.64 = 533
dot1qTpFdbPort.2.0.31.18.218.251.240 = 24
dot1qTpFdbPort.2.0.37.100.42.203.22 = 526
dot1qTpFdbPort.2.0.37.144.53.72.240 = 567
dot1qTpFdbPort.2.0.39.12.237.251.129 = 534
dot1qTpFdbPort.2.0.39.13.253.182.0 = 548
dot1qTpFdbPort.2.0.80.11.56.180.25 = 532
dot1qTpFdbPort.2.0.208.43.25.65.0 = 552
dot1qTpFdbPort.2.16.140.207.86.147.64 = 568
dot1qTpFdbPort.2.16.243.17.81.98.229 = 528
dot1qTpFdbPort.2.48.247.13.147.186.177 = 566
dot1qTpFdbPort.2.64.85.57.28.233.187 = 539
dot1qTpFdbPort.2.92.94.171.54.51.15 = 517
dot1qTpFdbPort.2.92.94.171.209.216.101 = 519
dot1qTpFdbPort.2.92.94.171.210.66.120 = 562
dot1qTpFdbPort.2.92.94.171.214.216.120 = 530
dot1qTpFdbPort.2.92.94.171.220.126.121 = 515
dot1qTpFdbPort.2.108.156.237.41.204.205 = 531
dot1qTpFdbPort.2.120.254.61.15.112.164 = 524
dot1qTpFdbPort.2.124.32.100.230.236.203 = 527
dot1qTpFdbPort.2.136.224.243.40.30.1 = 516
dot1qTpFdbPort.2.136.224.243.122.196.100 = 543
dot1qTpFdbPort.2.136.224.243.125.121.193 = 547
dot1qTpFdbPort.2.172.75.200.65.55.205 = 2
dot1qTpFdbPort.2.196.100.19.201.3.32 = 538
dot1qTpFdbPort.2.196.100.19.206.141.48 = 563
dot1qTpFdbPort.2.248.192.1.216.148.136 = 549

{master:0}

  3. Obtain the MAC address and port interface index from the value that was obtained in step 2:

dot1qTpFdbPort.2.0.12.41.22.23.195 = 567

  4. Convert c3 (hex) to 195 (decimal):

bthompson@NWAX1-EX> show snmp mib walk dot1qTpFdbPort.2 | match 195
dot1qTpFdbPort.2.0.12.41.22.23.195 = 567

{master:0}

  5. Obtain the interface SNMP IfIndex from dot1dBasePortIfIndex:

bthompson@NWAX1-EX> show snmp mib walk dot1dBasePortIfIndex | match 567
dot1dBasePortIfIndex.567 = 641

{master:0}

  6. Obtain the interface name from the SNMP IfIndex:

bthompson@NWAX1-EX> show snmp mib get ifName.641
ifName.641 = ge-0/2/2.0

{master:0}
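The lookup chain in the steps above (MAC address, then bridge port, then ifIndex, then interface name) can be sketched in a few lines of Python. The three dicts hold one row each, copied from the walkthrough output; the function names and the dict-based layout are assumptions made for illustration, and real code would fill the tables from SNMP walks.

```python
def mac_to_oid_suffix(mac):
    """'00:0c:29:16:17:c3' -> '0.12.41.22.23.195' (each octet in decimal)."""
    return ".".join(str(int(octet, 16)) for octet in mac.split(":"))


def port_for_mac(mac, vlan_index, fdb_port, base_port_ifindex, if_name):
    """Resolve a MAC address to the interface name it was learned on."""
    bridge_port = fdb_port[f"{vlan_index}.{mac_to_oid_suffix(mac)}"]
    if_index = base_port_ifindex[bridge_port]
    return if_name[if_index]


# One row of each table, taken from the walkthrough above.
fdb_port = {"2.0.12.41.22.23.195": 567}  # dot1qTpFdbPort
base_port_ifindex = {567: 641}           # dot1dBasePortIfIndex
if_name = {641: "ge-0/2/2.0"}            # ifName

print(mac_to_oid_suffix("00:0c:29:16:17:c3"))
print(port_for_mac("00:0c:29:16:17:c3", 2, fdb_port, base_port_ifindex, if_name))
```

The result is the logical interface ge-0/2/2.0, which is where the question of dropping the trailing .0 comes in.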

pdxmaverick commented 10 years ago

@rowanthorpe What did you do with your Juniper? Can you share any hack that might get it to work for now?

rowanthorpe commented 10 years ago

@pdxmaverick I just refactored and rebased my pull-request from three weeks ago, to fit around @nickhilliard's latest changes, so you can see that updated in #116 (NB: that PR is against one of INEX's non-public, experimental, "do not track me" branches, so treat it as such). Check my latest comment there to see what exactly that latest version is (the normal diff is unreadable because of indentation changes, I included a diff -b in the comment).

WARNING: I am not entirely sure that fix is "correct". I found it by trial-and-error and it superficially "seems" to give sane results for our Juniper EX4500, but I would feel uncomfortable about anyone (including me) relying on it until it is confirmed against the relevant Juniper specs to be a correct "fix" or not by someone more SNMP-savvy (...yourself?). As far as I understand SNMP, an snmpwalk shouldn't actually "change" anything though, so I guess the worst scenario is receiving subtly wrong data in the meantime (i.e. don't trust what you see entirely unless you verify the data yourself).

PS: I just noticed @barryo has switched on Travis support but that branch obviously doesn't have a .travis.yml file in place (as it existed before he switched Travis support on), so it is defaulting to trying to test Tags, Pull Requests, etc in "default mode" (i.e. as Ruby..!) so ignore those Failure: Travis CI build failed symbols for now...

barryo commented 10 years ago

I just noticed @barryo has switched on Travis support

Or rather I am in the very early process of starting to set up Travis...

pdxmaverick commented 10 years ago

Poking, as our IX is ready to release IXP to members although still no sflow support.

rowanthorpe commented 10 years ago

This is in answer to the question asked of me at #116. I am answering it here though as it will probably help the flow of dialogue better, and might benefit anyone reading, by way of the context of the preceding comment thread.

@pdxmaverick I am happy to try to help (and we do use vlan/s here too), but that comes with a big caveat: I am not as up-to-speed on "routing" theory as the rest of you evidently are. I come from the devel side of things, and am trying to catch up with what subset of routing concepts I need to know for ixp-m... Anyway, I will try to cite each thing I claim to "understand" - even if it means linking to some painfully rudimentary principles - so you (and/or @nickhilliard) can fact-check me as I go.

Firstly, the best explanation I have found which sums up the ".0 mystery" is this explanation of the so-called "unit 0". As someone who hasn't acclimatised to pre-existing "best practises", I have to say it sounds like this approach has a certain elegance and consistency to it. It's just annoying that it seems so incompatible with the others...

From reading the following links - A, B, C, and D, including some useful comments on link D - I think I can summarise as follows (please correct me if I've misunderstood): On an EX each port seems to always have at least one "vlan id" (or "unit"), and can either be in access (untagged) mode or trunk (tagged) mode. In access mode it receives/transmits untagged ethernet frames, has the single "vlan id 0" but is treated as having "no vlan id" (although specific, limited "multi vlan" behaviour can be achieved using things like lldp-med). Conversely, in trunk mode, multiple vlans are possible, and all received/transmitted frames must be tagged ".1q" frames, except potentially for one of the vlan ids, which can be marked as the "native vlan" id, distinct from the others.

Apparently "trunk" ports are only for inter-switch/router communication, and we are contacting the switch from a server looking only for the "access ports" in order to find their (layer 2) macs. If I have assumed that correctly, then we should expect that the "raw untagged" port we are interested in will by definition always include "unit 0" (".0") tagged on the end, for sending and receiving. This means to be compatible with existing code this script should always add a ".0" to the port it reports to the switch, and always chop ".0" off the end of what it receives as the reported port. The Pull Request I already have open does the chopping of ".0"s for finding and then pushing values into the macaddress db-table in a form similar to that of the other brands (note that the "$ports" dumped debug data has no trailing ".0"s). After running the script you could visually check the resulting ixp-m database table against your own list of expected macs/ports by doing:

USE [ixpm-db-name];
SELECT * FROM macaddress;

I guess then for presentation purposes the strictly correct thing would be for anything drawing data from the macaddress table to also re-add the ".0"s (and more importantly, any php code which might - in future? - need to rely on the correct port-to-mac mappings for actually computing further data would have to re-add the ".0"s, not just for cosmetic reasons... maybe @barryo will know more about that).

So as far as I can see (and as deeply as I can research without drowning in a sea of unfamiliar jargon), it seems my fix does what is needed to get an l2-ports-to-macs mapping from an EX into the database in a form which fits sanely with the present ixp-m datamodel. Whether the ixp-m datamodel itself needs tweaking in order to adequately accommodate Juniper's model is an ixp-m design decision, which is not my domain ;-)

I hope that helps, and I hope I haven't made any wildly wrong assumptions along the way...

nickhilliard commented 10 years ago

@rowanthorpe the .0 convention is a convention not a rule. It could in theory be any number and I haven't found the correct OIDs to figure out which is the correct unit number. Actually I don't think it really matters because if there is no vlan specified, the assumption is that one is talking about the physical interface anyway, in which case whatever number is present is stripped off.

I've taken your and @pdxmaverick's suggestions and merged this in with a bunch of other things into the code. Can you check out https://github.com/inex/IXP-Manager/blob/0848952b8f041188e4379f5f85c75fe8a80fb8bc/tools/runtime/l2database/update-l2database.pl and see if this fixes this problem?

pdxmaverick commented 10 years ago

The snmpwalk is already vlan specific. If the port is in access mode, it would be 0 or the vlan you already specified. @nick said he could fix this in 20 mins. I have provided access.

rowanthorpe commented 10 years ago

@nickhilliard I will test the new summarised code when I can get switch-access (and time!) in the coming day(..s) and will let you know.

<side_note>

I realise - and realised the previous time you mentioned - that it is a convention and not a rule, and just observed that on reflection it seems not a totally insane convention, granted that it allows one to parse SNMP in a truly consistent hierarchical manner, rather than requiring WET logic-flows like this:

if $is_physical_intf; then
  thismethod
else
  if $is_logical_intf; then
    if $is_native_intf; then
      othermethod
    else
      yetanothermethod
    fi
  fi
fi

...albeit at the cost of compatibility with other brands' conventions - which somewhat defeats the point, though. ;-)

Actually I don't think it really matters because if there is no vlan specified, the assumption is that one is talking about the physical interface anyway, in which case whatever number is present is stripped off.

In my long-winded and probably-using-all-the-wrong-terms-and-conflating-concepts kind of way, that is what I was saying too... I am glad I got that right in my own head - even if I failed at communicating as much.

</side_note>

EDIT: Just for reference, in case it ever impacts on future issues too, I just remembered what I told you in a previous thread a while ago - that one of our router admins tested it and found that the EXs don't even allow any unit other than "unit 0" to be the access port.

rowanthorpe commented 10 years ago

I was just now able to briefly talk with a router admin here who clarified/confirmed some things for me. I will summarise below what was discussed, even if much of it is stating what is already known:

Therefore this confirms my previous guess - which is that the Right Way™ to syntactically interoperate between JunOS's naming convention and IXP-M/everyone-else's naming convention when talking SNMP to switch ports is to always trim ".0" from the received port-names and to always append ".0" to the transmitted port names...
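A minimal Python sketch of that trim-on-receive, append-on-send convention follows. The helper names from_junos and to_junos are hypothetical, and the sketch hard-codes unit 0, which (as noted elsewhere in this thread) is a convention rather than a rule, so treat it as an assumption rather than a general solution.

```python
def from_junos(ifname):
    """Name reported by the switch -> name without the unit-0 suffix."""
    return ifname[:-2] if ifname.endswith(".0") else ifname


def to_junos(ifname):
    """Physical port name -> the unit-0 logical name the switch expects."""
    return ifname if ifname.endswith(".0") else ifname + ".0"


for reported in ("ge-0/2/2.0", "ae1.0"):
    stored = from_junos(reported)
    print(reported, "->", stored, "->", to_junos(stored))
```

The two helpers are inverses for unit-0 names, so a port name can be stored in its short form and expanded again whenever the switch is queried.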

PS: Apparently this naming convention is common to all Junipers, not just the EXs...

pdxmaverick commented 10 years ago

@barryo I have tested your latest code. I'm not sure why yet, but it is not working like my pull request to @rowanthorpe's fork.

I am very confused about whose fork to pull from and have spent many hours trying to move from branch to branch, ultimately failing. I have had to resort to cut and paste.

Still no joy https://gist.github.com/pdxmaverick/8701695 is my output.

pdxmaverick commented 10 years ago

Argh!!!! retract, my own confusion causing false negatives.

Yes the code is working. Could you merge it in to master so I can GIT back on track :)

barryo commented 10 years ago

@nickhilliard - this is really your call as I wasn't following here. @pdxmaverick is asking that @rowanthorpe's pull request #116 be merged. It cannot be merged automatically (probably conflicts). If you're happy for it to be merged I can do the leg work on merging it. I also reopened #116.

rowanthorpe commented 10 years ago

@barryo If I'm not mistaken I think @pdxmaverick said that in the end @nickhilliard's latest commit worked for him too...

Argh!!!! retract... [snip] Yes the code is working... [snip]

Nick's latest commit incorporates elements from both my and Brian's Pull Requests, so it is probably the better code to stick with. Is that correct? I haven't yet had a chance to test the code myself. If Brian confirms that Nick's code works then I think Nick's intention is to close both Pull Requests, and go with what he's already committed.

BTW: in the interests of confirming/comparing sane functionality and possibly for future reference with any issues php might have with this stuff, a colleague here pointed out that Observium seems to handle it all correctly. If you go to look at their code though, just know that they use a modified QPL license - OSI approved but I haven't read it closely regarding GPL-compatibility...

nickhilliard commented 10 years ago

lol@observium. No, not going there, even to take a look. @rowanthorpe, if you can test this code out, I'll merge it back into master. I think it should work, but don't have enough glue in place to test it out on a live system.