Closed pdxmaverick closed 10 years ago
this is part of a much larger problem, namely how to semantically interpret the snmp/sflow output of multiple vendors and handle all of this in a way which is sensible and which won't cause further breakage down the line.
There are a variety of ways of dealing with the issue, but all of them either involve messy vendor-specific hacks or else large-scale changes to how the code interprets what it's seeing from network devices. There's no simple way of handling this because it comes down to semantics, i.e. it's not just an issue of dropping the .0 from the device name. We're working on the issue at the moment, but it is not simple to deal with.
While hacking at a quick workaround (not a solution) to this for Juniper EX switches, so far I've found almost all of my meaningful Google-search results keep sending me to the same project: SNMP::Info - OO Perl Interface to Network devices and MIBs through SNMP. Looking deeper it seems they have done almost all the heavy-lifting of what IXP-Manager is facing with this issue. At a glance I suspect it might be worth looking into using the following library (and perhaps mining the code to discern the equivalent PHP changes needed), because otherwise there might be a lot of reinventing the wheel. From what I can see they have full support for Juniper EX...
On Debian it is packaged as libsnmp-info-perl, although:
I was right about the Juniper EX compatibility. This is the proper homepage for SNMP::Info and the Device Compatibility Matrix linked there has this section for Juniper EX :-)
The correct location for downloading the latest snapshot of the Netdisco MIBs is here.
even the netdisco-mibs collection isn't enough to handle juniper support. I've scrounged a couple more mibs from around teh internets and have SNMP::Info running at the moment. am currently fighting with the API to see what info it produces.
I have it (and the mibs file) installed too, and just ran:
use strict;
use warnings;
use SNMP::Info;

# Let SNMP::Info probe the device and pick the most specific subclass.
my $juniper = SNMP::Info->new(
    AutoSpecify => 1,
    Debug       => 1,
    DestHost    => 'xx_URL_xx',
    Community   => 'public',
    Version     => 2,
) or die "Can't connect to DestHost.\n";

my $class = $juniper->class();
print "SNMP::Info determined this device to fall under subclass : $class\n";
and got the result:
SNMP::Info::_global layers : SNMPv2-MIB::sysServices.0 : .1.3.6.1.2.1.1.7.0
SNMP::Info::_global description : SNMPv2-MIB::sysDescr.0 : .1.3.6.1.2.1.1.1.0
SNMP::Info::_global id : SNMPv2-MIB::sysObjectID.0 : .1.3.6.1.2.1.1.2.0
SNMP::Info 3.08
SNMP::Info::device_type() layers:00000110 id:2636 sysDescr:"Juniper Networks, Inc. ex4500-40f Ethernet Switch, kernel JUNOS 12.3R4.6, Build date: 2013-09-13 04:11:02 UTC Copyright (c) 1996-2013 Juniper Networks, Inc."
SNMP::Info::specify() - Changed Class to SNMP::Info::Layer3::Juniper.
SNMP::Info determined this device to fall under subclass : SNMP::Info::Layer3::Juniper
...looks promising.
(using libsnmp-info-perl v3.08-1, using the latest-snapshot of the netdisco-mibs)
I'm a step ahead of you but don't have an ex switch to test out on. currently in lab:
# perl ~nick/foo | grep em0
em0: 00:0c:29:49:eb:37
em0.0: 00:0c:29:49:eb:37
this doesn't quite solve the problem because it doesn't handle the semantic difference between em0 and em0.0. What we really need is a canonical reference to point the mac address to a single physical interface. This may involve a walk up the juniIfInvStackTable mib to get it right.
I'm currently trying to figure out how snmp::info handles this, or whether it depends on les hacques.
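To make the "canonical reference" idea concrete, here is a minimal Python sketch (not the project's Perl) of walking an interface stack from a logical interface down to its physical parent. It assumes the stack is available as (higher, lower) ifIndex pairs in the style of the standard IF-MIB ifStackTable, where 0 marks the top or bottom of a stack, rather than the juniIfInvStackTable mentioned above. The ifIndex values 17 and 18 and the function name are hypothetical, and the sketch assumes each logical interface sits on exactly one lower layer, which would not hold for aggregated links.

```python
def physical_parent(if_index, stack_pairs):
    """Follow higher -> lower links until nothing lies below the interface."""
    # Drop the 0 markers, which only flag the top and bottom of a stack.
    lower_of = {higher: lower for higher, lower in stack_pairs
                if higher != 0 and lower != 0}
    seen = set()  # guards against a malformed, looping table
    while if_index in lower_of and if_index not in seen:
        seen.add(if_index)
        if_index = lower_of[if_index]
    return if_index


# Hypothetical lab data: ifIndex 17 is em0, ifIndex 18 is em0.0 stacked on it.
if_names = {17: "em0", 18: "em0.0"}
stack_pairs = [(0, 18), (18, 17), (17, 0)]

print(if_names[physical_parent(18, stack_pairs)])
```

With a lookup like this, both em0 and em0.0 resolve to the same physical interface, so a MAC address learned on either would be recorded against one port.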
Ah, ok. Let me know if there is anything I can real-world test for you here, as I have daily access to an ex (but only for the next half-hour today...).
...and this page about the Juniper module is proving informative.
Sorry if I am throwing in red herrings here, but the page I just mentioned has this:
$juniper->mac()
Returns the MAC address used by this bridge when it must be referred to in a unique fashion.
(dot1dBaseBridgeAddress)
and these:
$juniper->i_trunk()
(jnxExVlanPortAccessMode)
$juniper->i_vlan()
Returns a mapping between ifIndex and the PVID or default VLAN.
i'm trawling through the source code for this right now to see how it determines that there is a link between em0 and em0.0 in the FDB
Can you run this code and post the output somewhere if it returns anything?
my $interfaces = $juniper->interfaces();   # ifIndex (iid) -> port name
my $fw_mac     = $juniper->fw_mac();       # FDB index -> MAC address
my $fw_port    = $juniper->fw_port();      # FDB index -> bridge port ID
my $bp_index   = $juniper->bp_index();     # bridge port ID -> ifIndex (iid)

foreach my $fw_index (keys %$fw_mac) {
    my $mac   = $fw_mac->{$fw_index};
    my $bp_id = $fw_port->{$fw_index};
    my $iid   = $bp_index->{$bp_id};
    my $port  = $interfaces->{$iid};
    print "Port: $port | $mac | $bp_id | $iid\n";
}
just emailed it to you
ok that output means that SNMP::Info has the same issue as the ixp-m code. the index that is returned points to the virtual interface rather than the physical interface. I think we need to accept that the ixp-m code needs to make a semantic decision and do things slightly differently with J boxes.
One of the router admins here told me the .0 is the standard/only Juniper syntax for the access ports, and (I am probably quoting this slightly wrong because I don't understand the concepts so well) he said that as far as he understood my question, the right thing is to always address the physical port with the ".0" (it is some leftover from Juniper's router-oriented origins) and that they always require a "dot-something", so dot-zero is unavoidable... Possibly I am not saying anything new, or worse yet, am talking utter nonsense - but I'll throw that out in case it helps.
.0 is a convention, not a rule.
can you check out commit fa302a30bc to see if this fixes the issue? I'm slightly split on this patch. Originally I didn't want to approach the problem from this angle, but the alternative is that the code performs an interface stack trawl to figure out the parent interface of the logical interface that's associated with the mac address. I think in all cases we're guaranteed to end up with an interface with the .\d+ stripped off the end. Thing is the UI code just uses the physical parent interface for everything, so from a semantic point of view maybe this isn't the worst thing in the world.
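The stripping approach described here can be sketched in a few lines. This is a Python illustration under stated assumptions, not the code from commit fa302a30bc: the function name is hypothetical, and the only behaviour taken from the thread is removing a trailing `.\d+` from the interface name.

```python
import re

# Trailing ".<digits>" is the JunOS logical unit suffix, e.g. ".0".
_UNIT_SUFFIX = re.compile(r"\.\d+$")


def strip_unit(if_name):
    """Return the physical parent name by dropping a trailing unit number."""
    return _UNIT_SUFFIX.sub("", if_name)


for name in ("xe-0/0/0.0", "ge-0/2/2.0", "ae23.0", "em0"):
    print(name, "->", strip_unit(name))
```

A rule like this would need to be applied to Juniper devices only, since other vendors also use names ending in a dot and digits (Cisco subinterfaces, for example) where the suffix is meaningful.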
Thanks for putting together that patch. Without doing various VPN/jump-server gymnastics it will be tricky to get access to the switch tonight. I will check it first thing tomorrow morning. I have a suspicion though that this patch will still face the problem that when no explicit vlan=? is specified it will try normal BRIDGE-MIB stuff, which will fail if I'm not mistaken, rather than what the unfinished code I sent you tries to do (i.e. for juniper it always uses juniper's version of qbridge with a non-specified vlan replaced with vlan=0). In all other respects though it looks like it will get much further than my code :-D Anyway, I will look in the morning.
Unfortunately my suspicion in the previous comment was right (at least for the switch we have here). I think it only requires a tiny extra tweak to work though, so I'll see if I can manage it. If so I will send a PR back to that dev-branch (which I know is not the normal way, but seems to be more sensible in this case).
I think I've managed to get it working for both cases, i.e.:
--vlan 0 specified
no --vlan specified
@nickhilliard I will email you output of your commit (run with --vlan 0 and with no --vlan) and the additions I made to get it to generate output for both cases (with that output too)...
@nickhilliard Have sent a PR against your dev-branch with the mentioned extra tweak, and emailed you the output mentioned in the previous comment. It seems to work...
rowan, can you confirm what you're trying to do here? from my reading of your patch, the issue is that juniper uses pseudo vlan==0 when the port is untagged, but that update-l2database.pl doesn't grok this because it defaults to BRIDGE-MIB if $vlan == 0 or is undefined. Is this all you're trying to do, or is there anything else funny which I've missed? I.e. the normal semantics for untagged is vlan=1, so what on earth does it mean when .1.3.6.1.4.1.2636.3.40.1.5.1.5.1.5 returns an entry which reads "Gauge32: 1"?
My understanding of the concepts is flaky at best, and I am mostly working off running snmpwalk loads of times, and deducing the pattern from the behaviour - hence why I really need you to check that what I do makes sense, rather than just happens to match a pattern by luck. The short answer to your question is: yes, but with a twist in the syntax's tail. The long answer is:
As for what I found. I'll start with how I understand your code works with the Juniper(s) (please stop me if I'm wrong):
Now, what I found is that with jnxExVlanTag the switch exposes all the vlan-tags and their vlanid mappings, and also a mapping from "pseudo tag 0" to some other vlanid. With the latest code-changes you made it seems the retrieved vlanids all work successfully when used in the qbridge request except for the vlanid which was returned for "pseudo tag 0". qbridge borks on that vlanid, saying it doesn't exist, even though it was returned by jnxExVlanTag as a valid mapping. What I found by experimenting though was that the kind of results I expected to get for that request could be returned by querying the Q-BRIDGE-MIB without any vlanid appended at all i.e. snmpwalk(.1.3.6.1.2.1.17.7.1.2.2.1.2). As for your question about what a query for --vlan 1 should yield on these switches - I don't have a clue. I only know that our switch just tells me it doesn't exist. I just asked one of our router team who said to his knowledge the semantics for untagged on Juniper is vlan=0. He also now did an experiment and tried to set unit 1 as an access port. The switch returned an error message saying "only unit 0 can be used as an access port".
I realise we are knee-deep in "ugly hack" territory now, and I hope that this motley ensemble of hacks can be dressed up to look like a "fix" for the Junipers, rather than just another blip on the radar of my naive optimism...
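The workaround described in this comment amounts to a small decision about which OID to walk. The Python sketch below is an illustration only, not the contents of the pull request: the function name is hypothetical, the base OID is the one quoted in the comment, and the vlan index 2 in the usage lines is an arbitrary example value. It covers the Juniper case only and says nothing about what the script should do for other vendors.

```python
# Q-BRIDGE-MIB forwarding table OID, as quoted in the thread.
DOT1Q_TP_FDB_PORT = ".1.3.6.1.2.1.17.7.1.2.2.1.2"


def juniper_fdb_oid(vlan_index=None):
    """Pick the OID to walk for a Juniper EX forwarding table.

    With no vlan, or the pseudo vlan 0 used for untagged ports, walk the
    Q-BRIDGE-MIB table with nothing appended, because appending the id
    mapped from "pseudo tag 0" was rejected by the switch. Otherwise
    append the vlan index to restrict the walk to that vlan.
    """
    if not vlan_index:
        return DOT1Q_TP_FDB_PORT
    return f"{DOT1Q_TP_FDB_PORT}.{vlan_index}"


print(juniper_fdb_oid())    # untagged / no --vlan given
print(juniper_fdb_oid(2))   # a specific vlan index (example value)
```

Whether an unrestricted walk returns only the untagged entries, or everything the switch has learned, is something that would still need confirming against a real EX.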
i can't decide which rage meme applies best: facepalm, FFFUUUU, you've got to be kidding me, etc. Tell you one thing though, I'm glad I started out this process by drawing a state diagram because these vendors are taking the piss in a major way. And this is only for three vendors, not including cisco. What were they smoking?
Nick, would it be of any value to get JTAC involved? If we can get the Sflow stuff to ignore my Cisco switch I could test your new fix on our switch.
Thanks, Brian
On Mon, Dec 16, 2013 at 11:41 AM, Nick Hilliard notifications@github.com wrote: [snip]
just found some documentation in juniper KB articles KB26533 (http://kb.juniper.net/InfoCenter/index?page=content&id=KB26533, non default vlan) and KB20833 (http://kb.juniper.net/InfoCenter/index?page=content&id=KB20833, default vlan). It looks to me like the code is correct for both situations.
@rowanthorpe, can you log into switch-new.gr-ix.gr and run through KB20833 to see if it produces anything on the command-line? If it does, could you email me the output? If it doesn't, could you restart snmp, then try again?
FWIW, i'm getting results for .1.3.6.1.2.1.17.4.3.1.2 from a Juniper EX switch that I have access to, and it seems to work ok.
Incidentally, I've just noticed another buglet which is fixed in commit 8cbbbf2 (https://github.com/inex/IXP-Manager/commit/8cbbbf2).
Nick,
I just checked out 8cbbbf2, still seeing the same results.
root@portal:/usr/local/ixp# /usr/local/bin/update-l2database.pl --debug --vlan 998
DEBUG: processing NWAX-Inband
cannot read dot1dBasePortIfIndex from NWAX-Inband at /usr/local/bin/update-l2database.pl line 166.
Please let me know what I can do to help.
Thanks, Brian
On Mon, Dec 16, 2013 at 2:34 PM, Nick Hilliard notifications@github.com wrote: [snip]
@nickhilliard - I don't have login access to our switch, but have forwarded your request to someone who does. Will send you their response when I get it. Based on how this process is going it strikes me that this is fast becoming the kind of coding that should ideally be in an external, reusable library (so that the code doesn't balloon too much within IXP-M itself). I know that you already have OSS_SNMP for the php code. Should there perhaps be an equivalent perl lib...? (I realise you have the intention to migrate as much as possible to php anyway though...).
@nickhilliard - I just tried your commit (8cbbbf2) and it fails for me too. I know you are hoping to find a way for it to work with BRIDGE-MIB, but sadly it doesn't here (and obviously for @pdxmaverick too). The thing missing which is in my Pull Request (#116 - 96e61a6845) and which makes my version seem to work for me is at line 214 of my version where it calls the Q-BRIDGE-MIB without a vlanid appended. I am still waiting for someone here to get back to me with the "logged in query" results you asked for.
whoa, hang on here, we're now talking about 3 separate problems :-)
8cbbbf2 fixes a problem with junipers so that when we get the vlan=0 issue sorted out, it will return the physical interface (i.e. xe-0/0/0) instead of the logical interface (i.e. xe-0/0/0.0).
@rowanthorpe, I need to look at the debugging output from your switches to see what to expect for the case of juniper / default vlan. Agreed that this code needs to be libified. I'll do that at some stage, but want to get it working first.
@pdxmaverick, your problem is related to this code not supporting certain types of IOS, including e.g. C4948 and C6500 but not e.g. C3550/C3560/C3750. I've opened up a separate issue for this: #117.
Well, on the plus-side - this is well on its way to becoming The. Most. Epic. Github-comment-thread. Evar. Let's try for >100 comments.
i could have it sorted in 20 minutes if I had snmp and CLI read/write access to an EX switch, sigh.
@nickhilliard : Did you get Andreas' email a few days ago? Just checking, in case it got caught in a spam filter or something...
@nickhilliard I have removed my Cisco 4948 from IXP, your latest version 8cbbbf2 does run, but still not matching interfaces. I have posted the output https://gist.github.com/pdxmaverick/8126358
Please send me your IP address and I will get you SNMP access to our switch.
Cheers, Merry Christmas, Brian
@nickhilliard I was stepping through Juniper http://kb.juniper.net/InfoCenter/index?page=content&id=KB26533# from @rowanthorpe comment above. Here is how it looks in my switch. Following this logic, I think it would be safe to drop the .0 as you have already selected your context of vlan 998 or (NWAX-A), I can't think of any scenario where you would ever find a mac address that did not link back to a default logical interface of .0
Show all mac addresses on the vlan:
bthompson@NWAX1-EX> show ethernet-switching table vlan 998
Ethernet-switching table: 50 unicast entries
VLAN     MAC address        Type   Age  Interfaces
NWAX-A   *                  Flood  -    All-members
NWAX-A   00:01:63:8e:5c:00  Learn  0    ge-0/0/22.0
NWAX-A   00:03:32:af:4c:19  Learn  0    ge-0/0/28.0
NWAX-A   00:0b:45:0a:48:00  Learn  0    ge-0/0/8.0
NWAX-A   00:0c:29:16:17:c3  Learn  34   ge-0/2/2.0
NWAX-A   00:0c:29:62:c8:67  Learn  33   ge-0/2/2.0
NWAX-A   00:0d:66:ed:ca:66  Learn  0    ge-0/0/12.0
NWAX-A   00:12:1e:c4:10:db  Learn  0    ge-0/0/21.0
NWAX-A   00:12:43:64:04:19  Learn  0    ge-0/0/13.0
NWAX-A   00:12:f2:f4:a3:00  Learn  0    ge-0/0/31.0
NWAX-A   00:14:f6:8d:30:1f  Learn  0    ge-0/0/5.0
NWAX-A   00:14:f6:f2:2c:00  Learn  0    ge-0/0/32.0
NWAX-A   00:16:9c:6c:7d:00  Learn  0    ge-0/0/27.0
NWAX-A   00:17:cb:a4:15:fc  Learn  0    ge-0/0/10.0
NWAX-A   00:19:07:aa:9c:80  Learn  0    ge-0/1/3.0
NWAX-A   00:1a:a2:ec:88:40  Learn  0    xe-0/1/0.0
NWAX-A   00:1b:21:16:b1:30  Learn  0    ge-0/2/0.0
NWAX-A   00:1b:2a:f0:fc:00  Learn  0    ge-0/2/2.0
NWAX-A   00:1b:ed:b1:ce:00  Learn  0    ge-0/0/7.0
NWAX-A   00:1b:ed:e5:c9:60  Learn  0    xe-0/0/16.0
NWAX-A   00:1c:0f:5c:98:40  Learn  0    ge-0/0/29.0
NWAX-A   00:1c:57:d2:b8:84  Learn  0    ge-0/0/31.0
NWAX-A   00:1d:b5:a0:8f:f0  Learn  0    xe-0/0/0.0
NWAX-A   00:1d:e5:aa:bc:19  Learn  0    ge-0/0/38.0
NWAX-A   00:1e:13:e4:f4:40  Learn  0    ge-0/0/20.0
NWAX-A   00:1f:12:da:fb:f0  Learn  0    ae23.0
NWAX-A   00:25:64:2a:cb:16  Learn  0    ge-0/0/13.0
NWAX-A   00:25:90:35:48:f0  Learn  0    ge-0/2/2.0
NWAX-A   00:27:0c:ed:fb:81  Learn  0    ge-0/0/21.0
NWAX-A   00:27:0d:fd:b6:00  Learn  0    xe-0/0/35.0
NWAX-A   00:50:0b:38:b4:19  Learn  0    ge-0/0/19.0
NWAX-A   00:d0:2b:19:41:00  Learn  0    ge-0/0/39.0
NWAX-A   10:8c:cf:56:93:40  Learn  0    ge-0/2/3.0
NWAX-A   10:f3:11:51:62:e5  Learn  0    xe-0/0/15.0
NWAX-A   30:f7:0d:93:ba:b1  Learn  0    ge-0/2/1.0
NWAX-A   40:55:39:1c:e9:bb  Learn  0    xe-0/0/26.0
NWAX-A   5c:5e:ab:36:33:0f  Learn  0    xe-0/0/4.0
NWAX-A   5c:5e:ab:d1:d8:65  Learn  0    ge-0/0/6.0
NWAX-A   5c:5e:ab:d2:42:78  Learn  0    ge-0/1/1.0
NWAX-A   5c:5e:ab:d6:d8:78  Learn  0    ge-0/0/17.0
NWAX-A   5c:5e:ab:dc:7e:79  Learn  0    ge-0/0/2.0
NWAX-A   6c:9c:ed:29:cc:cd  Learn  0    ge-0/0/18.0
NWAX-A   78:fe:3d:0f:70:a4  Learn  0    ge-0/0/11.0
NWAX-A   7c:20:64:e6:ec:cb  Learn  0    ge-0/0/14.0
NWAX-A   88:e0:f3:28:1e:01  Learn  0    ge-0/0/3.0
NWAX-A   88:e0:f3:7a:c4:64  Learn  0    ge-0/0/30.0
NWAX-A   88:e0:f3:7d:79:c1  Learn  0    ge-0/0/34.0
NWAX-A   ac:4b:c8:41:37:cd  Learn  0    ae1.0
NWAX-A   c4:64:13:c9:03:20  Learn  0    ge-0/0/25.0
NWAX-A   c4:64:13:ce:8d:30  Learn  0    xe-0/1/2.0
NWAX-A   f8:c0:01:d8:94:88  Learn  0    ge-0/0/36.0
{master:0}
Selecting 00:0c:29:16:17:c3 as an example, as I know it is a peer on my cisco 4948 and is learned from a port that is an 802.1q trunk. So it would have been learned from a packet tagged with vlan 998.
bthompson@NWAX1-EX> show configuration interfaces ge-0/2/2
description "Connection to Cisco Management Switch";
unit 0 {
    family ethernet-switching {
        port-mode trunk;
        vlan {
            members [ NWAX-INBAND NWAX-A ];
        }
    }
}
bthompson@NWAX1-EX> show ethernet-switching table vlan 998 | match c3
NWAX-A 00:0c:29:16:17:c3 Learn 0 ge-0/2/2.0
{master:0}
bthompson@NWAX1-EX> ... snmp mib walk dot1qVlanStaticName | match NWAX-A
dot1qVlanStaticName.2 = NWAX-A
{master:0}
bthompson@NWAX1-EX> show snmp mib walk dot1qTpFdbPort.2
dot1qTpFdbPort.2.0.1.99.142.92.0 = 535
dot1qTpFdbPort.2.0.3.50.175.76.25 = 541
dot1qTpFdbPort.2.0.11.69.10.72.0 = 521
dot1qTpFdbPort.2.0.12.41.22.23.195 = 567
dot1qTpFdbPort.2.0.12.41.98.200.103 = 567
dot1qTpFdbPort.2.0.12.133.209.66.16 = 544
dot1qTpFdbPort.2.0.13.102.237.202.102 = 525
dot1qTpFdbPort.2.0.18.30.196.16.219 = 534
dot1qTpFdbPort.2.0.18.67.100.4.25 = 526
dot1qTpFdbPort.2.0.18.242.244.163.0 = 544
dot1qTpFdbPort.2.0.20.246.141.48.31 = 518
dot1qTpFdbPort.2.0.20.246.242.44.0 = 545
dot1qTpFdbPort.2.0.22.156.108.125.0 = 540
dot1qTpFdbPort.2.0.23.203.164.21.252 = 523
dot1qTpFdbPort.2.0.25.7.170.156.128 = 564
dot1qTpFdbPort.2.0.26.162.236.136.64 = 561
dot1qTpFdbPort.2.0.27.33.22.177.48 = 565
dot1qTpFdbPort.2.0.27.42.240.252.0 = 567
dot1qTpFdbPort.2.0.27.237.177.206.0 = 520
dot1qTpFdbPort.2.0.27.237.229.201.96 = 529
dot1qTpFdbPort.2.0.28.15.92.152.64 = 542
dot1qTpFdbPort.2.0.28.87.210.184.132 = 544
dot1qTpFdbPort.2.0.29.181.160.143.240 = 513
dot1qTpFdbPort.2.0.29.229.170.188.25 = 551
dot1qTpFdbPort.2.0.30.19.228.244.64 = 533
dot1qTpFdbPort.2.0.31.18.218.251.240 = 24
dot1qTpFdbPort.2.0.37.100.42.203.22 = 526
dot1qTpFdbPort.2.0.37.144.53.72.240 = 567
dot1qTpFdbPort.2.0.39.12.237.251.129 = 534
dot1qTpFdbPort.2.0.39.13.253.182.0 = 548
dot1qTpFdbPort.2.0.80.11.56.180.25 = 532
dot1qTpFdbPort.2.0.208.43.25.65.0 = 552
dot1qTpFdbPort.2.16.140.207.86.147.64 = 568
dot1qTpFdbPort.2.16.243.17.81.98.229 = 528
dot1qTpFdbPort.2.48.247.13.147.186.177 = 566
dot1qTpFdbPort.2.64.85.57.28.233.187 = 539
dot1qTpFdbPort.2.92.94.171.54.51.15 = 517
dot1qTpFdbPort.2.92.94.171.209.216.101 = 519
dot1qTpFdbPort.2.92.94.171.210.66.120 = 562
dot1qTpFdbPort.2.92.94.171.214.216.120 = 530
dot1qTpFdbPort.2.92.94.171.220.126.121 = 515
dot1qTpFdbPort.2.108.156.237.41.204.205 = 531
dot1qTpFdbPort.2.120.254.61.15.112.164 = 524
dot1qTpFdbPort.2.124.32.100.230.236.203 = 527
dot1qTpFdbPort.2.136.224.243.40.30.1 = 516
dot1qTpFdbPort.2.136.224.243.122.196.100 = 543
dot1qTpFdbPort.2.136.224.243.125.121.193 = 547
dot1qTpFdbPort.2.172.75.200.65.55.205 = 2
dot1qTpFdbPort.2.196.100.19.201.3.32 = 538
dot1qTpFdbPort.2.196.100.19.206.141.48 = 563
dot1qTpFdbPort.2.248.192.1.216.148.136 = 549
{master:0}
dot1qTpFdbPort.2.0.12.41.22.23.195 = 567
bthompson@NWAX1-EX> show snmp mib walk dot1qTpFdbPort.2 | match 195
dot1qTpFdbPort.2.0.12.41.22.23.195 = 567
{master:0}
bthompson@NWAX1-EX> show snmp mib walk dot1dBasePortIfIndex | match 567
dot1dBasePortIfIndex.567 = 641
{master:0}
bthompson@NWAX1-EX> show snmp mib get ifName.641
ifName.641 = ge-0/2/2.0
{master:0}
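The lookup chain walked through by hand above can be written out as a short Python sketch. The three dictionaries contain only the values shown in the CLI output for the example MAC; the function name is hypothetical, and the assumption that the dot1qTpFdbPort index is the vlan index followed by the six MAC octets in decimal is taken from how the example entry lines up with 00:0c:29:16:17:c3.

```python
def mac_from_fdb_index(oid_suffix):
    """Split a dot1qTpFdbPort index into (vlan index, MAC address string)."""
    parts = [int(p) for p in oid_suffix.split(".")]
    vlan_index, octets = parts[0], parts[1:]
    if len(octets) != 6:
        raise ValueError(f"expected 6 MAC octets, got {len(octets)}")
    return vlan_index, ":".join(f"{octet:02x}" for octet in octets)


# Values copied from the walk-through above.
fdb_port = {"2.0.12.41.22.23.195": 567}   # dot1qTpFdbPort: index -> bridge port
base_port_if_index = {567: 641}           # dot1dBasePortIfIndex: bridge port -> ifIndex
if_name = {641: "ge-0/2/2.0"}             # ifName: ifIndex -> interface name

for suffix, bridge_port in fdb_port.items():
    vlan_index, mac = mac_from_fdb_index(suffix)
    port = if_name[base_port_if_index[bridge_port]]
    print(vlan_index, mac, port)
```

The last hop is where the logical name ge-0/2/2.0 appears instead of the physical ge-0/2/2, which is the mismatch this issue is about.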
@rowanthorpe What did you do with your Juniper? Can you share any hack that might get it to work for now?
@pdxmaverick I just refactored and rebased my pull-request from three weeks ago, to fit around @nickhilliard's latest changes, so you can see that updated in #116 (NB: that PR is against one of INEX's non-public, experimental, "do not track me" branches, so treat it as such). Check my latest comment there to see what exactly that latest version is (the normal diff is unreadable because of indentation changes, so I included a diff -b in the comment).
WARNING: I am not entirely sure that fix is "correct". I found it by trial-and-error and it superficially "seems" to give sane results for our Juniper EX4500, but I would feel uncomfortable about anyone (including me) relying on it until it is confirmed against the relevant Juniper specs to be a correct "fix" or not by someone more SNMP-savvy (...yourself?). As far as I understand SNMP, an snmpwalk shouldn't actually "change" anything though, so I guess the worst scenario is receiving subtly wrong data in the meantime (i.e. don't trust what you see entirely unless you verify the data yourself).
PS: I just noticed @barryo has switched on Travis support but that branch obviously doesn't have a .travis.yml file in place (as it existed before he switched Travis support on), so it is defaulting to trying to test Tags, Pull Requests, etc in "default mode" (i.e. as Ruby..!) so ignore those "Failure: Travis CI build failed" symbols for now...
"I just noticed @barryo has switched on Travis support"
Or rather I am in the very early process of starting to set up Travis...
Poking, as our IX is ready to release IXP to members although still no sflow support.
This is in answer to the question asked of me at #116. I am answering it here though as it will probably help the flow of dialogue better, and might benefit anyone reading, by way of the context of the preceding comment thread.
@pdxmaverick I am happy to try to help (and we do use vlan/s here too), but that comes with a big caveat: I am not as up-to-speed on "routing" theory as the rest of you evidently are. I come from the devel side of things, and am trying to catch up with what subset of routing concepts I need to know for ixp-m... Anyway, I will try to cite each thing I claim to "understand" - even if it means linking to some painfully rudimentary principles - so you (and/or @nickhilliard) can fact-check me as I go.
Firstly, the best explanation I have found which sums up the ".0 mystery" is this explanation of the so-called "unit 0". As someone who hasn't acclimatised to pre-existing "best practises", I have to say it sounds like this approach has a certain elegance and consistency to it. It's just annoying that it seems so incompatible with the others...
From reading the following links - A, B, C, and D, including some useful comments on link D - I think I can summarise as follows (please correct me if I've misunderstood): On an EX each port seems to always have at least one "vlan id" (or "unit"), and can either be in access (untagged) mode or trunk (tagged) mode. In access mode it receives/transmits untagged ethernet frames, has the single "vlan id 0" but is treated as having "no vlan id" (although specific, limited "multi vlan" behaviour can be achieved using things like lldp-med, though). Conversely, in trunk mode, multi vlans are possible, all received/transmitted frames must be tagged ".1q" frames, except for potentially one of the vlan ids which can be marked as the "native vlan" id, distinct from the others.
Apparently "trunk" ports are only for inter-switch/router communication, and we are contacting the switch from a server looking only for the "access ports" in order to find their (level 2) macs. If I have assumed that correctly, then we should expect that the "raw untagged" port we are interested in will by definition always include "unit 0" (".0") tagged on the end, for sending and receiving. This means to be compatible with existing code this script should always add a ".0" to the port it reports to the switch, and always chop ".0" off the end of what it receives as the reported port. The Pull Request I already have open does the chopping of ".0"s for finding and then pushing values into the macaddress db-table in a form similar to that of the other brands (note that the "$ports" dumped debug data has no trailing ".0"s). After running the script you could visually check the resulting ixp-m database table against your own list of expected macs/ports by doing:
USE [ixpm-db-name];
SELECT * FROM macaddress;
I guess then for presentation purposes the strictly correct thing would be for anything drawing data from the macaddress table to also re-add the ".0"s (and more importantly, any php code which might - in future? - need to rely on the correct port-to-mac mappings for actually computing further data would have to re-add the ".0"s, not just for cosmetic reasons... maybe @barryo will know more about that).
So as far as I can see (and as deeply as I can research without drowning in a sea of unfamiliar jargon), it seems my fix does what is needed to get an l2-ports-to-macs mapping from an EX into the database in a form which fits sanely with the present ixp-m datamodel. Whether the ixp-m datamodel itself needs tweaking in order to adequately accommodate Juniper's model is an ixp-m design-decision, which is not my domain ;-)
I hope that helps, and I hope I haven't made any wildly wrong assumptions along the way...
@rowanthorpe the .0 convention is a convention not a rule. It could in theory be any number and I haven't found the correct OIDs to figure out which is the correct unit number. Actually I don't think it really matters because if there is no vlan specified, the assumption is that one is talking about the physical interface anyway, in which case whatever number is present is stripped off.
I've taken your and @pdxmaverick's suggestions and merged this in with a bunch of other things into the code. Can you check out https://github.com/inex/IXP-Manager/blob/0848952b8f041188e4379f5f85c75fe8a80fb8bc/tools/runtime/l2database/update-l2database.pl and see if this fixes this problem?
The snmpwalk is already vlan specific. If the port is in access mode, it would be 0 or the vlan you already specified. @nick said he could fix this in 20 mins. I have provided access.
On Jan 26, 2014 10:55 AM, "Nick Hilliard" notifications@github.com wrote: [snip]
@nickhilliard I will test the new summarised code when I can get switch-access (and time!) in the coming day(..s) and will let you know.
<side_note>
I realise - and realised the previous time you mentioned - that it is a convention and not a rule, and just observed that on reflection it seems not a totally insane convention, granted that it allows one to parse SNMP in a truly consistent hierarchical manner, rather than requiring WET logic-flows like this:
if $is_physical_intf; then
    thismethod
else
    if $is_logical_intf; then
        if $is_native_intf; then
            othermethod
        else
            yetanothermethod
        fi
    fi
fi
...albeit at the cost of compatibility with other brands' conventions - which somewhat defeats the point, though. ;-)
"Actually I don't think it really matters because if there is no vlan specified, the assumption is that one is talking about the physical interface anyway, in which case whatever number is present is stripped off."
In my long-winded and probably-using-all-the-wrong-terms-and-conflating-concepts kind of way, that is what I was saying too... I am glad I got that right in my own head - even if I failed at communicating as much.
</side_note>
EDIT: Just for reference, in case it ever impacts on future issues too, I just remembered what I told you in a previous thread a while ago - that one of our router admins tested it and found that the EXs don't even allow any unit other than "unit 0" to be the access port.
I was just now able to briefly talk with a router admin here who clarified/confirmed some things for me. I will summarise below what was discussed, even if much of it is stating what is already known:
Therefore this confirms my previous guess - which is that the Right Way™ to syntactically interoperate between JunOS's naming convention and IXP-M/everyone-else's naming convention when talking SNMP to switch ports is to always trim ".0" from the received port-names and to always append ".0" to the transmitted port names...
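As a rough Python illustration of that trim-on-receive, append-on-transmit rule, the pair of helpers below converts between the two naming conventions. Both function names are hypothetical, and the sketch assumes the unit is always 0, as argued in this comment; it deliberately ignores the earlier point in the thread that the unit number could in theory be something else.

```python
def from_junos(port_name):
    """Name received from the switch -> IXP-M style name (trim ".0")."""
    return port_name[:-2] if port_name.endswith(".0") else port_name


def to_junos(port_name):
    """IXP-M style name -> name to send to the switch (append ".0")."""
    return port_name if port_name.endswith(".0") else port_name + ".0"


for received in ("xe-0/0/0.0", "ge-0/0/10.0", "ae23.0"):
    stored = from_junos(received)
    # The round trip should give back exactly what the switch reported.
    print(received, "->", stored, "->", to_junos(stored))
```

Keeping the two directions as a matched pair makes it easier to check that nothing is lost in the round trip, for example that ge-0/0/10 is not mistaken for a name that already carries a unit suffix.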
PS: Apparently this naming convention is common to all Junipers, not just the EXs...
@barryo I have tested your latest code. Not sure why yet but it is not working like my pull request to @rowanthorpe's fork.
I am very confused about whose fork to pull from and have spent many hours trying to move from branch to branch, ultimately failing. I have had to resort to cut and paste.
Still no joy https://gist.github.com/pdxmaverick/8701695 is my output.
Argh!!!! retract, my own confusion causing false negatives.
Yes the code is working. Could you merge it in to master so I can GIT back on track :)
@nickhilliard - this is really your call as I wasn't following here. @pdxmaverick is asking that @rowanthorpe's pull request #116 be merged. It cannot be merged automatically (probably conflicts). If you're happy for it to be merged I can do the leg work on merging it. I also reopened #116.
@barryo If I'm not mistaken I think @pdxmaverick said that in the end @nickhilliard's latest commit worked for him too...
Argh!!!! retract... [snip] Yes the code is working... [snip]
(Nick's latest commit incorporates elements from both my and Brian's Pull Requests - so is probably the better code to stick with). Is that correct? I haven't yet had a chance to test the code myself. If Brian confirms that Nick's code works then I think Nick's intention is to close both Pull Requests, and go with what he's already committed.
BTW: in the interests of confirming/comparing sane functionality and possibly for future reference with any issues php might have with this stuff, a colleague here pointed out that Observium seems to handle it all correctly. If you go to look at their code though, just know that they use a modified QPL license - OSI approved but I haven't read it closely regarding GPL-compatibility...
lol@observium. No, not going there, even to take a look. @rowanthorpe, if you can test this code out, I'll merge it back into master. I think it should work, but don't have enough glue in place to test it out on a live system.
cannot read BRIDGE-MIB or Q-BRIDGE-MIB from switch ex4500
I know we have discussed this, I wanted to have an open ticket so it doesn't fall in the cracks.
Issue with how SNMP reports interface name xe-0/0/0.0