ChristianKniep closed this issue 12 years ago.
On Tue, Jul 10, 2012 at 08:41:08AM -0700, Christian Kniep wrote:
For debugging reasons I changed discovery.py a bit:
+ print "#### sbn.nodes"
+ for k,v in sbn.nodes.items():
+     print k,v
+ print "#### \sbn.nodes"
This diagnostic is OK.
sbn = lib.get_subnet(sched,());
+ print "#### sbn.nodes"
+ print sbn.nodes
+ print "#### \sbn.nodes"
sched.run(queue=rdma.discovery.subnet_get_port(sched,sbn,lib.path));
port = sbn.path_to_port(lib.path);
if not isinstance(port.parent,rdma.subnet.Switch):
This is printed too soon: it runs before sched.run() has done the discovery, so it will always be {}.
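A minimal sketch of the ordering problem. It uses hypothetical stand-ins (`FakeSubnet`, `fake_discovery`, `dump_nodes`) rather than the real `rdma.subnet` objects. The dump only shows nodes when it runs after the discovery step, which is the `sched.run(...)` call in the real code, has populated `nodes`.

```python
# Hypothetical stand-ins: they model only the ordering issue,
# not the real python-rdma Subnet or scheduler API.
class FakeSubnet(object):
    def __init__(self):
        self.nodes = {}  # filled in by discovery; empty until then


def fake_discovery(sbn):
    # Plays the role of sched.run(queue=rdma.discovery.subnet_get_port(...)).
    sbn.nodes["0002:c902:0044:e890"] = "Switch"
    sbn.nodes["0008:f104:0399:09ec"] = "CA"


def dump_nodes(sbn):
    # Same markers as the debug prints in the patch.
    lines = ["#### sbn.nodes"]
    for guid, node in sorted(sbn.nodes.items()):
        lines.append("%s %s" % (guid, node))
    lines.append("#### \\sbn.nodes")
    return lines


sbn = FakeSubnet()
before = dump_nodes(sbn)  # too soon: nothing has been discovered yet
fake_discovery(sbn)
after = dump_nodes(sbn)   # after discovery: the nodes are visible
print("\n".join(before))
print("\n".join(after))
```

Moving the debug print below the `sched.run(...)` line should have the same effect in the real code.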
root@emv111 python-rdma # ibtool iblinkinfo --up -L 1
#### sbn.nodes
{}
#### \sbn.nodes
E: RPC MAD_METHOD_GET(1) SMPFormat(1.1) SMPPortInfo(21) got error status 0x1c - Invalid attr or modifier
If you run with -vv -dd, you should get a backtrace for this error and a MAD dump. There is a good chance this is related to the bug below:
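To read the `0x1c` status by hand: as I understand the InfiniBand common MAD status field, bit 0 means busy, bit 1 means redirect required, and bits 4:2 hold an "invalid field" code. Under that reading, 0x1c decodes to code 7, "invalid value in attribute or attribute modifier", which matches ibtool's message. The helper below is a hypothetical sketch of that decoding. It is not part of python-rdma.

```python
# Invalid-field codes in bits 4:2 of the MAD status, per my reading of the
# IBA common MAD status field; codes 4-6 are reserved.
INVALID_FIELD_CODES = {
    0: "no invalid fields",
    1: "bad version",
    2: "method not supported",
    3: "method/attribute combination not supported",
    7: "invalid value in attribute or attribute modifier",
}


def decode_mad_status(status):
    """Decode the common low-order bits of a 16-bit MAD status word.

    The upper byte (class-specific bits) is ignored in this sketch.
    """
    return {
        "busy": bool(status & 0x1),
        "redirect_required": bool(status & 0x2),
        "invalid_field": INVALID_FIELD_CODES.get((status >> 2) & 0x7, "reserved"),
    }


print(decode_mad_status(0x1C))
```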
root@emv111 python-rdma # ibtool iblinkinfo --up -L 2
#### sbn.nodes
{}
#### \sbn.nodes
Switch 0002:c902:0044:e890 'Infiniscale-IV Mellanox Technologies':
    2 1[ ] ==( 4x SDR Active/Link UP) ==> 8 1[ ] 'emv108 HCA-1'
    2 2[ ] ==( 4x SDR Active/Link UP) ==> 5 1[ ] 'emv104 HCA-1'
Traceback (most recent call last):
  File "/usr/local/bin/ibtool", line 93, in
    if not func(argv,o):
  File "/usr/lib64/python2.6/site-packages/libibtool/discovery.py", line 402, in cmd_iblinkinfo
    print_switch(sbn,args,port.parent);
  File "/usr/lib64/python2.6/site-packages/libibtool/discovery.py", line 333, in print_switch
    if better_possible(pinf.linkWidthSupported,peer_port.pinf.linkWidthSupported,
AttributeError: 'NoneType' object has no attribute 'linkWidthSupported'
This is because one of the scanning routines did not fill in pinf. This chunk was supposed to fill in all the required pinfs:
peer_ports = [(sbn.topology.get(I),idx) for I,idx in port.parent.iterports()];
sched.run(mqueue=(rdma.discovery.subnet_pinf_SMP(sched,sbn,idx,sbn.get_path_smp(sched,I.to_end_port()))
for I,idx in peer_ports if I is not None and I.pinf is None));
But looking at it, the construction of peer_ports is wrong: pairing idx with the result of topology.get is completely bogus, since idx is the switch's own port number, not the port number of the peer port that topology.get returns.
This is better:
peer_ports = set(I for I,Idx in port.parent.iterports());
peer_ports.update(peer for peer,prior in sbn.iterpeers(port.parent));
sched.run(mqueue=(rdma.discovery.subnet_pinf_SMP(sched,sbn,I.port_id,sbn.get_path_smp(sched,I.to_end_port()))
for I in peer_ports if I is not None and I.pinf is None));
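To see why the original pairing produces bad queries, here is a toy model. `Port`, `Node`, `buggy_requests` and `fixed_requests` are hypothetical classes and functions, not python-rdma's real objects. `topology` is a plain dict standing in for `sbn.topology`, and `sbn.iterpeers` is approximated by looking each switch port up in that dict. The buggy version asks each peer node for the *switch's* port number. The fixed version uses each port's own `port_id`, as the corrected chunk does.

```python
# Toy model (hypothetical): mimics only the shape of the objects used in
# the chunk above; this is not the real python-rdma API.
class Port(object):
    def __init__(self, node_name, port_id):
        self.node_name = node_name
        self.port_id = port_id
        self.pinf = None  # PortInfo not fetched yet


class Node(object):
    def __init__(self, name, nports):
        self.name = name
        self.ports = [Port(name, i) for i in range(1, nports + 1)]

    def iterports(self):
        # Yields (port, port number on *this* node), like port.parent.iterports()
        for port in self.ports:
            yield port, port.port_id


def buggy_requests(switch, topology):
    """(node, port number) PortInfo requests built the original way."""
    # topology.get(I) is the *peer* port, but idx is the *switch's* port number.
    peer_ports = [(topology.get(I), idx) for I, idx in switch.iterports()]
    return [(I.node_name, idx) for I, idx in peer_ports
            if I is not None and I.pinf is None]


def fixed_requests(switch, topology):
    """Requests built the corrected way: each port supplies its own port_id."""
    peer_ports = set(I for I, idx in switch.iterports())
    # Stand-in for sbn.iterpeers(port.parent): look each switch port up directly.
    peer_ports.update(topology.get(I) for I, idx in switch.iterports())
    return sorted((I.node_name, I.port_id) for I in peer_ports
                  if I is not None and I.pinf is None)


# Links modeled on the iblinkinfo output in this thread; port counts are assumed.
sw = Node("Infiniscale-IV", 8)
for p in sw.ports:
    p.pinf = "already fetched"
hca108 = Node("emv108 HCA-1", 1)
hca104 = Node("emv104 HCA-1", 1)
isr = Node("ISR9024S-M", 24)
topology = {
    sw.ports[0]: hca108.ports[0],  # switch port 1 -> emv108 port 1
    sw.ports[1]: hca104.ports[0],  # switch port 2 -> emv104 port 1
    sw.ports[6]: isr.ports[22],    # switch port 7 -> ISR9024 port 23
}

print(buggy_requests(sw, topology))
print(fixed_requests(sw, topology))
```

In this model, the buggy version asks the (assumed single-port) emv104 HCA for port 2. A request like that would plausibly be rejected with an "Invalid attr or modifier" status. It also asks the ISR9024 for port 7 instead of 23, so the pinf for port 23 is never filled in. That would leave `peer_port.pinf` as None, consistent with the AttributeError above.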
If that works for you, please make another commit.
Jason
Hey,
ok, fixed it...
Thx Christian
For debugging reasons I changed discovery.py a bit:
diff --git a/libibtool/discovery.py b/libibtool/discovery.py
index b88f54a..3708a62 100644
--- a/libibtool/discovery.py
+++ b/libibtool/discovery.py
@@ -378,12 +378,19 @@ def cmd_iblinkinfo(argv,o):
                          "all_NodeDescription", "all_PortInfo", "all_topology"]);
If I run iblinkinfo without arguments, I get all the nodes in the subnet:
root@emv111 python-rdma # ibtool iblinkinfo --up
#### sbn.nodes
0002:c902:0044:e890 <rdma.subnet.Switch object at 0x190a5d0>
0008:f104:0399:09ec <rdma.subnet.CA object at 0x190a7d0>
0008:f104:0399:0a64 <rdma.subnet.CA object at 0x190a350>
0008:f104:0399:0980 <rdma.subnet.CA object at 0x190a850>
0008:f104:0399:0944 <rdma.subnet.CA object at 0x190f290>
0008:f104:0399:0ab0 <rdma.subnet.CA object at 0x190aa90>
0008:f104:0399:01d4 <rdma.subnet.CA object at 0x190f450>
0008:f104:0399:0a98 <rdma.subnet.CA object at 0x190ae10>
0008:f104:0041:27bc <rdma.subnet.Switch object at 0x190a6d0>
0008:f104:0399:0a7c <rdma.subnet.CA object at 0x190ad10>
#### \sbn.nodes
Switch 0008:f104:0041:27bc 'ISR9024S-M Voltaire':
    1 1[ ] ==( 4x SDR Active/Link UP) ==> 4 1[ ] 'butragueno HCA-1'
    1 2[ ] ==( 4x SDR Active/Link UP) ==> 6 1[ ] 'puskas HCA-1'
    1 8[ ] ==( 4x SDR Active/Link UP) ==> 9 1[ ] 'emv107 HCA-1'
    1 9[ ] ==( 4x SDR Active/Link UP) ==> 10 1[ ] 'emv109 HCA-1'
    1 10[ ] ==( 4x SDR Active/Link UP) ==> 7 1[ ] 'emv110 HCA-1'
    1 11[ ] ==( 1x SDR Active/Link UP) ==> 3 1[ ] 'emv111 HCA-1'
    1 23[ ] ==( 4x SDR Active/Link UP) ==> 2 7[ ] 'Infiniscale-IV Mellanox Technologies'
    1 24[ ] ==( 4x SDR Active/Link UP) ==> 2 8[ ] 'Infiniscale-IV Mellanox Technologies'
Switch 0002:c902:0044:e890 'Infiniscale-IV Mellanox Technologies':
    2 1[ ] ==( 4x SDR Active/Link UP) ==> 8 1[ ] 'emv108 HCA-1'
    2 2[ ] ==( 4x SDR Active/Link UP) ==> 5 1[ ] 'emv104 HCA-1'
    2 7[ ] ==( 4x SDR Active/Link UP) ==> 1 23[ ] 'ISR9024S-M Voltaire'
    2 8[ ] ==( 4x SDR Active/Link UP) ==> 1 24[ ] 'ISR9024S-M Voltaire'
root@emv111 python-rdma #
With options like LID (or GUID or DirectPath), I get no nodes at all in the subnet:
root@emv111 python-rdma # ibtool iblinkinfo --up -L 1
#### sbn.nodes
{}
#### \sbn.nodes
E: RPC MAD_METHOD_GET(1) SMPFormat(1.1) SMPPortInfo(21) got error status 0x1c - Invalid attr or modifier
root@emv111 python-rdma # ibtool iblinkinfo --up -L 2
#### sbn.nodes
{}
#### \sbn.nodes
Switch 0002:c902:0044:e890 'Infiniscale-IV Mellanox Technologies':
    2 1[ ] ==( 4x SDR Active/Link UP) ==> 8 1[ ] 'emv108 HCA-1'
    2 2[ ] ==( 4x SDR Active/Link UP) ==> 5 1[ ] 'emv104 HCA-1'
Traceback (most recent call last):
File "/usr/local/bin/ibtool", line 93, in
if not func(argv,o):
File "/usr/lib64/python2.6/site-packages/libibtool/discovery.py", line 402, in cmd_iblinkinfo
print_switch(sbn,args,port.parent);
File "/usr/lib64/python2.6/site-packages/libibtool/discovery.py", line 333, in print_switch
if better_possible(pinf.linkWidthSupported,peer_port.pinf.linkWidthSupported,
AttributeError: 'NoneType' object has no attribute 'linkWidthSupported'
I have not found the code fragment that fills the list and causes that error.
Cheers Christian