jgunthorpe / python-rdma

Python interface to the Linux RDMA stack
https://jgunthorpe.github.io/python-rdma/

sbn is empty when not using the global view of the subnet #4

Closed ChristianKniep closed 12 years ago

ChristianKniep commented 12 years ago

For debugging reasons I changed discovery.py a bit:

diff --git a/libibtool/discovery.py b/libibtool/discovery.py
index b88f54a..3708a62 100644
--- a/libibtool/discovery.py
+++ b/libibtool/discovery.py
@@ -378,12 +378,19 @@ def cmd_iblinkinfo(argv,o):
 "all_NodeDescription", "all_PortInfo", "all_topology"]);

If I fire up iblinkinfo without arguments I get all the nodes within the subnet:

root@emv111 python-rdma # ibtool iblinkinfo --up

sbn.nodes

0002:c902:0044:e890 <rdma.subnet.Switch object at 0x190a5d0>
0008:f104:0399:09ec <rdma.subnet.CA object at 0x190a7d0>
0008:f104:0399:0a64 <rdma.subnet.CA object at 0x190a350>
0008:f104:0399:0980 <rdma.subnet.CA object at 0x190a850>
0008:f104:0399:0944 <rdma.subnet.CA object at 0x190f290>
0008:f104:0399:0ab0 <rdma.subnet.CA object at 0x190aa90>
0008:f104:0399:01d4 <rdma.subnet.CA object at 0x190f450>
0008:f104:0399:0a98 <rdma.subnet.CA object at 0x190ae10>
0008:f104:0041:27bc <rdma.subnet.Switch object at 0x190a6d0>
0008:f104:0399:0a7c <rdma.subnet.CA object at 0x190ad10>

\sbn.nodes

Switch 0008:f104:0041:27bc 'ISR9024S-M Voltaire':
1 1[ ] ==( 4x SDR Active/Link UP) ==> 4 1[ ] 'butragueno HCA-1'
1 2[ ] ==( 4x SDR Active/Link UP) ==> 6 1[ ] 'puskas HCA-1'
1 8[ ] ==( 4x SDR Active/Link UP) ==> 9 1[ ] 'emv107 HCA-1'
1 9[ ] ==( 4x SDR Active/Link UP) ==> 10 1[ ] 'emv109 HCA-1'
1 10[ ] ==( 4x SDR Active/Link UP) ==> 7 1[ ] 'emv110 HCA-1'
1 11[ ] ==( 1x SDR Active/Link UP) ==> 3 1[ ] 'emv111 HCA-1'
1 23[ ] ==( 4x SDR Active/Link UP) ==> 2 7[ ] 'Infiniscale-IV Mellanox Technologies'
1 24[ ] ==( 4x SDR Active/Link UP) ==> 2 8[ ] 'Infiniscale-IV Mellanox Technologies'
Switch 0002:c902:0044:e890 'Infiniscale-IV Mellanox Technologies':
2 1[ ] ==( 4x SDR Active/Link UP) ==> 8 1[ ] 'emv108 HCA-1'
2 2[ ] ==( 4x SDR Active/Link UP) ==> 5 1[ ] 'emv104 HCA-1'
2 7[ ] ==( 4x SDR Active/Link UP) ==> 1 23[ ] 'ISR9024S-M Voltaire'
2 8[ ] ==( 4x SDR Active/Link UP) ==> 1 24[ ] 'ISR9024S-M Voltaire'
root@emv111 python-rdma #

With options like LID (or GUID or DirectPath) I get no nodes at all within the subnet.

root@emv111 python-rdma # ibtool iblinkinfo --up -L 1

sbn.nodes

{}

\sbn.nodes

E: RPC MAD_METHOD_GET(1) SMPFormat(1.1) SMPPortInfo(21) got error status 0x1c - Invalid attr or modifier

root@emv111 python-rdma # ibtool iblinkinfo --up -L 2

sbn.nodes

{}

\sbn.nodes

Switch 0002:c902:0044:e890 'Infiniscale-IV Mellanox Technologies':
2 1[ ] ==( 4x SDR Active/Link UP) ==> 8 1[ ] 'emv108 HCA-1'
2 2[ ] ==( 4x SDR Active/Link UP) ==> 5 1[ ] 'emv104 HCA-1'
Traceback (most recent call last):
  File "/usr/local/bin/ibtool", line 93, in <module>
    if not func(argv,o):
  File "/usr/lib64/python2.6/site-packages/libibtool/discovery.py", line 402, in cmd_iblinkinfo
    print_switch(sbn,args,port.parent);
  File "/usr/lib64/python2.6/site-packages/libibtool/discovery.py", line 333, in print_switch
    if better_possible(pinf.linkWidthSupported,peer_port.pinf.linkWidthSupported,
AttributeError: 'NoneType' object has no attribute 'linkWidthSupported'

I have not found the code fragment that fills the list and causes that error.

Cheers Christian

jgunthorpe commented 12 years ago

On Tue, Jul 10, 2012 at 08:41:08AM -0700, Christian Kniep wrote:

For debugging reasons I changed discovery.py a bit:

+ print "#### sbn.nodes"
+ for k,v in sbn.nodes.items():
+     print k,v
+ print "#### \sbn.nodes"

This diagnostic is OK..

         sbn = lib.get_subnet(sched,());
+        print "#### sbn.nodes"
+        print sbn.nodes
+        print "#### \sbn.nodes"
         sched.run(queue=rdma.discovery.subnet_get_port(sched,sbn,lib.path));
         port = sbn.path_to_port(lib.path);
         if not isinstance(port.parent,rdma.subnet.Switch):

This is printed too soon: it comes before sched.run() has done any discovery, so it will always be {}.
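A minimal sketch of why a dump placed there is empty, using made-up stand-ins rather than the real python-rdma classes: `ToySubnet`, `ToyScheduler` and `discover` are hypothetical names for illustration only. The assumption is simply that the subnet object starts out empty and is populated only while the scheduler processes queued work.

```python
class ToySubnet(object):
    """Hypothetical stand-in for the subnet object; starts with no nodes."""
    def __init__(self):
        self.nodes = {}


class ToyScheduler(object):
    """Hypothetical stand-in for the scheduler: work only happens in run()."""
    def run(self, queue):
        for step in queue:
            step()


def discover(sbn):
    """Return deferred steps that add nodes to sbn once they are executed."""
    def add_switch():
        sbn.nodes["0002:c902:0044:e890"] = "Switch"
    return [add_switch]


sbn = ToySubnet()
sched = ToyScheduler()
before = dict(sbn.nodes)   # snapshot taken before any discovery has run
sched.run(queue=discover(sbn))
after = dict(sbn.nodes)    # snapshot taken after the queue was processed
print(before)
print(after)
```

Under this assumption, moving the diagnostic to after the `sched.run(...)` call is what makes it show the discovered nodes.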

root@emv111 python-rdma # ibtool iblinkinfo --up -L 1

sbn.nodes

{}

\sbn.nodes

E: RPC MAD_METHOD_GET(1) SMPFormat(1.1) SMPPortInfo(21) got error status 0x1c - Invalid attr or modifier

If you run with -vv -dd you should get a backtrace for this error, and a MAD dump. There is a good chance this is related to the bug below:

root@emv111 python-rdma # ibtool iblinkinfo --up -L 2

sbn.nodes

{}

\sbn.nodes

Switch 0002:c902:0044:e890 'Infiniscale-IV Mellanox Technologies':
2 1[ ] ==( 4x SDR Active/Link UP) ==> 8 1[ ] 'emv108 HCA-1'
2 2[ ] ==( 4x SDR Active/Link UP) ==> 5 1[ ] 'emv104 HCA-1'
Traceback (most recent call last):
  File "/usr/local/bin/ibtool", line 93, in <module>
    if not func(argv,o):
  File "/usr/lib64/python2.6/site-packages/libibtool/discovery.py", line 402, in cmd_iblinkinfo
    print_switch(sbn,args,port.parent);
  File "/usr/lib64/python2.6/site-packages/libibtool/discovery.py", line 333, in print_switch
    if better_possible(pinf.linkWidthSupported,peer_port.pinf.linkWidthSupported,
AttributeError: 'NoneType' object has no attribute 'linkWidthSupported'

This is because one of the scanning routines did not fill in pinf.. This chunk was supposed to fill all required pinfs:

        peer_ports = [(sbn.topology.get(I),idx) for I,idx in port.parent.iterports()];
        sched.run(mqueue=(rdma.discovery.subnet_pinf_SMP(sched,sbn,idx,sbn.get_path_smp(sched,I.to_end_port()))
                          for I,idx in peer_ports if I is not None and I.pinf is None));

But looking at it, the construction of peer_ports is wrong: the association of idx with the result of topology.get is completely bogus.

This is better:

 peer_ports = set(I for I,Idx in port.parent.iterports());
 peer_ports.update(peer for peer,prior in sbn.iterpeers(port.parent));
 sched.run(mqueue=(rdma.discovery.subnet_pinf_SMP(sched,sbn,I.port_id,sbn.get_path_smp(sched,I.to_end_port()))
                          for I in peer_ports if I is not None and I.pinf is None));
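One reading of the mismatch described above, sketched with toy objects rather than the real rdma.subnet classes (`ToyPort`, `local_ports` and `topology` are hypothetical): if a port is paired with the index of the local switch port it hangs off, the PortInfo query goes to the peer node with the wrong port number, whereas asking each port for its own `port_id` gives the right one. The port numbers mirror the 2/7 to 1/23 and 2/8 to 1/24 links in the output quoted earlier.

```python
class ToyPort(object):
    """Hypothetical port that knows its own index on its parent node."""
    def __init__(self, name, port_id):
        self.name = name
        self.port_id = port_id


# Ports of the local switch, keyed by their index on that switch.
local_ports = {7: ToyPort("switch-2", 7), 8: ToyPort("switch-2", 8)}

# Hypothetical topology map: local port -> port at the far end of the link.
topology = {
    local_ports[7]: ToyPort("switch-1", 23),
    local_ports[8]: ToyPort("switch-1", 24),
}

# Old pattern: the peer is paired with the LOCAL port index.
old_pairs = [(topology.get(p).name, idx)
             for idx, p in sorted(local_ports.items())]

# Revised pattern: each peer reports its own index.
new_pairs = [(topology.get(p).name, topology.get(p).port_id)
             for idx, p in sorted(local_ports.items())]

print(old_pairs)
print(new_pairs)
```

In this toy model the old pairing would ask switch-1 about ports 7 and 8, while the links actually end on its ports 23 and 24.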

If that works for you please make another commit..

Jason

ChristianKniep commented 12 years ago

Hey,

ok, fixed it...

Thx Christian