zrlio / disni

DiSNI: Direct Storage and Networking Interface
Apache License 2.0

"setting up protection domain, no context found" error when running the examples #28

Closed: ksonbol closed this issue 6 years ago

ksonbol commented 6 years ago

I am using disni-1.6, SoftiWARP (dev branch), and Linux kernel 4.13.0-43. When I run the ReadServer example as follows:

cd target
java -Djava.library.path=/usr/local/lib -cp "*" com.ibm.disni.examples.ReadServer -a localhost

I get the following error:

log4j:WARN No appenders could be found for logger (com.ibm.disni).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.io.IOException: setting up protection domain, no context found
    at com.ibm.disni.rdma.RdmaEndpointProvider.createProtectionDomain(RdmaEndpointProvider.java:73)
    at com.ibm.disni.rdma.RdmaEndpointProvider.createProtectionDomain(RdmaEndpointProvider.java:58)
    at com.ibm.disni.rdma.RdmaEndpointGroup.createProtectionDomain(RdmaEndpointGroup.java:77)
    at com.ibm.disni.rdma.RdmaEndpointGroup.createProtectionDomainRaw(RdmaEndpointGroup.java:202)
    at com.ibm.disni.rdma.RdmaServerEndpoint.bind(RdmaServerEndpoint.java:90)
    at com.ibm.disni.examples.ReadServer.run(ReadServer.java:58)
    at com.ibm.disni.examples.ReadServer.launch(ReadServer.java:105)
    at com.ibm.disni.examples.ReadServer.main(ReadServer.java:110)

I get the same error if I use 127.0.0.1 instead of localhost. Following the discussion here, I used my Ethernet IP instead of the loopback address, but then I got a "binding server address failed" error:

log4j:WARN No appenders could be found for logger (com.ibm.disni).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.io.IOException: binding server address /192.168.1.106:1919, failed
    at com.ibm.disni.rdma.RdmaServerEndpoint.bind(RdmaServerEndpoint.java:85)
    at com.ibm.disni.examples.ReadServer.run(ReadServer.java:58)
    at com.ibm.disni.examples.ReadServer.launch(ReadServer.java:105)
    at com.ibm.disni.examples.ReadServer.main(ReadServer.java:110)

Changing the port number didn't help either. The root of the problem seems to be that rdmaServerEndpoint.getIdPriv().getVerbs() returns null but I couldn't figure out why.

Output of ibv_devices:

device                 node GUID
------              ----------------
siw_lo              7369775f6c6f0000

ibv_devinfo:

hca_id: siw_lo
    transport:          iWARP (1)
    fw_ver:             0.0.0
    node_guid:          7369:775f:6c6f:0000
    sys_image_guid:         0000:0000:0000:0000
    vendor_id:          0x626d74
    vendor_part_id:         0
    hw_ver:             0x0
    phys_port_cnt:          1
        port:   1
            state:          PORT_ACTIVE (4)
            max_mtu:        4096 (5)
            active_mtu:     4096 (5)
            sm_lid:         0
            port_lid:       0
            port_lmc:       0x00
            link_layer:     Ethernet

animeshtrivedi commented 6 years ago

So, it seems that siw is only attached to your loopback device (lo). Which NIC has the IP 192.168.1.106? SoftiWARP needs to attach to that NIC for your tests to run.
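
One way to answer that question from a shell is to look up which interface carries the address. This is only a rough sketch: it assumes the one-line-per-address output of `ip -o -4 addr show`, with the interface name in the second field and the address/prefix in the fourth. The function name `iface_for_ip` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the interface that carries a given IPv4 address.
# Expects the output of `ip -o -4 addr show` on stdin, where each line looks like
#   2: <ifname>    inet <address>/<prefix> ...
iface_for_ip() {
  awk -v ip="$1" '{ split($4, a, "/"); if (a[1] == ip) print $2 }'
}
```

It would be used as `ip -o -4 addr show | iface_for_ip 192.168.1.106`. If the name it prints has no matching entry in the `ibv_devices` output, siw is not attached to that NIC.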

ksonbol commented 6 years ago

Thanks @animeshtrivedi, that was helpful. The problem was that, to load the kernel modules at boot time, I had created a .conf file under /etc/modules-load.d with the contents:

rdma_cm
ib_uverbs
rdma_ucm
siw

However, that was not working for some reason. I removed the file, and now it works when I load them manually with modprobe. I have also created a systemd service (I am using Ubuntu 16.04) to take another shot at loading the modules at boot, but it only works when I manually restart the service after startup.

Note: by "works" I mean that two entries appear in the ibv_devices output, one for loopback and one for my NIC, and the Read example runs fine.
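
That check can be scripted. A minimal sketch, assuming `ibv_devices` always prints two header lines (the column titles and their underline) before the device rows, as in the output earlier in this thread. `count_rdma_devices` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count the devices in `ibv_devices` output read from stdin.
# Assumes the first two lines are the header and its underline.
count_rdma_devices() {
  awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}
```

Running `ibv_devices | count_rdma_devices` should print 1 in the broken state (only siw_lo) and 2 once siw is also attached to the NIC.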

BernardMetzler commented 6 years ago

Most likely, the IP for the interface was not yet configured when siw got loaded. Current siw attaches only to fully configured interfaces, and only during module load time.

Thanks, Bernard.

ksonbol commented 6 years ago

Thank you Bernard, that explains it well. I will write the steps for creating the systemd service in case anyone else wants to use this method to load the modules:

  1. Create the script with a useful name: `sudo nano /usr/bin/softiwarp`
  2. Fill it with these contents:
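#!/bin/sh
# Load or unload the RDMA and SoftiWARP kernel modules.

start() {
  modprobe rdma_cm
  modprobe ib_uverbs
  modprobe rdma_ucm
  modprobe siw
}

stop() {
  # Unload in reverse order so that dependent modules are removed first.
  modprobe -r siw
  modprobe -r rdma_ucm
  modprobe -r ib_uverbs
  modprobe -r rdma_cm
}

case "$1" in
  start|stop) "$1" ;;
  *) echo "Usage: $0 {start|stop}" >&2; exit 1 ;;
esac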
  3. Make the script executable with `sudo chmod +x /usr/bin/softiwarp`, then create the service file: `sudo nano /etc/systemd/system/softiwarp.service`
  4. Fill it with these contents:

    [Unit]
    Description=Load RDMA and SoftiWARP modules

    [Service]
    Type=oneshot
    ExecStart=/usr/bin/softiwarp start
    ExecStop=/usr/bin/softiwarp stop
    RemainAfterExit=yes

    [Install]
    WantedBy=multi-user.target

5. Enable the service.
`sudo systemctl enable softiwarp.service`
6. After a reboot, the service starts automatically and loads the modules. If siw only attaches to the loopback interface, as in my case, restarting the service attaches it to the NIC as well.
`sudo service softiwarp restart`
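
Given Bernard's explanation that siw attaches only to interfaces that are already configured when the module loads, the manual restart might be avoided by delaying `modprobe siw` until the NIC has an address. This is an untested sketch, not something confirmed in this thread: `wait_for_ipv4` is a made-up helper, and the 30-second default timeout is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: wait until an interface has an IPv4 address.
# $1: interface name, $2: timeout in seconds (default 30, chosen arbitrarily).
# Returns 0 as soon as an address is present, 1 if the timeout expires.
wait_for_ipv4() {
  iface=$1
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    # `ip -o -4 addr show dev IFACE` prints one "inet ..." line per address.
    if ip -o -4 addr show dev "$iface" 2>/dev/null | grep -q 'inet '; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

In the `start()` function of the script above, the last line could then become something like `wait_for_ipv4 <your-nic> && modprobe siw`, with the NIC name filled in for your machine. Ordering the unit after systemd's `network-online.target` is another option, though whether that target guarantees a configured address depends on the network manager in use.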