Closed faramir closed 4 years ago
Attaching output and modified files: vbox-101.txt vbox-102.txt RDMAvsTcpBenchmarkServer.java.txt SendRecvClient.java.txt SendRecvServer.java.txt RDMAvsTcpBenchmarkClient.java.txt
@petro-rudenko Can you take a look at this?
Interesting - reproduce on my environment. So overall barrier is working - it sending after receiving. The issue with stall data on some iterations:
RDMA: 0 < 301
recv 0
send 0
RDMA: 1 < 301
recv 0
send -1
RDMA: 2 < 301
recv 2
send -2
Checking.
@faramir thanks, can you please call postRecv.execute()
after endpoint.getWcEvents().take();
like in #38 PR. With it got consistent result.
Thank you @petro-rudenko . Results are consistent now.
However, I'm still facing problem with the occasional deadlock. Eg. server sends -1195 and waits for receiving message by client, and the client sends 1196 and waits for receiving message by server... Are you able to reproduce also this issue?
/* Server */
for (int i = 0; i < loopCount + 1; i++) {
// Recv PING
clientEndpoint.getWcEvents().take();
postRecv.execute();
System.out.println("recv\t" + recvBuf.asIntBuffer().get(0));
recvBuf.clear();
//Send PONG
sendBuf.asIntBuffer().put(0, -i);
System.out.println("send\t" + sendBuf.asIntBuffer().get(0));
postSend.execute();
clientEndpoint.getWcEvents().take();
sendBuf.clear();
}
/* Client */
for (int i = 0; i < loopCount + 1; i++) {
// Send PING
sendBuf.asIntBuffer().put(0, i);
sendBuf.clear();
System.out.println("send\t" + sendBuf.asIntBuffer().get(0));
postSend.execute();
endpoint.getWcEvents().take();
// Recv PONG
endpoint.getWcEvents().take();
postRecv.execute();
System.out.println("recv\t" + recvBuf.asIntBuffer().get(0));
recvBuf.clear();
}
server$ java -cp disni-2.0-jar-with-dependencies.jar:disni-tests.jar com.ibm.disni.benchmarksDMAvsTcpBenchmarkServer -a 192.168.56.101 -k 3000
client$ java -cp disni-2.0-jar-with-dependencies.jar:disni-tests.jar com.ibm.disni.benchmarksDMAvsTcpBenchmarkClient -a 192.168.56.101 -k 3000
Example output files atteched. client-102.txt server-101.txt
I haven't looked at the RdmavsTcpBenchmark in too much detail, but a quick glance made me wonder if there is not a bug with the receive posting. I think the receive at the client should always be posted prior to sending a request. Moving the receive posting just before reaping the event it not enough, the response might already have come back.
What I have just seen (in the output client-102.txt and server-101.txt): on the server, just before freeze, it "received" two times the same value:
RDMA: 1193 < 3001
recv 1193
send -1193
RDMA: 1194 < 3001
recv 1194
send -1194
RDMA: 1195 < 3001
recv 1194
send -1195
while on the client it sends the right values:
RDMA: 1193 < 3001
send 1193
recv -1193
RDMA: 1194 < 3001
send 1194
recv -1194
RDMA: 1195 < 3001
send 1195
recv -1194
RDMA: 1196 < 3001
send 1196
One of the interesting recording of this problem is available here: https://youtu.be/QSt9wZqkloo (after a while, just before freezing, client received next notification and go one step forward).
For me doesn't reproduce, but yes try as @patrickstuedi suggested. Client:
for (int i = 0; i < loopCount + 1; i++) {
System.err.println("RDMA: " + i + " < " + (loopCount + 1));
postRecv.execute();
// Send PING
sendBuf.asIntBuffer().put(0, i);
sendBuf.clear();
postSend.execute();
System.out.println("send\t" + sendBuf.asIntBuffer().get(0));
wc = endpoint.getWcEvents().take();
// Recv PONG
wc = endpoint.getWcEvents().take();
System.out.println("recv\t" + recvBuf.asIntBuffer().get(0));
recvBuf.clear();
}
I tried it. Unfortunately, the freezing is still there.
I decided to do simpler test - Client sends only messages to Server, Server receives only messages. That test also lead to deadlock.
/* Client */
for (int i = 0; i < loopCount + 1; i++) {
System.err.println("RDMA: " + i + " < " + (loopCount + 1));
// Send PING
sendBuf.asIntBuffer().put(0, i);
sendBuf.clear();
System.out.println("send\t" + sendBuf.asIntBuffer().get(0));
postSend.execute();
endpoint.getWcEvents().take();
}
/* Server */
for (int i = 0; i < loopCount + 1; i++) {
System.err.println("RDMA: " + i + " < " + (loopCount + 1));
// Recv PING
postRecv.execute();
clientEndpoint.getWcEvents().take();
System.out.println("recv\t" + recvBuf.asIntBuffer().get(0));
recvBuf.clear();
}
I've tried to invoke clientEndpoint.postRecv(clientEndpoint.getWrList_recv())
instead of using postRecv
variable every time in the loop, and to invoke free()
after execution, but all those attempts do not help - freeze is still there.
@faramir just watch this talk about rxe / SoftRoce. They had similar issues to yours, where state machine in rxe could just stuck. Long story short:
Is there any size limit ?
when I run RDMAvsTcpBenchmarkClient -a IP -k 3000 -s 16384, RDMA result: Total time: 1363.831266 ms Bidirectional bandwidth: 0.06712907722339899 Gb/s Bidirectional average latency: 0.454610422 ms TCP result: Total time: 1800.555879 ms Bidirectional bandwidth: 0.05084692757541461 Gb/s Bidirectional average latency: 0.600185293 ms
when I run RDMAvsTcpBenchmarkClient -a IP -k 3000 -s 32768, blocked at RDMAvsTcpBenchmarkClient::client channel set up
RDMAvsTcpBenchmarkServer -a IP -k 3000 -s 32768 blocked at RDMAvsTcpBenchmarkClient::client channel set up
PS: Linux 5.0.2 soft-roce
Hello,
After a hard and long time spent trying to compile Soft-iWARP (without success), I found a module RXE (kernel: 4.9.0-8-amd64) and Soft-RoCE that works without any problems (sudo apt-get -t stretch-backports install rdma-core ibverbs-providers ibverbs-utils libibverbs-dev librdmacm-dev).
I've successfully compiled and installed DiSNI on the virtual machines (Debian on VirtualBox).
The basic, simple example (com.ibm.disni.examples.SendRecv) works like a charm. However, I have problem with com.ibm.disni.benchmarks.RDMAvsTcpBenchmark
Sometimes it finishes as it should, sometimes it hangs. I've added some diagnostic outputs in the classes, and sometimes it hangs after 2000 iterations of the RDMA loop, sometimes after 1942 iterations and so on. Then I have also changed the value that is being send. Client sends current iteration number, server sends the minus current iteration number. The received value is not always the next expected value (eg. 0, 1, 1, 3, 3, 4, ... or -1, -2, -4, -5, -5, -6, -7, -8, -9, ...)
I does not have a clue, where I should look what causes the problem so I'm unable to find a solution.
Best regards, Marek
---cut---
...