zrlio / disni

DiSNI: Direct Storage and Networking Interface
Apache License 2.0

using rdma write to specified offset position of the remote buffer #59

Open ShiningChuang opened 1 year ago

ShiningChuang commented 1 year ago

When using the one-sided RDMA WRITE operation, I tried to write to only part of the buffer registered on the remote endpoint.

The code on the remote side is as follows:

ByteBuffer sendBuf = endpoint.getSendBuf();
IbvMr dataMr = endpoint.getDataMr();
sendBuf.putLong(dataMr.getAddr());
sendBuf.putInt(dataMr.getLength());
sendBuf.putInt(dataMr.getLkey());
sendBuf.clear();
endpoint.postSendExecute();
endpoint.takeEvent();
ByteBuffer dataBuf = endpoint.getDataBuf();
dataBuf.clear();
endpoint.postRecvExecute();
endpoint.takeEvent();
System.out.println("WriteServer::write from client 1: " + dataBuf.get());
System.out.println("WriteServer::write from client 2: " + dataBuf.get());
System.out.println("WriteServer::write from client 3: " + dataBuf.get());

The code on the local side is as follows:

endpoint.pollUntil();
ByteBuffer recvBuf = endpoint.getRecvBuf();
recvBuf.clear();
long addr = recvBuf.getLong();
int length = recvBuf.getInt();
int lkey = recvBuf.getInt();
recvBuf.clear();
System.out.println("WriteClient, receiving rdma information, addr " + addr + ", length " + length + ", lkey " + lkey + ", rkey " + rkey);
System.out.println("WriteClient, preparing read operation...");

IbvSendWR dataWR = endpoint.getDataWR();
dataWR.setWr_id(1001);
dataWR.setOpcode(IbvSendWR.IBV_WR_RDMA_WRITE);
dataWR.setSend_flags(IbvSendWR.IBV_SEND_SIGNALED);
dataWR.getRdma().setRemote_addr(addr);
dataWR.getRdma().setRkey(lkey);

ByteBuffer dataBuf = endpoint.getDataBuf();
dataBuf.clear();
dataBuf.put((byte)5);
endpoint.postDataExecute();
endpoint.pollUntil();

This works: the remote side reads "5 0 0" from the data buffer, so the write succeeds. But when I change dataWR.getRdma().setRemote_addr(addr) to dataWR.getRdma().setRemote_addr(addr+1) on the local side, I expect the remote side to read "0 5 0". Instead it reads "0 0 0", meaning the write failed.

I want to know what went wrong. Why can't the local side write data to a specified offset in the remote buffer?

PepperJo commented 1 year ago

I see a few problems with your code. On the remote side:

  1. You should always send the Rkey and not the Lkey to a remote side (although for most devices these keys are identical)
  2. You should clear the dataBuf before you send the buffer information (address, length, rkey) to the remote side. Otherwise, the remote write could happen just before you clear the buffer.
  3. The post receive is not necessary here. In fact, it will not do anything. You only need to post receive buffers when the remote side uses SEND; for WRITE you don't need to post a receive buffer. If you want to use SEND, you need to make sure that the receive buffer is posted before you send the buffer information. Otherwise, the remote side may issue a send before any buffer has been posted.
  4. You assume that the data is in the buffer once you get the receive event, but the (first) event only tells you that the receive buffer has been posted. You will never get an event saying that data was received into this buffer because, as explained above, you are not using SEND on the remote side.
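
Point 1 can be illustrated with a minimal sketch of the descriptor exchange, written in plain java.nio with no DiSNI calls. The class name `RegionDescriptor` and its methods are hypothetical; only the field order (address, length, key) follows the code in the question. The sketch assumes both sides use ByteBuffer's default byte order.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of DiSNI: packs and unpacks the
// (address, length, rkey) triple that describes a registered region.
final class RegionDescriptor {
    // 8 bytes for the address, 4 for the length, 4 for the rkey.
    static final int BYTES = Long.BYTES + Integer.BYTES + Integer.BYTES;

    final long addr;
    final int length;
    final int rkey;

    RegionDescriptor(long addr, int length, int rkey) {
        this.addr = addr;
        this.length = length;
        this.rkey = rkey;
    }

    // Owner of the region: write the triple at the start of the send buffer.
    void writeTo(ByteBuffer sendBuf) {
        sendBuf.clear();
        sendBuf.putLong(addr);
        sendBuf.putInt(length);
        sendBuf.putInt(rkey); // advertise the rkey, not the lkey
        sendBuf.clear();      // resets position and limit; the bytes stay in place
    }

    // Peer: read the triple back in the same order it was written.
    static RegionDescriptor readFrom(ByteBuffer recvBuf) {
        recvBuf.clear();
        long addr = recvBuf.getLong();
        int length = recvBuf.getInt();
        int rkey = recvBuf.getInt();
        recvBuf.clear();
        return new RegionDescriptor(addr, length, rkey);
    }

    public static void main(String[] args) {
        // One buffer stands in for the network here (illustration only).
        ByteBuffer wire = ByteBuffer.allocateDirect(BYTES);
        new RegionDescriptor(0x7f0000001000L, 4096, 0x1234).writeTo(wire);
        RegionDescriptor d = readFrom(wire);
        System.out.println("addr=" + d.addr + " length=" + d.length + " rkey=" + d.rkey);
    }
}
```

Naming the third field `rkey` on both sides makes it harder to send the local key by accident, even on devices where the two values happen to match.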

The local side looks OK. Just be aware that the event you get after a successful WRITE only indicates that the WRITE has been successfully sent to the remote side; it doesn't necessarily mean it has completed on the remote side (meaning placed into memory).

You should try to understand the difference between one-sided (WRITE) and two-sided (SEND) operations, so that you can see which operation makes the most sense in your case. Generally speaking, one-sided operations are faster but harder to use.

ShiningChuang commented 1 year ago

Thanks for your correction! I will explain based on your point.

  1. I know I should send the rkey, but this code was adapted from src/test/java/com/ibm/disni/benchmarks/ReadServer.java, which writes sendBuf.putInt(dataMr.getLkey()) on line 161. Since rkey and lkey are equal here and this is not the main problem, I ignored it.
  2. This does introduce the problem that the remote write could happen just before I clear the buffer. I will pay attention to this point later, but it is also not the main problem.
  3. I know that posting a receive is unnecessary for WRITE. I just want the sender to write the data and then notify the receiver through SEND, because I'm worried that the receiver might start reading before the sender writes. Maybe it’s not something to worry about? Also, this is not the main problem.
  4. I understand this point.

But none of these seems to be the main problem, which is that the receiver can't read any data when I change dataWR.getRdma().setRemote_addr(addr) to dataWR.getRdma().setRemote_addr(addr+1).

I originally thought that the rkey only granted access to the first address of the remote data buffer, so that after changing the address to addr+1 I could no longer access it with that rkey and would need a key for addr+1 instead. But after checking the RDMA information, I found that the rkey should grant access to every address in the whole region [address, address+len), so this was not the problem.

Then I suspected that the memory of the data buffer on the remote side is not contiguous, so that addr+1 does not correspond to dataBuf.position(1). But I checked that dataBuf is allocated through ByteBuffer.allocateDirect(), so it should be contiguous memory. I verified this through Unsafe, and it was indeed OK.

So I don't understand what went wrong. I don't think this is a problem with how I use WRITE, because writing to addr works and only writing to addr+1 does not.

PepperJo commented 1 year ago

OK, it seems you addressed most of the issues I raised. When you increase the remote address, do you also decrease the length by 1? Otherwise you might be trying to write outside the registered area. It seems you checked this already, given your statement above. Do you check each completion queue result to see whether it was successful? If you get an error there, that might be easier to debug.
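
The length question above comes down to simple range arithmetic. The following is a minimal sketch, not DiSNI code: `RemoteWriteWindow` and its methods are hypothetical names, and the region length of 100 bytes is an arbitrary value chosen for illustration.

```java
// Hypothetical helper, not part of DiSNI: plain arithmetic on the values
// exchanged in the thread (region address, region length, write offset).
final class RemoteWriteWindow {
    // True if a write of writeLength bytes starting at the given offset
    // stays inside a registered region of regionLength bytes.
    static boolean fits(int regionLength, int offset, int writeLength) {
        return offset >= 0 && writeLength >= 0
                && (long) offset + writeLength <= regionLength;
    }

    // Largest write length that still fits when starting at the given offset.
    static int maxLength(int regionLength, int offset) {
        if (offset < 0 || offset > regionLength) {
            throw new IllegalArgumentException("offset outside region: " + offset);
        }
        return regionLength - offset;
    }

    // Remote address to put into the work request for the given offset.
    static long remoteAddr(long regionAddr, int offset) {
        return regionAddr + offset;
    }

    public static void main(String[] args) {
        int regionLength = 100; // assumed size of the remote data buffer

        // Full-length write at offset 0: inside the region.
        System.out.println(fits(regionLength, 0, regionLength));
        // Same length at offset 1: one byte past the end of the region.
        System.out.println(fits(regionLength, 1, regionLength));
        // Length to use instead when writing from offset 1.
        System.out.println(maxLength(regionLength, 1));
    }
}
```

In terms of the code in this thread, the result of `remoteAddr` would go to dataWR.getRdma().setRemote_addr() and the reduced length to sgeSend.setLength().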

ShiningChuang commented 1 year ago

Oh, that's exactly what I missed, the main problem: the length. sgeSend.setLength() is set to a fixed buffer length in init(), so it was not reduced when I moved the remote address to addr+1.

Thanks a lot! :)

PepperJo commented 1 year ago

No problem. As mentioned above, I would always check the completion queue results to see if the command actually succeeded.
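
A minimal sketch of such a check, assuming the completion exposes an integer status with the same numbering as `enum ibv_wc_status` in libibverbs. The class `WcStatus` is a hypothetical stand-in, not a DiSNI type, and how the status is obtained from a DiSNI completion is left out here.

```java
// Hypothetical helper, not part of DiSNI. The numeric values follow
// enum ibv_wc_status in libibverbs; only a few codes are listed.
final class WcStatus {
    static final int IBV_WC_SUCCESS = 0;
    static final int IBV_WC_LOC_LEN_ERR = 1;
    static final int IBV_WC_LOC_PROT_ERR = 4;
    static final int IBV_WC_WR_FLUSH_ERR = 5;
    static final int IBV_WC_REM_ACCESS_ERR = 10;

    // Turn a status code into a readable name for logs and exceptions.
    static String describe(int status) {
        switch (status) {
            case IBV_WC_SUCCESS:        return "IBV_WC_SUCCESS";
            case IBV_WC_LOC_LEN_ERR:    return "IBV_WC_LOC_LEN_ERR";
            case IBV_WC_LOC_PROT_ERR:   return "IBV_WC_LOC_PROT_ERR";
            case IBV_WC_WR_FLUSH_ERR:   return "IBV_WC_WR_FLUSH_ERR";
            case IBV_WC_REM_ACCESS_ERR: return "IBV_WC_REM_ACCESS_ERR";
            default:                    return "status " + status;
        }
    }

    // Fail loudly instead of silently continuing after a failed work request.
    static void requireSuccess(long wrId, int status) {
        if (status != IBV_WC_SUCCESS) {
            throw new IllegalStateException(
                    "work request " + wrId + " failed: " + describe(status));
        }
    }

    public static void main(String[] args) {
        requireSuccess(1001, IBV_WC_SUCCESS); // passes without output
        try {
            requireSuccess(1001, IBV_WC_REM_ACCESS_ERR);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A remote access error on the writer's completion would typically point at the address, length, or key of the request rather than at the receiver's code.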