Open ShiningChuang opened 1 year ago
I see a few problems with your code. On the remote side:
The local side looks OK. Just be aware that the event you get after a successful WRITE only indicates that the WRITE has been sent to the remote side; it does not necessarily mean that it has completed there (that is, that the data has been placed into memory).
You should try to understand the difference between one-sided (WRITE) and two-sided operations (SEND), so that you can see which operation makes the most sense in your case. Generally speaking one-sided operations are faster but harder to use.
Thanks for the corrections! I will go through your points.
`sendBuf.putInt(dataMr.getLkey())` in line 161: since the rkey and lkey are equal and this is not the main problem, I ignored it. But none of these seems to be the main problem, which is that the receiver can't read any data when I change `dataWR.getRdma().setRemote_addr(addr)` to `dataWR.getRdma().setRemote_addr(addr+1)`.
I originally thought that the rkey only granted access to the first address of the remote data buffer, so that after changing the address to `addr+1` I could no longer access it with that rkey and would need a separate key for `addr+1`. But after checking the RDMA documentation, I found that the rkey grants access to every address in the region [address, address+len), so this was not the problem.
Then I suspected that the memory of the data buffer on the remote side was not contiguous, so that `addr+1` did not correspond to `dataBuf.position(1)`. But the `dataBuf` is allocated through `ByteBuffer.allocateDirect()`, so it should be contiguous memory. I verified the contiguity of this address range through `Unsafe`, and it was indeed OK.
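The Java-level half of that check can be sketched without `Unsafe`. The snippet below is only an illustration under stated assumptions: `DirectBufferOffset` and `writeAtOffset` are hypothetical names, and the code does not inspect the native address at all. It only shows that a view starting at position 1 of a direct buffer shares memory with index 1 of the original buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of the code discussed in this thread.
final class DirectBufferOffset {
    /** Writes `value` through a view that starts `offset` bytes into `buf`. */
    static void writeAtOffset(ByteBuffer buf, int offset, byte value) {
        ByteBuffer view = buf.duplicate(); // same memory, independent position and limit
        view.position(offset);
        view.slice().put(0, value);        // index 0 of the slice is index `offset` of buf
    }

    public static void main(String[] args) {
        ByteBuffer dataBuf = ByteBuffer.allocateDirect(3); // direct buffers start zero-filled
        writeAtOffset(dataBuf, 1, (byte) 5);
        // Prints: 0 5 0
        System.out.println(dataBuf.get(0) + " " + dataBuf.get(1) + " " + dataBuf.get(2));
    }
}
```

If this holds locally, an offset of one byte into the buffer is a valid target, and the cause has to be elsewhere in the work request.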
So I don't understand what went wrong. I don't think the problem is in how I use WRITE, because a write to `addr` can be read on the remote side; only a write to `addr+1` cannot.
OK, it seems you have addressed most of the issues I raised. When you increase the remote address, do you also decrease the length by 1? Otherwise you might try to write outside the registered area. It seems you checked this already, given your statement above. Do you check whether each completion queue result was successful? If you get an error there, it might be easier to debug.
Oh, that's exactly what I missed, the main problem: the length! `sgeSend.setLength()` is set to a fixed buffer length in `init()`.
Thanks a lot! :)
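The fix discussed here comes down to address arithmetic, which can be sketched as follows. This is a minimal sketch, not DiSNI code: `PartialWriteBounds`, `fits`, and `remaining` are hypothetical helpers, and the base address and region length in `main` are made-up values. It assumes addresses fit in a non-negative `long`.

```java
// Hypothetical helper for illustration; pure arithmetic, no RDMA calls.
final class PartialWriteBounds {
    /** True if [addr, addr + len) lies entirely inside [regionAddr, regionAddr + regionLen). */
    static boolean fits(long regionAddr, long regionLen, long addr, long len) {
        return len >= 0 && addr >= regionAddr && addr - regionAddr <= regionLen - len;
    }

    /** Largest length that can still be written when starting `offset` bytes into the region. */
    static long remaining(long regionLen, long offset) {
        if (offset < 0 || offset > regionLen) {
            throw new IllegalArgumentException("offset outside region: " + offset);
        }
        return regionLen - offset;
    }

    public static void main(String[] args) {
        long addr = 0x1000L; // made-up remote base address
        long len = 3;        // made-up registered length in bytes
        System.out.println(fits(addr, len, addr, len));                   // true
        System.out.println(fits(addr, len, addr + 1, len));               // false: one byte past the end
        System.out.println(fits(addr, len, addr + 1, remaining(len, 1))); // true
    }
}
```

In terms of the thread, the value passed to `sgeSend.setLength()` would have to shrink by the same offset that is added in `setRemote_addr()`.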
No problem. As mentioned above, I would always check the completion queue results to see whether the operation actually succeeded.
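A rough sketch of such a check follows. The types are stand-ins made up for illustration: `CompletionCheck`, `Status`, and `Completion` are not DiSNI classes, and the status names only mirror a few of the libibverbs `ibv_wc_status` values. As far as I know, on a reliable connection an out-of-bounds WRITE is typically reported to the sender as a remote access error, but treat that as an assumption to verify on your setup.

```java
import java.util.Arrays;

// Stand-in types for illustration only; the real completion objects come from the RDMA library.
final class CompletionCheck {
    /** A few status names mirroring libibverbs' ibv_wc_status; not an exhaustive list. */
    enum Status { SUCCESS, LOC_PROT_ERR, REM_ACCESS_ERR, OTHER }

    /** Hypothetical, simplified view of one polled work completion. */
    static final class Completion {
        final long wrId;
        final Status status;

        Completion(long wrId, Status status) {
            this.wrId = wrId;
            this.status = status;
        }
    }

    /** Throws on the first failed completion instead of silently continuing. */
    static void requireSuccess(Iterable<Completion> polled) {
        for (Completion c : polled) {
            if (c.status != Status.SUCCESS) {
                throw new IllegalStateException(
                        "work request " + c.wrId + " failed: " + c.status);
            }
        }
    }

    public static void main(String[] args) {
        requireSuccess(Arrays.asList(new Completion(1L, Status.SUCCESS))); // passes quietly
        try {
            requireSuccess(Arrays.asList(new Completion(2L, Status.REM_ACCESS_ERR)));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // work request 2 failed: REM_ACCESS_ERR
        }
    }
}
```

Failing loudly at this point would have surfaced the bad length immediately, rather than leaving an unchanged "0 0 0" buffer on the remote side as the only symptom.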
When using the one-sided RDMA WRITE operation, I tried to perform a partial write into the buffer on the remote endpoint.
The code on the remote side is as follows:
The code on the local side is as follows:
This works: the remote side reads "5 0 0" from the data buffer, so the write succeeds. But when I change `dataWR.getRdma().setRemote_addr(addr)` to `dataWR.getRdma().setRemote_addr(addr+1)` on the local side, I expect the remote side to read "0 5 0". Instead it reads "0 0 0", meaning the write failed. What went wrong? Why can't the local side write data to a given offset in the remote buffer?