jonlanglet / DTA

This is the repository for Direct Telemetry Access, a high-speed network telemetry collection system.
MIT License

Problem about RDMA Fetch&Add operation #1

Closed Eternity-Wang closed 1 month ago

Eternity-Wang commented 5 months ago

I can already send RDMA Write-Only packets with send_rdma_synthetic.py and write key-value data to the corresponding memory address on the Collector. However, when I use scapy to construct a packet that mimics the RDMA Fetch&Add operation, the value at the target memory address is not incremented as expected. I have disabled ICRC validation on the RDMA NIC, but the Fetch&Add still does not update the remote memory on the Collector. How can I solve this problem?

Here is the code for send_rdma_synthetic.py

#!/usr/bin/env python3

from scapy.all import *
import argparse

srcMAC = "05:05:05:05:05:05" 
dstMAC = "06:06:06:06:06:06"

srcIP = "192.168.4.3"
dstIP = "192.168.3.3"

rocev2_port = 4791
rdma_dir = "/root/experiment/rdma/rdma_metadata"

# 12 Bytes
class BTH(Packet):
    name = "BTH"
    fields_desc = [
        ByteField("opcode", 0),
        BitField("solicitedEvent", 0, 1),
        BitField("migReq", 1, 1),
        BitField("padCount", 0, 2),
        BitField("transportHeaderVersion", 0, 4),
        XShortField("partitionKey", 0),
        XByteField("reserved1", 0),
        ThreeBytesField("destinationQP", 0),
        BitField("ackRequest", 0, 1),
        BitField("reserved2", 0, 7),
        ThreeBytesField("packetSequenceNumber", 0)
    ]

# 16 Bytes
class RETH(Packet):
    name = "RETH"
    fields_desc = [
        BitField("virtualAddress", 0, 64),
        IntField("rKey", 0),
        IntField("dmaLength", 0)
    ]

# 28 Bytes
class AtomicETH(Packet):
    name = "AtomicETH"
    fields_desc = [
        BitField("virtualAddress", 0, 64),
        IntField("rKey", 0),
        BitField("addData", 0, 64),
        BitField("cmpData", 0, 64)
    ]

class iCRC(Packet):
    name = "iCRC"
    fields_desc = [
        IntField("iCRC", 0)
    ]

def makeRocev2Write(psn, dstQP, vAddr, rKey):

    data = b'\x09\x09\x09\x11\x11\x11\x11\x11'

    pkt = Ether(src=srcMAC,dst=dstMAC)
    pkt = pkt/IP(src=srcIP,dst=dstIP,ihl=5,flags=0b010,proto=0x11,id=300,tos=2)
    pkt = pkt/UDP(sport=10000,dport=rocev2_port,chksum=0)
    pkt = pkt/BTH(opcode=0b01010,partitionKey=0xffff,destinationQP=dstQP, packetSequenceNumber=psn)
    pkt = pkt/RETH(dmaLength=8,virtualAddress=vAddr+16,rKey=rKey)
    pkt = pkt/Raw(data)

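    # IP total length: 20 (IP) + 8 (UDP) + 12 (BTH) + 16 (RETH) + 8 (payload) + 4 (iCRC) = 68
    pkt["IP"].len = 68
    # UDP length: 8 (UDP) + 12 (BTH) + 16 (RETH) + 8 (payload) + 4 (iCRC) = 48
    pkt["UDP"].len = 48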

    pkt = pkt/iCRC(iCRC=0)

    pkt.show2()
    return pkt

# Make RDMA-Fetch&Add packet
def makeRocev2Fetch_Add(psn, dstQP, vAddr, rKey):

    pkt = Ether(src=srcMAC, dst=dstMAC)
    pkt = pkt/IP(src=srcIP, dst=dstIP, ihl=5, flags=0b010, proto=0x11, id=300, tos=2)
    pkt = pkt/UDP(sport=10000, dport=rocev2_port, chksum=0)
    # Fetch&Add operation code is 10100
    pkt = pkt/BTH(opcode=0b10100,partitionKey=0xffff,destinationQP=dstQP, packetSequenceNumber=psn)
    pkt = pkt/AtomicETH(virtualAddress=vAddr, rKey=rKey, addData=1, cmpData=0)

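    # IP total length: 20 (IP) + 8 (UDP) + 12 (BTH) + 28 (AtomicETH) + 4 (iCRC) = 72
    pkt["IP"].len = 72
    # UDP length: 8 (UDP) + 12 (BTH) + 28 (AtomicETH) + 4 (iCRC) = 52
    pkt["UDP"].len = 52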

    pkt = pkt/iCRC(iCRC=0)
    print(pkt)

    pkt.show2()

    return pkt

def getRDMAMetadata():
    # Each value is stored as a decimal integer in its own file under rdma_dir.
    def read_int(name):
        with open("%s/%s" % (rdma_dir, name), "r") as f:
            return int(f.read())

    queue_pair = read_int("tmp_qpnum")
    start_psn = read_int("tmp_psn")
    memory_start = read_int("tmp_memaddr")
    memory_length = read_int("tmp_memlen")
    remote_key = read_int("tmp_rkey")

    print("Collector RDMA metadata read from disk!")

    return queue_pair, start_psn, memory_start, memory_length, remote_key

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--op", type=int, default=1, help="Command to execute: 1 for Write, 2 for Fetch_Add. Default: 1")
    args = parser.parse_args()
    dstQP, psn, vAddr, _, rKey = getRDMAMetadata()
    print("Destination Queue Pair:", dstQP)
    print("Packet Sequence Number:", psn)
    print("Virtual Memory Address:", vAddr)
    print("Remote Key:", rKey)
    if args.op == 1:
        pkt = makeRocev2Write(psn=psn, dstQP=dstQP, vAddr=vAddr, rKey=rKey)
        print("Sending packet", pkt)
        sendp(pkt, iface="enp2s0f0")
        wrpcap("rocev2_write_pkt.pcap",pkt)
    elif args.op == 2:
        print("Prepare to send Fetch&Add RDMA packet")
        pkt = makeRocev2Fetch_Add(psn=psn, dstQP=dstQP, vAddr=vAddr, rKey=rKey)
        print("Sending packet", pkt)
        sendp(pkt, iface="enp2s0f0")
        wrpcap("rocev2_fetch_add_pkt.pcap",pkt)

This is the NAK packet sent by the Collector, as captured in Wireshark: [image: Sever]
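
When reading a NAK like this one, the AETH (ACK Extended Transport Header) that follows the BTH in the response carries a syndrome byte that indicates why the request was rejected. The following is a minimal sketch of decoding it, assuming the 4 AETH bytes have already been copied out of the capture by hand; the function name `decode_aeth` and the sample bytes are illustrative and do not come from the DTA repository or from the capture above.

```python
import struct

# NAK codes carried in the low 5 bits of the AETH syndrome when bits 6-5 are 0b11.
NAK_CODES = {
    0: "PSN Sequence Error",
    1: "Invalid Request",
    2: "Remote Access Error",
    3: "Remote Operational Error",
    4: "Invalid RD Request",
}

def decode_aeth(aeth_bytes):
    """Decode a 4-byte AETH into (kind, detail, msn)."""
    if len(aeth_bytes) != 4:
        raise ValueError("AETH is exactly 4 bytes")
    (word,) = struct.unpack("!I", aeth_bytes)
    syndrome = word >> 24          # first byte on the wire
    msn = word & 0xFFFFFF          # 24-bit message sequence number
    kind_bits = (syndrome >> 5) & 0b11
    low = syndrome & 0x1F
    if kind_bits == 0b00:
        return "ACK", "credit field %d" % low, msn
    if kind_bits == 0b01:
        return "RNR NAK", "timer code %d" % low, msn
    if kind_bits == 0b11:
        return "NAK", NAK_CODES.get(low, "reserved code %d" % low), msn
    return "reserved", "value %d" % low, msn

# Hypothetical AETH bytes, for illustration only.
print(decode_aeth(bytes([0x61, 0x00, 0x00, 0x01])))
```

The NAK code narrows down where to look: a sequence error points at the PSN, while an invalid-request or remote-access error points at the request itself (opcode, lengths, address, or key).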

jonlanglet commented 3 months ago

Scapy should not be used to generate the F&A packets. This should be done in the ASIC at the translator. What you should do is generate DTA reports, and have the translator intercept and convert these into suitable RDMA operations.

Assuming this is what is being done: did you update the packet lengths in the IP and UDP headers accordingly when generating an AtomicETH instead of a RETH? (Check the control block ControlCraftRDMA.)
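
As a quick sanity check on those lengths, here is a small sketch that derives the IP and UDP length fields from the header sizes used in the script above (12-byte BTH, 16-byte RETH, 28-byte AtomicETH, 4-byte iCRC). It assumes an IPv4 header without options; the helper name `rocev2_lengths` is made up for illustration and is not part of the translator code.

```python
IP_HDR = 20          # IPv4 header without options (ihl=5)
UDP_HDR = 8
BTH_LEN = 12
RETH_LEN = 16        # RDMA Write extended header
ATOMIC_ETH_LEN = 28  # Fetch&Add / Compare&Swap extended header
ICRC_LEN = 4

def rocev2_lengths(ext_header_len, payload_len=0):
    """Return (ip_total_length, udp_length) for a single RoCEv2 packet."""
    udp_len = UDP_HDR + BTH_LEN + ext_header_len + payload_len + ICRC_LEN
    ip_len = IP_HDR + udp_len
    return ip_len, udp_len

# Write-Only with an 8-byte payload, as in makeRocev2Write.
write_lens = rocev2_lengths(RETH_LEN, payload_len=8)
# Fetch&Add carries no payload after the AtomicETH, as in makeRocev2Fetch_Add.
atomic_lens = rocev2_lengths(ATOMIC_ETH_LEN)

print("write  (ip, udp):", write_lens)
print("atomic (ip, udp):", atomic_lens)
```

These agree with the values hard-coded in the scapy script (68/48 for the write, 72/52 for the Fetch&Add), so in that script the length fields themselves are consistent with the headers being sent.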

If that is not it, I would dump the translator-generated F&A packet and a valid F&A (between two normal rNICs), compare their packet layouts, and investigate the differences.
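
One way to do that comparison is a byte-by-byte diff of the two captured packets. This is a minimal sketch, assuming both packets are already available as raw `bytes` (for example via `bytes(pkt)` on a scapy packet); `diff_bytes` and the sample inputs are hypothetical and only illustrate the idea.

```python
def diff_bytes(a, b):
    """Return (offset, byte_a, byte_b) for every position where a and b differ.

    A missing byte (one packet is shorter than the other) is reported as None.
    """
    diffs = []
    for off in range(max(len(a), len(b))):
        x = a[off] if off < len(a) else None
        y = b[off] if off < len(b) else None
        if x != y:
            diffs.append((off, x, y))
    return diffs

def format_diff(diffs):
    """Render the diff as one line per differing offset, with bytes in hex."""
    def show(v):
        return "--" if v is None else "%02x" % v
    return ["offset %4d: %s != %s" % (off, show(x), show(y)) for off, x, y in diffs]

# Hypothetical packet fragments, for illustration only.
generated = b"\x14\x00\xff\xff"
reference = b"\x14\x40\xff\xff\x00"
for line in format_diff(diff_bytes(generated, reference)):
    print(line)
```

Fields that legitimately differ between the two packets (PSN, queue pair number, rKey, addresses, iCRC) will show up in the output too, so the interesting offsets are those that fall in flag, length, or reserved fields.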

In the future (you and anyone else), contact me via email for a faster response :) Good luck!

Eternity-Wang commented 2 months ago

Thank you very much for your detailed reply. I have now implemented the F&A operation using another RDMA program that includes a GRH header. I suspect the failure with the previous code was related to the RDMA NIC we used.

jonlanglet commented 1 month ago

Perfect! Glad it worked out.