luigirizzo / netmap

Automatically exported from code.google.com/p/netmap
BSD 2-Clause "Simplified" License

Support for Amazon ENA driver #264

Open c4milo opened 7 years ago

c4milo commented 7 years ago

It would be great if netmap had support for this driver; at the moment it only ships with DPDK support.

https://aws.amazon.com/blogs/aws/elastic-network-adapter-high-performance-network-interface-for-amazon-ec2/

vmaffione commented 7 years ago

Indeed, it would be great, but we don't have the ENA hardware to play with, nor ENA specifications. In general, netmap support requires ~600 lines of code to be written, which is a very limited amount. We can guide you through the process, if you wish.

username-x commented 7 years ago

Hi! I'm very interested in guidelines for developing netmap-friendly drivers. I have a network adapter based on an FPGA. It would be so great if you could give some tips.

vmaffione commented 7 years ago

Hi, in general you should take one patched driver as a reference and see what the required functionalities are. My suggestion is to use "e1000" as a reference (the code is included in the netmap repository). The code is usually structured like this: (i) a small patch to the standard Linux driver (e.g. e1000_main.c in the Linux tree); and (ii) a header file containing the netmap-specific functions (e.g. LINUX/if_e1000_netmap.h).

The steps are the following:

1) Patch the normal driver, see LINUX/final-patches/vanilla--e1000--31200--99999.

   a) Include the header (ii).

   b) At the end of the device probe routine, call a netmap-specific function (e.g. e1000_netmap_attach()) which fills in some NIC info (and netmap methods) and calls netmap_attach(). At the beginning of the device remove routine, call netmap_detach().

   c) Call netmap_rx_irq() where the RX interrupt is handled (typically in the NAPI poll routine), preventing the driver from accessing the RX ring. Do the same for the TX interrupt, where you will call netmap_tx_irq().

   d) Check where the driver code allocates RX buffers (they can be sk_buffs, pages, kmalloc buffers, etc., depending on the driver) to put their addresses in the RX ring(s). You must prevent that from happening and instead call a netmap-specific function (e.g. e1000_netmap_init_buffers()), which will put the addresses of netmap buffers in the RX ring(s). The netmap-specific function is defined in the header.

2) Implement the nm_register method (e.g. e1000_netmap_reg). The general scheme is

        // put the interface down (if it was up)
        if (onoff) {
                nm_set_native_flags(na);
        } else {
                nm_clear_native_flags(na);
        }
        // put the interface up again (if it was up)
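The on/off pattern above can be modeled in plain C. This is a userspace sketch: the struct, the flag value, and all function names here are simplified stand-ins for the real definitions in netmap_kern.h, not the actual kernel code.

```c
#include <stdint.h>

/* Stand-in for the native-mode flag; the real value lives in netmap_kern.h. */
#define NAF_NATIVE_ON 0x1

/* Minimal model of struct netmap_adapter: only the flags field. */
struct netmap_adapter {
        uint32_t na_flags;
};

/* Models of nm_set_native_flags()/nm_clear_native_flags(): they flip the
 * flag that tells the stack whether netmap owns the NIC rings. */
static void nm_set_native_flags(struct netmap_adapter *na)
{
        na->na_flags |= NAF_NATIVE_ON;
}

static void nm_clear_native_flags(struct netmap_adapter *na)
{
        na->na_flags &= ~NAF_NATIVE_ON;
}

/* Sketch of an nm_register method following the scheme above. A real
 * driver must also quiesce the interface before, and restart it after. */
static int netmap_reg_model(struct netmap_adapter *na, int onoff)
{
        /* put the interface down (if it was up) ... */
        if (onoff)
                nm_set_native_flags(na);
        else
                nm_clear_native_flags(na);
        /* ... put the interface up again (if it was up) */
        return 0;
}
```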

3) Implement the txsync method (e.g. e1000_netmap_txsync). The purpose of this method is to see what changed in the netmap ring (an abstract, device-independent ring of buffers) w.r.t. the last time txsync was called, and reflect those changes to the real NIC TX ring; also, see what changed in the NIC TX ring and reflect it back to the netmap ring. Start by copying the body of e1000_netmap_txsync.

   a) You need to change the body of the loop to fill in the NIC-specific TX slots using the information stored in the corresponding "netmap_slot", which is abstract and device-independent. Each netmap_slot corresponds to a packet to be transmitted.

   b) After the loop, notify the NIC about the new packets in the TX ring. This typically happens by writing to a NIC register (e.g. the TDT in the case of e1000).

   c) Update kring->nr_hwtail to reflect the index of the next TX packet still to be processed by the hardware (i.e. the one after the last processed). In the case of e1000, this information is stored in the TDH register.
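The index arithmetic of the txsync "fill" loop can be shown with a small userspace model. nm_next() mirrors the real wraparound macro from netmap_kern.h (lim is num_slots - 1); tx_slots_pending() is a toy stand-in for the loop that walks the slots the application filled between nr_hwcur and the ring head.

```c
#include <stdint.h>

/* netmap's ring-index increment: advance by one, wrapping at lim
 * (lim == num_slots - 1). Mirrors the nm_next() macro in netmap_kern.h. */
static inline uint32_t nm_next(uint32_t i, uint32_t lim)
{
        return (i == lim) ? 0 : i + 1;
}

/* Toy model of the txsync first part: count the slots between nr_hwcur
 * and the ring head (rhead) that must be turned into NIC TX descriptors.
 * In a real driver, the loop body programs one descriptor per slot. */
static uint32_t tx_slots_pending(uint32_t nr_hwcur, uint32_t rhead,
                                 uint32_t lim)
{
        uint32_t n = 0;
        uint32_t i;

        for (i = nr_hwcur; i != rhead; i = nm_next(i, lim))
                n++;    /* here the real driver fills one NIC TX slot */
        return n;
}
```

Note how the loop handles wraparound: with an 8-slot ring (lim = 7), hwcur = 6 and head = 2 yields 4 pending slots (6, 7, 0, 1).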

4) Implement the rxsync method (e.g. e1000_netmap_rxsync). The purpose of this method is again to reflect the changes in the netmap ring to the NIC RX ring and the other way around. Start by copying the body of e1000_netmap_rxsync.

   a) In the first loop you need to scan those slots in the RX ring that have been used by the NIC to receive a packet (i.e. newly received packets). You need to fill each netmap_slot using the information stored in the corresponding slot of the NIC RX ring.

   b) In the second loop you need to clean those NIC RX slots that have been "used" by the netmap application (e.g. a userspace process) and give them back to the NIC to be reused for new receive operations. You typically need to write to a NIC register to notify the NIC that new RX slots are available (e.g. the RDT register in e1000). If netmap buffers changed (e.g. because of a zerocopy swap), you also need to update the address in the NIC RX slot (see the NS_BUF_CHANGED flag).
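The NS_BUF_CHANGED handling in the second rxsync loop can be sketched as follows. This is a userspace toy: the two slot structs are drastically simplified stand-ins for struct netmap_slot and a NIC RX descriptor, and rx_return_slots() is an illustrative name, not a real netmap function.

```c
#include <stdint.h>

/* Stand-in for the NS_BUF_CHANGED flag defined in net/netmap.h. */
#define NS_BUF_CHANGED 0x0001

/* Simplified slot models: the real structs carry more fields. */
struct nm_slot  { uint64_t buf_addr; uint16_t flags; };
struct nic_slot { uint64_t dma_addr; };

/* Toy model of the rxsync second loop: give user-consumed slots back to
 * the NIC. If the netmap buffer was swapped (zerocopy), NS_BUF_CHANGED
 * is set and the NIC descriptor must pick up the new DMA address. */
static void rx_return_slots(struct nm_slot *nm, struct nic_slot *nic,
                            uint32_t first, uint32_t count, uint32_t lim)
{
        uint32_t i = first, n;

        for (n = 0; n < count; n++) {
                if (nm[i].flags & NS_BUF_CHANGED) {
                        nic[i].dma_addr = nm[i].buf_addr; /* re-program */
                        nm[i].flags &= ~NS_BUF_CHANGED;
                }
                /* a real driver also resets the descriptor status here */
                i = (i == lim) ? 0 : i + 1;
        }
        /* after the loop: write the NIC tail register (RDT on e1000) */
}
```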

5) Implement the initialization function which links netmap buffers to RX rings (e.g. e1000_netmap_init_buffers). Basically you need to scan the rings and fill in the NIC RX slots using the address/len info contained in the netmap RX slots. The same is not usually necessary for TX rings.
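This one-time ring walk can be modeled in a few lines. Again a userspace sketch with hypothetical names: the structs are simplified stand-ins, and rx_ring_init_model() only illustrates the shape of a function like a driver-specific init_buffers routine.

```c
#include <stdint.h>

/* Simplified models of a netmap RX slot and a NIC RX descriptor. */
struct nm_rx_slot  { uint64_t buf_addr; uint16_t len; };
struct nic_rx_desc { uint64_t dma_addr; uint16_t buf_len; };

/* Toy model of an init-buffers routine: walk the whole RX ring once and
 * point every NIC descriptor at the corresponding netmap buffer, so the
 * NIC DMAs received frames into netmap memory instead of sk_buffs. */
static void rx_ring_init_model(const struct nm_rx_slot *nm,
                               struct nic_rx_desc *nic, uint32_t num_slots)
{
        uint32_t i;

        for (i = 0; i < num_slots; i++) {
                nic[i].dma_addr = nm[i].buf_addr;
                nic[i].buf_len  = nm[i].len;
        }
}
```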

My suggestion is to start with (1) (ignoring 1.d), then go for (2) and (3). At this point you should be able to test transmission. Then you can go ahead with (1.d), (4) and (5). In any case, provide at least stubs for (4), (5) and (1.d).


gspivey commented 7 years ago

@vmaffione Amazon has been posting the source for the ena drivers to github. https://github.com/amzn/amzn-drivers

DPDK seems to already support ena. http://dpdk.org/doc/guides/nics/ena.html

I am willing to give this a shot if you don't mind giving feedback.

vmaffione commented 7 years ago

Hi Gerard, I've taken a quick look at the ena Linux driver; it seems to follow the usual Intel NIC driver structure (see e1000, ixgbe, igb, etc.). This is not surprising, as people who want to write a new Linux NIC driver take inspiration from the Intel drivers, which are well written and stable.

Anyway I see the datapath operation is sketched quite clearly here https://github.com/amzn/amzn-drivers/blob/master/kernel/linux/ena/README

Netmap support in principle requires way less effort than DPDK support, as only a small datapath-related patch is needed. The original driver is still in charge of everything related to configuration and control path (apart from management of RX buffer refill, see above).

If you want to try sketching a netmap patch for ena we can give you feedback and suggestions for sure. You can follow the guidelines above.

gspivey commented 7 years ago

Hello @vmaffione,

Thanks for your support. I have read through your guidelines, the netmap source, and the ena source in more detail. I will attempt to implement this in five phases, as described in your guidelines (your email listed two phase 4's).

I will post a gist with a prototype of steps 1a-1c. It will include a function ena_netmap_attach and three patches to ena_netdev.c for:

  1. ena_netmap_attach
  2. netmap_detach
  3. netmap_rx_irq

Question: is netmap_init_buffers optional (step 5)? i40e_netmap_linux.h does not seem to implement it.

vmaffione commented 7 years ago

Hi Gerard, my bad on the two phase 4's; I've fixed the guidelines.

Your plan looks great to me.

Regarding netmap_init_buffers(), it is not optional (nothing in the list is optional). In i40e, this functionality is implemented by i40e_netmap_configure_rx_ring(). The function names are just chosen to be consistent with the original driver. This step is necessary to let the NIC DMA received frames into netmap memory rather than memory attached to sk_buffs.

gspivey commented 7 years ago

Hey @vmaffione, mind taking a look at my notes below? If they look good, I will move on to phase 2.

Thanks again for your support.

Gerard

ENA Netmap Prototype Phase 1

Steps for phase one are as follows:

ena_netmap_linux.h prototype

#include <bsd_glue.h>
#include <net/netmap.h>
#include <netmap/netmap_kern.h>

#ifdef NETMAP_LINUX_ENA_PTR_ARRAY
#define NM_ENA_TX_RING(a, r)           ((a)->tx_rings[(r)])
#define NM_ENA_RX_RING(a, r)           ((a)->rx_rings[(r)])
#else
#define NM_ENA_TX_RING(a, r)           (&(a)->tx_rings[(r)])
#define NM_ENA_RX_RING(a, r)           (&(a)->rx_rings[(r)])
#endif
/*
 * The attach routine, called near the end of ena_probe(),
 * fills the parameters for netmap_attach() and calls it.
 * It cannot fail, in the worst case (such as no memory)
 * netmap mode will be disabled and the driver will only
 * operate in standard mode.
 */
static void
ena_netmap_attach(struct ena_adapter *adapter)
{
        struct netmap_adapter na;

        bzero(&na, sizeof(na));

        na.ifp = adapter->netdev;
        na.na_flags = NAF_BDG_MAYSLEEP;
        na.pdev = &adapter->pdev->dev;
        // XXX check that queues is set.
        na.num_tx_desc = NM_ENA_TX_RING(adapter, 0)->count;
        na.num_rx_desc = NM_ENA_RX_RING(adapter, 0)->count;
        // na.nm_txsync = ena_netmap_txsync; // Task 3
        // na.nm_rxsync = ena_netmap_rxsync; // Task 4
        // na.nm_register = ena_netmap_reg; // Task 2
        na.num_tx_rings = na.num_rx_rings = adapter->num_queue_pairs;

        netmap_attach(&na);
}

ena_netdev.c patch

The patch is against the 1.1.3 tag of ena_netdev.c.

diff --git a/ena_netdev.c b/ena_netdev.c
index 0facf46..445857b 100644
--- a/ena_netdev.c
+++ b/ena_netdev.c
@@ -54,6 +54,10 @@
 #include "ena_pci_id_tbl.h"
 #include "ena_sysfs.h"

+#if defined(CONFIG_NETMAP) || defined(CONFIG_NETMAP_MODULE)
+#include <ena_netmap.h>
+#endif
+
 static char version[] = DEVICE_NAME " v" DRV_MODULE_VERSION "\n";

 MODULE_AUTHOR("Amazon.com, Inc. or its affiliates");
@@ -696,6 +700,13 @@ static int ena_clean_tx_irq(struct ena_ring *tx_ring, u32 budget)
        int tx_pkts = 0;
        int rc;

+       struct ena_adapter *adapter = tx_ring->adapter;
+       struct net_device *netdev = adapter->netdev;
+#ifdef DEV_NETMAP
+        if (netmap_tx_irq(netdev, 0))
+                return true; /* cleaned ok */
+#endif /* DEV_NETMAP */
+
        next_to_clean = tx_ring->next_to_clean;
        txq = netdev_get_tx_queue(tx_ring->netdev, tx_ring->qid);

@@ -1013,6 +1024,18 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi,
        int total_len = 0;
        int rx_copybreak_pkt = 0;

+        struct net_device *netdev = adapter->netdev;
+#ifdef DEV_NETMAP
+#ifdef CONFIG_ENA_NAPI
+#define NETMAP_DUMMY work_done
+#else
+        int dummy;
+#define NETMAP_DUMMY &dummy
+#endif
+        if (netmap_rx_irq(netdev, 0, NETMAP_DUMMY))
+                return true;
+#endif /* DEV_NETMAP */
+
        netif_dbg(rx_ring->adapter, rx_status, rx_ring->netdev,
                  "%s qid %d\n", __func__, rx_ring->qid);
        res_budget = budget;
@@ -3377,6 +3400,10 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
        adapter->timer_service.function = ena_timer_service;
        adapter->timer_service.data = (unsigned long)adapter;

+#ifdef DEV_NETMAP
+        ena_netmap_attach(adapter);
+#endif /* DEV_NETMAP */
+
        add_timer(&adapter->timer_service);

        dev_info(&pdev->dev, "%s found at mem %lx, mac addr %pM Queues %d\n",
@@ -3517,6 +3544,11 @@ static void ena_remove(struct pci_dev *pdev)
        ena_com_destroy_interrupt_moderation(ena_dev);

        vfree(ena_dev);
+
+#ifdef DEV_NETMAP
+        netmap_detach(netdev);
+#endif /* DEV_NETMAP */
+
 }

 static struct pci_driver ena_pci_driver = {
vmaffione commented 7 years ago

Hi @gspivey. It looks quite OK, but I have some comments, mostly related to the fact that you should have taken ixgbe as a reference rather than i40e (there are some small things to be fixed in i40e, while ixgbe support is more mature).