scylladb / seastar

High performance server-side application framework
http://seastar.io
Apache License 2.0

Native stack bug when the num_cores > num_queues #654

Open ashaffer opened 5 years ago

ashaffer commented 5 years ago

I just got started using Seastar, so it's entirely possible I'm configuring something incorrectly. However, I spent the last several days tracking down a nasty little issue I was having running the native stack on an EC2 instance. I noticed that I could complete a TCP connection a small percentage of the time, but it seemed completely random. Eventually I tracked it down to these offending lines of code:

In interface::dispatch_packet (net.cc line 328): auto fw = _dev->forward_dst(engine().cpu_id(), [&p, &l3, this] () {

and

tcp::connect (tcp.hh line 844):

        src_port = _port_dist(_e);
        id = connid{src_ip, dst_ip, src_port, dst_port};
    } while (_inet._inet.netif()->hw_queues_count() > 1 &&
             (_inet._inet.netif()->hash2cpu(id.hash(_inet._inet.netif()->rss_key())) != engine().cpu_id()
              || _tcbs.find(id) != _tcbs.end()));

As you can see here, CPU selection is done slightly differently when opening a connection vs. when receiving packets. Inside hash2cpu, the CPU is selected like this:

return forward_dst(hash2qid(hash), [hash] { return hash; });

It passes in hash2qid rather than engine().cpu_id(), like tcp::connect does. What this ends up meaning is that my connection only works if, by chance, these two values happen to match, which ends up being a small percentage of the time on the instance I'm using.

I know that this is the issue, because if I change the engine().cpu_id() call in dispatch_packet to hash2qid, everything works reliably again. However, I don't think that's going to spread the load over all the cores in the way that I want.

Is this an issue of me misunderstanding some aspect of configuration, or is this a real bug?

avikivity commented 5 years ago

/cc @gleb-cloudius

gleb-cloudius commented 5 years ago

What did you pass to hash2qid() when calling it in dispatch_packet()?

The code in connect is not supposed to use the current cpu. On the contrary, it tries to figure out which cpu will receive a packet with the given tuple.

Can you describe your NIC HW? How many hw queues does it have, and does it support HW RSS?

ashaffer commented 5 years ago

The exact modification that I made was changing:

auto fw = _dev->forward_dst(engine().cpu_id(), [&p, &l3, this] () {

to:

forward_hash data;
l3.forward(data, p, sizeof(eth_hdr));
auto fw = _dev->forward_dst(_dev->hash2qid(hash), [&p, &l3, this] () {
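As posted, the replacement uses hash without showing where it comes from; presumably it was computed in software from the forwarded header fields, roughly along these lines (a sketch of the modification under that assumption, not the author's exact diff):

    forward_hash data;
    l3.forward(data, p, sizeof(eth_hdr));
    // Assumption: the hash fed to hash2qid() is derived from the extracted
    // tuple the same way Seastar's software fallback (quoted further down in
    // this thread) does it.
    auto hash = toeplitz_hash(rss_key(), data);
    auto fw = _dev->forward_dst(_dev->hash2qid(hash), [&p, &l3, this] () {
        // ... original lambda body unchanged ...
    });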

I'm using a C5.9xlarge EC2 instance with an ENA adapter. It has 8 hardware queues, and it does not support RSS (Or at least, Seastar is not using hardware RSS).

gleb-cloudius commented 5 years ago

How can it not support RSS if it has 8 queues? How does it balance the traffic between the queues? It looks like the NIC and seastar disagree about how a particular queue is chosen.

ashaffer commented 5 years ago

How can it not support RSS if it has 8 queues?

It may support RSS, but when I added print statements to:

                auto hwrss = p.rss_hash();
                if (hwrss) {
                    return hwrss.value();
                } else {
                    forward_hash data;
                    if (l3.forward(data, p, sizeof(eth_hdr))) {
                        return toeplitz_hash(rss_key(), data);
                    }

                    return 0u;
                }

The else branch was being taken.

gleb-cloudius commented 5 years ago

This does not mean it does not support RSS; it means it does not provide the rss hash value in the packet descriptor, so we need to calculate it ourselves. The best way to figure out the RSS configuration is to instrument the dpdk device initialization in src/net/dpdk.cc (set_rss_table, init_port_start).
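For reference, a minimal self-contained sketch of the software Toeplitz computation being described here (an illustration of the standard algorithm over an RSS key and a packed tuple, not Seastar's exact code):

    #include <cstdint>
    #include <cstddef>
    #include <vector>

    // Toeplitz hash: for every set bit of the input (MSB first), XOR in the
    // current leftmost 32 bits of the key, then slide the key window left by
    // one bit. key is the RSS key (e.g. 40 bytes); data is the concatenated
    // tuple (src ip, dst ip, src port, dst port) in network byte order.
    uint32_t toeplitz_hash(const std::vector<uint8_t>& key,
                           const std::vector<uint8_t>& data) {
        uint32_t hash = 0;
        uint32_t window = (uint32_t(key[0]) << 24) | (uint32_t(key[1]) << 16) |
                          (uint32_t(key[2]) << 8)  |  uint32_t(key[3]);
        size_t key_bit = 32;                 // next key bit to shift into the window
        for (uint8_t byte : data) {
            for (int bit = 7; bit >= 0; --bit) {
                if (byte & (1u << bit)) {
                    hash ^= window;
                }
                uint8_t next = 0;
                if (key_bit / 8 < key.size()) {
                    next = (key[key_bit / 8] >> (7 - key_bit % 8)) & 1;
                }
                window = (window << 1) | next;
                ++key_bit;
            }
        }
        return hash;
    }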

ashaffer commented 5 years ago

Hmm, well, it seems it does support RSS, per this:

https://github.com/amzn/amzn-drivers/tree/master/kernel/linux/ena

I am also seeing set_rss_table called and:

Port 0: RSS table size is 128

Printed at startup. So yes, I think RSS is both supported by the NIC and being used by Seastar/DPDK.

EDIT: I will note that I also get this message:

Port 0: Changing HW FC settings is not supported

I wonder, is the issue that Seastar/DPDK are having trouble configuring hardware flow control, and that's creating problems for RSS?

avikivity commented 5 years ago

Maybe the ena driver is outdated. You can try with https://groups.google.com/d/msg/seastar-dev/qUN4ig1BWa8/Dvbm8B9GAAAJ (still undergoing review).

ashaffer commented 5 years ago

Ah, good idea. I had tried to upgrade Seastar's DPDK myself earlier when I noticed that the ENA PMD driver was not the latest, but I ran into some trouble. I imagine I'll have more luck with one of your patches :). I'll give that a try.

avikivity commented 5 years ago

It's from @cyb70289, to give credit

gleb-cloudius commented 5 years ago

EDIT: I will note that I also get this message:

Port 0: Changing HW FC settings is not supported

I wonder, is the issue that Seastar/DPDK are having trouble configuring hardware flow control, and that's creating problems for RSS?

Can you provide the full dpdk output? I also see that they do support providing the rss hash in a packet descriptor in the Linux driver, so maybe a new driver will indeed help.

ashaffer commented 5 years ago

Here's the dpdk output, using the patched version with v19.05. It's currently not working:

EAL: Detected 36 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: PCI device 0000:00:05.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1d0f:ec20 net_ena
EAL: PCI device 0000:00:06.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1d0f:ec20 net_ena
PMD: Placement policy: Low latency
ports number: 1
Port 0: max_rx_queues 8 max_tx_queues 8
Port 0: using 8 queues
Port 0: RSS table size is 128
LRO is off
RX checksum offload supported
TX ip checksum offload supported
TX TCP&UDP checksum offload supported
Port 0 init ... done: 
Creating Tx mbuf pool 'dpdk_pktmbuf_pool0_tx' [1024 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool0_rx' [1024 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool5_tx' [1024 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool2_tx' [1024 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool4_tx' [1024 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool3_tx' [1024 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool6_tx' [1024 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool1_tx' [1024 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool7_tx' [1024 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool5_rx' [1024 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool7_rx' [1024 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool1_rx' [1024 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool6_rx' [1024 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool4_rx' [1024 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool3_rx' [1024 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool2_rx' [1024 mbufs] ...
Port 0: Changing HW FC settings is not supported
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
EAL: Error - exiting with code: 1
  Cause: Cannot start port 0
Segmentation fault on shard 31.
Backtrace:
  0x0000000000144da8
  0x00000000000c8b08
  0x00000000000c8ded
  0x00000000000c8e62
  /lib/x86_64-linux-gnu/libpthread.so.0+0x000000000001288f
  0x00000000000cb10c
  0x00000000000d898f
  0x00000000001538c5
  0x00000000000c338c
  0x0000000000128904
  0x0000000000137ba3
  0x00000000000c23cd
  0x00000000002b7b50
  /lib/x86_64-linux-gnu/libpthread.so.0+0x00000000000076da
   `l_completed_messagesth+0x000000000012188e
Segmentation fault

I've tracked this issue down to a failure to allocate buffers in ena_populate_rx_queue, though I don't yet know why that's occurring.

cyb70289 commented 5 years ago

@ashaffer, the main change in the Seastar patch for 19.05 is that it now uses iommu+vfio for dpdk memory management; uio is not supported. Did you bind your nic to the vfio-pci driver?

You mentioned you're running in an EC2 instance, not on bare metal? I'm not sure if iommu+vfio can work in a virtualized environment.

ashaffer commented 5 years ago

@cyb70289 Thanks for pointing that out. I did indeed have trouble getting that working; it turns out you can get it working on EC2, though. I found instructions here:

https://github.com/amzn/amzn-drivers/tree/master/userspace/dpdk

I'm now getting the same output as before, except with VFIO enabled:

EAL: Detected 36 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:00:05.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1d0f:ec20 net_ena
EAL: PCI device 0000:00:06.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1d0f:ec20 net_ena
EAL:   using IOMMU type 8 (No-IOMMU)
PMD: Placement policy: Low latency
ports number: 1
Port 0: max_rx_queues 8 max_tx_queues 8
Port 0: using 8 queues
Port 0: RSS table size is 128
LRO is off
RX checksum offload supported
TX ip checksum offload supported
TX TCP&UDP checksum offload supported
Port 0 init ... done: 
Creating Tx mbuf pool 'dpdk_pktmbuf_pool0_tx' [1024 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool0_rx' [1024 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool3_tx' [1024 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool2_tx' [1024 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool1_tx' [1024 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool5_tx' [1024 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool4_tx' [1024 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool6_tx' [1024 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool7_tx' [1024 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool3_rx' [1024 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool5_rx' [1024 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool1_rx' [1024 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool7_rx' [1024 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool2_rx' [1024 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool6_rx' [1024 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool4_rx' [1024 mbufs] ...
Port 0: Changing HW FC settings is not supported
ena_queue_start(): Failed to populate rx ring !
ena_queue_start_all(): failed to start queue 0 type(1)
EAL: Error - exiting with code: 1
  Cause: Cannot start port 0
Segmentation fault on shard 30.
Backtrace:
  0x0000000000144da8
  0x00000000000c8b08
  0x00000000000c8ded
  0x00000000000c8e62
  /lib/x86_64-linux-gnu/libpthread.so.0+0x000000000001288f
  0x00000000000d862b
  0x00000000000d899d
  0x00000000001538c5
  0x00000000000c338c
  0x0000000000128904
  0x0000000000137ba3
  0x00000000000c23cd
  0x00000000002b7b50
  /lib/x86_64-linux-gnu/libpthread.so.0+0x00000000000076da
   `l_completed_messagesth+0x000000000012188e
Segmentation fault

I will note one thing, which is that I am using VFIO in the "unsafe noiommu" mode as recommended by the page I linked. Does your patch rely on it using IOMMU specifically, or should it work with VFIO without that?

cyb70289 commented 5 years ago

@ashaffer, thanks for the report. I think the error is from the memory subsystem, but it's hard to find the root cause from the logs. Are there any kernel error logs (dmesg) when the seastar startup fails? Do native dpdk apps (e.g., testpmd) work in your EC2 instance? I will do some tests in virtual machines to see if I can catch anything interesting.

ashaffer commented 5 years ago

Thanks, I appreciate the help.

Dmesg output when running seastar:

[54249.156964] vfio-pci 0000:00:06.0: enabling device (0400 -> 0402)
[54249.382714] vfio-pci 0000:00:06.0: vfio-noiommu device opened by user (seastar_test:11151)
[54250.161493] show_signal_msg: 10 callbacks suppressed
[54250.161496] reactor-25[11178]: segfault at fffffffffffffff0 ip 000055555561f10d sp 00007fffe6bc1310 error 5
[54250.161498] traps: reactor-35[11188] general protection ip:55555561f10d sp:7fffe1bb7310 error:0
[54250.161502] reactor-5[11158]: segfault at 5fd555573df5 ip 000055555561f10d sp 00007ffff0bd5310 error 4
[54250.161504] reactor-3[11156]: segfault at 10 ip 000055555561f10d sp 00007ffff1bd7310 error 4
[54250.161506] traps: reactor-30[11183] general protection ip:55555562c62c sp:7fffe43bbf60 error:0
[54250.161791] reactor-33[11186]: segfault at 18000003788 ip 000055555561f10d sp 00007fffe2bb9310 error 4 in seastar_test[555555554000+66b000]

Testpmd does seem to work, which I guess is a good sign that it is at least possible in principle for VFIO to work on a non-metal EC2 instance.

Also, I traced exactly where the call is failing. It's inside of rte_eth_dev_init, which in turn ultimately calls the ENA (amzn NIC) driver's specific startup function ena_queue_start_all, which calls ena_queue_start, which fails on line 1185, calling ena_populate_rx_queue with (in my case) bufs_num equal to 8191. ena_populate_rx_queue fails and returns zero because rte_mempool_get_bulk on line 1382 returns -1.

That's about as far as I've gotten trying to debug it myself. It's clearly some problem allocating memory, but I don't yet understand why it's failing there.

EDIT: For comparison, when I run testpmd this is all I see in dmesg:

[54308.125539] vfio-pci 0000:00:06.0: enabling device (0400 -> 0402)
[54308.354966] vfio-pci 0000:00:06.0: vfio-noiommu device opened by user (testpmd:11277)

ashaffer commented 5 years ago

I'll add another issue I've noticed. I hadn't been using hugepages, so I tried using them. Ultimately I encountered the same issue as above, but in order to get to that point I had to explicitly specify --memory 2G; otherwise it would try to consume more memory than I had allocated for it, no matter how much I allocated.

ashaffer commented 5 years ago

Alright guys...after fumbling around in the dark for a while, I have managed to get it working with the latest DPDK on my EC2 instance. Here's what I had to change:

  1. On line 116 of dpdk.cc I changed default_ring_size to 1024. This was necessary because if you pass a ring_size of 512, the ENA driver decides to change it to 8192 for you, which creates problems because seastar has not allocated that much space (see the sketch after this list).

  2. Even with the new dpdk, my original issue doesn't seem to be fixed, so I had to reinstate the change of interface::dispatch_packet to use _dev->hash2qid(hash) rather than engine().cpu_id(). However, I do notice that with the new DPDK it seems to now be taking advantage of the "Low Latency Queue" option for the ENA NIC, which is great.
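For reference, a rough sketch of change (1); the exact declaration in src/net/dpdk.cc may differ, but the idea is a one-constant bump of the per-queue descriptor ring size:

    // Hypothetical form of the change: raise the Rx/Tx descriptor ring size.
    static constexpr uint16_t default_ring_size = 1024;   // was 512
    // With 512, the ENA PMD of that era silently replaced the requested ring
    // size with 8192 descriptors, while the per-queue mbuf pools (1024 mbufs
    // in the earlier logs) were sized for the smaller ring, so
    // ena_populate_rx_queue() could not get enough buffers and port start
    // failed. With 1024 the requested size is apparently kept, and the larger
    // pools (2048 mbufs in the later logs) can populate the ring.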

cyb70289 commented 5 years ago

Great to hear the latest dpdk works on the EC2 ENA nic. Thank you @ashaffer.

gleb-cloudius commented 5 years ago

On Sun, Jun 23, 2019 at 03:45:17PM -0700, Andrew Shaffer wrote:

  1. Even with the new dpdk, my original issue doesn't seem to be fixed, so I had to reinstate the change of interface::dispatch_packet to use _dev->hash2qid(hash) rather than engine().cpu_id(). However, I do notice that with the new DPDK it seems to now be taking advantage of the "Low Latency Queue" option for the ENA NIC, which is great.

This will result in an additional cross-cpu hop for each packet. In interface::dispatch_packet(), _dev->hash2qid(hash) == engine().cpu_id() has to be true. If it is not, the HW and seastar disagree on how a particular queue is chosen: either the redirection tables are different, or the hash function, or the keys. Maybe ENA ignores some of our configuration without issuing any errors. Does it work for you if the number of nic queues equals the number of cpus? Can you provide a full dpdk initialization output?

-- Gleb.
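A hedged sketch of the consistency condition Gleb describes above (illustrative names only, not Seastar's exact internals): with agreeing RSS, the queue Seastar predicts from its own hash and redirection table must be the queue the packet actually arrived on, and therefore the cpu that polled it.

    #include <cstdint>
    #include <vector>

    // A 128-entry reta matches the "RSS table size is 128" line in the logs.
    unsigned predicted_queue(uint32_t sw_rss_hash, const std::vector<uint8_t>& reta) {
        // hash2qid()-style lookup: the low bits of the hash index the table.
        return reta[sw_rss_hash % reta.size()];
    }

    bool rss_consistent(uint32_t sw_rss_hash, const std::vector<uint8_t>& reta,
                        unsigned arrival_queue) {
        // False means the NIC applied a different hash function, key or table
        // than the one programmed by the host -- which is what the ENA PMD
        // turns out to be doing later in this thread.
        return predicted_queue(sw_rss_hash, reta) == arrival_queue;
    }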

ashaffer commented 5 years ago

Does it work for you if the number of nic queues equals the number of cpus?

If I restrict the cpus using --cpuset, it gets more likely to succeed, as you would expect, but the issue remains.

Can you provide a full dpdk initialization output?

How do I do that?

gleb-cloudius commented 5 years ago

On Mon, Jun 24, 2019 at 12:51:52AM -0700, Andrew Shaffer wrote:

Does it work for you if the number of nic queues equals the number of cpus?

If I restrict the cpus using --cpuset, it gets more likely to succeed, as you would expect, but the issue remains.

OK, so this is indeed an RSS issue and not a seastar internal redirection issue.

Can you provide a full dpdk initialization output?

How do I do that?

The output that the application prints when it starts. You already provided it here, but for a failed boot. Can you show one for a successful boot as well?

-- Gleb.

ashaffer commented 5 years ago

Sure:

EAL: Detected 36 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:00:05.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1d0f:ec20 net_ena
EAL: PCI device 0000:00:06.0 on NUMA socket -1
EAL:   Invalid NUMA socket, default to 0
EAL:   probe driver: 1d0f:ec20 net_ena
EAL:   using IOMMU type 8 (No-IOMMU)
PMD: Placement policy: Low latency
ports number: 1
Port 0: max_rx_queues 8 max_tx_queues 8
Port 0: using 8 queues
Port 0: RSS table size is 128
LRO is off
RX checksum offload supported
TX ip checksum offload supported
TX TCP&UDP checksum offload supported
Port 0 init ... done: 
Creating Tx mbuf pool 'dpdk_pktmbuf_pool0_tx' [2048 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool0_rx' [2048 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool1_tx' [2048 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool3_tx' [2048 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool7_tx' [2048 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool4_tx' [2048 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool5_tx' [2048 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool6_tx' [2048 mbufs] ...
Creating Tx mbuf pool 'dpdk_pktmbuf_pool2_tx' [2048 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool1_rx' [2048 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool4_rx' [2048 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool3_rx' [2048 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool2_rx' [2048 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool6_rx' [2048 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool5_rx' [2048 mbufs] ...
Creating Rx mbuf pool 'dpdk_pktmbuf_pool7_rx' [2048 mbufs] ...
Port 0: Changing HW FC settings is not supported

Checking link status 
Created DPDK device
done
Port 0 Link Up - speed 0 Mbps - full-duplex
DHCP sending discover
DHCP Got offer for 172.31.24.154
DHCP sending request for 172.31.24.154
DHCP Got ack on request
DHCP  ip: 172.31.24.154
DHCP  nm: 255.255.240.0
DHCP  gw: 172.31.16.1

It hangs here if I don't make that hash2qid edit.

gleb-cloudius commented 5 years ago

It looks like the dpdk ena driver supports changing neither the RSS hash function nor the RSS key. It sets the function to CRC32 by default and ignores the configuration seastar provides (ena_com_fill_hash_function is called only during the default config). Looks like a bug to me. It should at least return an error on a re-configure attempt.

-- Gleb.

ashaffer commented 5 years ago

I think I've found the culprit:

https://github.com/DPDK/dpdk/blob/e28111ac9864af09e826241a915dfff87a9c00ad/drivers/net/ena/ena_ethdev.c#L627

It seems to be defaulting to a CRC32 hash. I'm not sure how to configure this from the outside, but I'll try recompiling DPDK with this changed to toeplitz and see if that fixes the issue; then we can try to figure out how to set it externally.

EDIT: Looks like we found it at the same time. Yeah, I can't figure out how to set it either. I guess I'll either modify my ENA driver or add a CRC32 hash to Seastar. Would you guys be open to adding the CRC32 option to Seastar and switching to it in the case of an ENA driver? If not, it's cool, I'll just manage my own little fork.

gleb-cloudius commented 5 years ago

We set up the hash function here:

https://github.com/scylladb/seastar/blob/master/src/net/dpdk.cc#L1729

and the key here:

https://github.com/scylladb/seastar/blob/master/src/net/dpdk.cc#L1524

The problem is that the ena driver does not report an error on either of those, but just silently ignores the requested configuration. It would be great to add CRC32 support to seastar and fall back to it if the hash function cannot be changed, but for that the ena driver needs to return a proper error.

EDIT: I think it is fair to complain to the dpdk developers and ask them to either add support for changing the hash function or to report an error.
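For context, a hedged sketch (using DPDK 19.x-era names) of how an application asks a PMD for Toeplitz RSS with its own key; this is the kind of configuration done by the dpdk.cc lines linked above, and the complaint here is that the ENA PMD accepted it without error while keeping its own hash function:

    #include <rte_ethdev.h>

    static uint8_t rss_key[40] = { /* 40-byte RSS key */ };

    int configure_rss(uint16_t port_id, uint16_t nb_queues) {
        struct rte_eth_conf conf = {};
        conf.rxmode.mq_mode = ETH_MQ_RX_RSS;
        conf.rx_adv_conf.rss_conf.rss_key = rss_key;
        conf.rx_adv_conf.rss_conf.rss_key_len = sizeof(rss_key);
        conf.rx_adv_conf.rss_conf.rss_hf = ETH_RSS_IP | ETH_RSS_TCP | ETH_RSS_UDP;
        // A PMD that cannot honour the key or hash function should fail here
        // (or in rte_eth_dev_rss_hash_update()); returning success and then
        // ignoring the settings is what makes the problem hard to detect.
        return rte_eth_dev_configure(port_id, nb_queues, nb_queues, &conf);
    }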

ashaffer commented 5 years ago

Just a little update here. I added CRC32 hashing in, but it doesn't seem to match the output of the NIC. I forced the NIC to report to me the RSS hash it's generating, and it's quite odd. The hash is always of the form 0xABCDABCD. That is to say, the upper and lower word are the same. Because of that, I also tried a CRC16 hash, but so far to no avail. I can't seem to reproduce the RSS hash being generated by the ENA. Do you guys have any idea what's going on here?

ashaffer commented 5 years ago

Alright....after many trials and tribulations...we have a resolution. It turns out that even though the Amazon DPDK driver sets CRC32 hashing without allowing you to configure it, the NIC itself completely ignores all that and uses Toeplitz. It does this weird thing where it mirrors the upper and lower words of the Toeplitz hash, but since we're only using the low-order bits anyway, we don't need to worry about that. The main reason things weren't working initially is that they don't use the Mellanox RSS key by default; they use this one, which they claim is used by the majority of NIC vendors:

0x6d5a56da, 0x255b0ec2, 0x4167253d, 0x43a38fb0, 0xd0ca2bcb, 0xae7b30b4, 0x77cb2da3, 0x8030f20c, 0x6a42b73b, 0xbeac01fa
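Written out in the 40-byte array form in which RSS keys are usually passed around (assuming the words above are listed most-significant byte first), the key is:

    #include <cstdint>

    // Byte expansion of the key quoted above; this is the widely used default
    // Toeplitz RSS key that many NICs and drivers ship with.
    static const uint8_t ena_default_rss_key[40] = {
        0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
        0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
        0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
        0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
        0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
    };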

I'm not sure how you guys would want to handle this. Detect which NIC it is and if it's an ENA, switch to this key? Or just make this one the default and hope keys are configurable for others?

avikivity commented 5 years ago

I don't really know. I recommend raising this on the dpdk mailing list and copying the ena maintainers. Ideally it would be resolved in dpdk and we would just upgrade to a fixed version. If dpdk refuses to fix this, we can add a workaround in Seastar, but we should try upstream first.

ashaffer commented 5 years ago

Unfortunately dpdk can't fix it. It's at the hardware/VM level on EC2. I tried going in and modifying the ENA DPDK driver, and that wasn't sufficient, because the NIC simply ignores commands related to RSS. It's not a huge deal to me; I've gone ahead and changed the key in my fork. The AWS guys say that they're going to enable configurable RSS "soon", but they've been saying that for a little while now, it seems.

avikivity commented 5 years ago

Can't they change the dpdk API to report that you can't change the hash function and that it's toeplitz?

If not, then we can patch it in Seastar.

zenmurugan commented 2 years ago

Just want to check back on this thread. I am facing a similar issue to the one @ashaffer mentioned on AWS EC2 ENA.

Below are the steps I followed on an AWS EC2 ENA-enabled instance:

With this I am seeing a hang when using multiple cpus for the Seastar-based app. If I run the app on one cpu with cpuset=1, then I do not face the connection problem, but this limits the application to a single cpu.

Do we have any solution for the Seastar + DPDK + ENA combination that works?

Also, what is the current stable Seastar version that we can use? Does "seastar-20.05-branch" seem good?

@avikivity, @ashaffer, or others: any thoughts?

avikivity commented 2 years ago

Best is to try master.