smoltcp-rs / smoltcp

a smol tcp/ip stack
BSD Zero Clause License

Real world tcp read benchmark never finishes. #493

Closed JakkuSakura closed 3 years ago

JakkuSakura commented 3 years ago

This reddit post says smoltcp achieves 2 Gbps throughput in its benchmark. However, that was only measured over loopback. I created a real-world benchmark: a TCP write-speed tester running on another machine, so that I can run nc remote_address 9999 locally to test my local TCP read speed. When I switch the local side to smoltcp on top of RawSocket, it does connect and read something, but very soon it becomes so slow that it never even completes the benchmark. Any idea what goes wrong?

Here is the write-speed tester that runs on the remote machine:

use std::io::Write;
use std::net::{Shutdown, SocketAddr, TcpListener, TcpStream};

fn send_packets(mut ch: TcpStream, addr: SocketAddr) {
    println!("Accepted {}", addr);
    // Send 8 MiB in 8 KiB chunks, printing the throughput as we go.
    let size = 1024 * 1024 * 8;
    let chunk = vec![0u8; 1024 * 8];
    let begin = std::time::Instant::now();
    for i in 0..size / chunk.len() {
        if i % 5 == 1 {
            let t = (std::time::Instant::now() - begin).as_secs_f64();
            let spd = i as f64 * chunk.len() as f64 / t / 1000.0;
            println!("{} Time spent {} seconds, {} KiB/s", addr, t, spd);
        }
        ch.write(&chunk).unwrap();
    }
    ch.shutdown(Shutdown::Both).unwrap();
    let t = (std::time::Instant::now() - begin).as_secs_f64();
    let spd = size as f64 / t / 1000.0;
    println!("{} Time spent {} seconds, {} KiB/s", addr, t, spd);
}
fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:9999")?;
    loop {
        let (ch, addr) = listener.accept()?;
        std::thread::spawn(move || send_packets(ch, addr));
    }
}

And here is the smoltcp-on-RawSocket reader, which never finishes:

use std::collections::BTreeMap;

use rand::seq::SliceRandom;
use smoltcp::iface::{Interface, InterfaceBuilder, NeighborCache, Routes};
use smoltcp::phy::{Device, Medium, RawSocket};
use smoltcp::socket::{
    Dhcpv4Event, Dhcpv4Socket, SocketHandle, SocketSet, TcpSocket, TcpSocketBuffer,
};
use smoltcp::time::Instant;
use smoltcp::wire::{EthernetAddress, IpCidr, Ipv4Address, Ipv4Cidr};

fn main() -> anyhow::Result<()> {
    let neighbor_cache = NeighborCache::new(BTreeMap::new());

    let ip_addrs = [IpCidr::new(Ipv4Address::UNSPECIFIED.into(), 0)];
    let routes = Routes::new(BTreeMap::new());
    let device = RawSocket::new("ens6")?;
    let medium = device.capabilities().medium;
    let mut builder = InterfaceBuilder::new(device)
        .ip_addrs(ip_addrs)
        .routes(routes);
    if medium == Medium::Ethernet {
        builder = builder
            // TODO: use your MAC addr
            .ethernet_addr(EthernetAddress::from_bytes(&[1, 2, 3, 4, 5, 6]))
            .neighbor_cache(neighbor_cache);
    }
    let mut iface = builder.finalize();

    let mut sockets = SocketSet::new(vec![]);
    let dhcp_socket = Dhcpv4Socket::new();
    let dhcp_handle = sockets.add(dhcp_socket);

    loop {
        let timestamp = Instant::now();
        if let Err(e) = iface.poll(&mut sockets, timestamp) {
            println!("poll error: {}", e);
        }

        match sockets.get::<Dhcpv4Socket>(dhcp_handle).poll() {
            None => {}
            Some(Dhcpv4Event::Configured(config)) => {

                println!("DHCP config acquired!");
                println!("config {:?}", config);
                println!("IP address:      {}", config.address);
                set_ipv4_addr(&mut iface, config.address);

                if let Some(router) = config.router {
                    println!("Default gateway: {}", router);
                    iface.routes_mut().add_default_ipv4_route(router).unwrap();
                } else {
                    println!("Default gateway: None");
                    iface.routes_mut().remove_default_ipv4_route();
                }

                for (i, s) in config.dns_servers.iter().enumerate() {
                    if let Some(s) = s {
                        println!("DNS server {}:    {}", i, s);
                    }
                }

                break;
            }
            Some(Dhcpv4Event::Deconfigured) => {
                println!("DHCP lost config!");
                set_ipv4_addr(&mut iface, Ipv4Cidr::new(Ipv4Address::UNSPECIFIED, 0));
                iface.routes_mut().remove_default_ipv4_route();
            }
        }
    }
    let mut sets = ManagedSocketSet::new();
    // TODO: use your server ip
    let tcp_handle = sets.add_tcp_socket(Ipv4Address::new(127, 0, 0, 1), 9999);

    loop {
        if let Err(e) = iface.poll(&mut sets.socket_set, Instant::now()) {
            println!("poll error: {}", e);
        }
        let mut tcp = sets.socket_set.get::<TcpSocket>(tcp_handle);
        let mut data = [0u8; 1024 * 8]; 
        if tcp.may_recv() {
            tcp.recv_slice(&mut data)?;
        }
    }
}

pub struct ManagedSocketSet {
    available_local_ports: Vec<u16>,
    // with Pin ?
    pub socket_set: SocketSet<'static>,
}

impl ManagedSocketSet {
    pub fn new() -> Self {
        let mut v = Vec::new();
        for p in (10000..65535).rev() {
            v.push(p);
        }
        let mut r = rand::thread_rng();
        v.shuffle(&mut r);

        ManagedSocketSet {
            available_local_ports: v,
            socket_set: SocketSet::new::<Vec<_>>((0..65535).map(|_| None).collect()),
        }
    }

    fn get_tcp_port(&mut self) -> Option<u16> {
        self.available_local_ports.pop()
    }

    pub fn add_tcp_socket(&mut self, addr: Ipv4Address, port: u16) -> SocketHandle {
        let local_port = self.get_tcp_port().expect("No available tcp ports");
        let tcp_rx_buffer = TcpSocketBuffer::new(vec![0; 1024 * 1024 * 8]);
        let tcp_tx_buffer = TcpSocketBuffer::new(vec![0; 1024 * 1024 * 8]);
        let mut socket = TcpSocket::new(tcp_rx_buffer, tcp_tx_buffer);
        socket
            .connect((addr, port), local_port)
            .expect("Cannot fail");
        let handle = self.socket_set.add(socket);
        println!("localhost:{} -> {}:{} Added", local_port, addr, port);
        handle
    }

    pub fn incr_socket_count(&mut self, handle: SocketHandle) {
        self.socket_set.retain(handle);
    }

    pub fn decr_socket_count(&mut self, handle: SocketHandle) {
        self.socket_set.release(handle);
    }
    pub fn clean(&mut self) {
        self.socket_set.prune();
    }
}

fn set_ipv4_addr<DeviceT>(iface: &mut Interface<'_, DeviceT>, cidr: Ipv4Cidr)
    where
        DeviceT: for<'d> Device<'d>,
{
    iface.update_ip_addrs(|addrs| {
        let dest = addrs.iter_mut().next().unwrap();
        *dest = IpCidr::Ipv4(cidr);
    });
}

benchmark.log: the first IP is from my smoltcp machine, which never completes. The second IP is from my local machine running nc ip port, which completes.

smoltcp.log

spacemeowx2 commented 3 years ago

Looks like your code forgot to call phy_wait, so the poll loop just spins waiting for the RawSocket to have data, which is very inefficient.

See example:

https://github.com/smoltcp-rs/smoltcp/blob/e4241510337e095b9d21136c5f58b2eaa1b78479/examples/httpclient.rs#L110

https://github.com/smoltcp-rs/smoltcp/blob/e4241510337e095b9d21136c5f58b2eaa1b78479/src/phy/sys/mod.rs#L26
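
For reference, a minimal sketch of that change against the code above, assuming the 0.7-era API from the linked example (RawSocket implements AsRawFd, and poll_delay reports how long the stack may sleep):

use std::os::unix::io::AsRawFd;
use smoltcp::phy::wait as phy_wait;

// Grab the raw fd before the device is moved into the InterfaceBuilder.
let fd = device.as_raw_fd();

loop {
    let timestamp = Instant::now();
    if let Err(e) = iface.poll(&mut sets.socket_set, timestamp) {
        println!("poll error: {}", e);
    }

    // ... drain the TCP socket here ...

    // Block until the raw socket is readable or the stack's next timer
    // expires, instead of spinning on poll().
    phy_wait(fd, iface.poll_delay(&sets.socket_set, timestamp)).expect("wait error");
}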

JakkuSakura commented 3 years ago

Thanks for pointing that out. Though I let it spin on purpose, since performance is my first priority, at any cost.

spacemeowx2 commented 3 years ago

I think the data consumer is not quick enough. Replacing if tcp.may_recv() with while tcp.can_recv() may help.

        let mut data = [0u8; 1024 * 8]; 
        if tcp.may_recv() {
            tcp.recv_slice(&mut data)?;
        }

And I must say write doesn't guarantee that all the data will be sent. Use write_all instead.
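
In other words, something like this on the smoltcp side (a sketch; can_recv is only true while data is actually queued in the receive buffer, whereas may_recv stays true for as long as the remote may still send):

let mut data = [0u8; 1024 * 8];
// Drain everything that is currently buffered before polling again.
while tcp.can_recv() {
    tcp.recv_slice(&mut data)?;
}

and on the std sender side:

// write_all loops internally until the whole chunk has been written.
ch.write_all(&chunk).unwrap();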

JakkuSakura commented 3 years ago
while tcp.may_recv() {
     tcp.recv_slice(&mut data)?;
}

doesn't help much. In the end smoltcp seems stuck waiting for something, so the bottleneck is not CPU

JakkuSakura commented 3 years ago

I have spotted several severe issues.

  1. The local MTU is not honored when the remote machine tries to detect the proper path MTU. This is very significant on AWS: my smoltcp network interface only supports an MTU of 1500, while the remote machine's supports 9000. The remote machine then assumes the path MTU is about 3000, because smoltcp did not send out an ICMP Fragmentation Needed message when it received a packet of about 2900 bytes; it simply got a checksum error and dropped it.

  2. When it receives a TCP packet with a checksum error, smoltcp does not send out an ACK to make the remote retransmit; instead it relies on the remote timing out.

pothos commented 3 years ago

Can you explain the second point a bit more?

Besides the issues you mentioned, these two points from the TCP README section are relevant when you have stuck connections:

However, I guess most remote machines don't have PLPMTU activated either and this won't help here. Your remote machine probably needs some other trick to know the right path MTU if smoltcp is not to blame for not sending an ICMP out for these large packets.

JakkuSakura commented 3 years ago

I'm running tracepath on the remote machine.

 tracepath some.ip.v4.address                                                                   

 1?: [LOCALHOST]                      pmtu 9001
 1:  no reply
 2:  no reply
 3:  no reply
 4:  no reply
 5:  no reply
 ...

And I'm only getting the following repeatedly.

EthernetII src=06-db-18-9e-7d-66 dst=06-67-ae-9e-c3-0a type=IPv4
\ (truncated packet)    
Jun 16 09:22:01.745 TRACE main smoltcp_raw_tcp_perf: localhost <- EthernetII src=06-db-18-9e-7d-66 dst=06-67-ae-9e-c3-0a type=IPv4
   \ (truncated packet)

I suppose the remote Linux machine is trying to detect the smoltcp side's MTU but can't get an answer, so it keeps sending extra-large frames based on its own MTU of 9001.

Packetization Layer Path MTU Discovery PLPMTU is not implemented

I don't need smoltcp's PLPMTU, but I need smoltcp to respond to such extra-large Ethernet frames with an ICMP message.
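
For reference, the message in question is an ICMPv4 Destination Unreachable with the Fragmentation Required code (RFC 1191, which also wants the next-hop MTU echoed back in the otherwise unused header field). A rough sketch of its shape in smoltcp's wire types, where orig_ipv4_repr and orig_payload are hypothetical names for the parsed header and the first bytes of the dropped oversized packet:

use smoltcp::phy::ChecksumCapabilities;
use smoltcp::wire::{Icmpv4DstUnreachable, Icmpv4Packet, Icmpv4Repr};

// Echo the offending packet's IP header plus the start of its payload back to
// the sender, with reason FragRequired, so it can lower its path MTU estimate.
let reply = Icmpv4Repr::DstUnreachable {
    reason: Icmpv4DstUnreachable::FragRequired,
    header: orig_ipv4_repr,
    data: &orig_payload[..orig_payload.len().min(8)],
};

let mut buf = vec![0u8; reply.buffer_len()];
let mut packet = Icmpv4Packet::new_unchecked(&mut buf[..]);
reply.emit(&mut packet, &ChecksumCapabilities::default());
// buf now holds the ICMP message that would have to go back to the remote
// inside an IPv4 packet.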

JakkuSakura commented 3 years ago

Your remote machine probably needs some other trick to know the right path MTU if smoltcp is not to blame for not sending an ICMP out for these large packets.

Indeed, my network interface has trouble supporting larger frames. But the remote machine may not be something I can control, so I need to send out an ICMP message in smoltcp.

JakkuSakura commented 3 years ago

The issue has been solved by #497. Thank you all!