mdlayher / raw

Package raw enables reading and writing data at the device driver level for a network interface. MIT Licensed.
MIT License
425 stars 71 forks source link

raw: potential regression after Go 1.12 runtime network poller integration #42

Closed mdlayher closed 5 years ago

mdlayher commented 5 years ago

From a conversation in https://github.com/mdlayher/raw/pull/40.

This commit broke router7’s dhcp4 client: rtr7/router7#25

After a git revert f434146a17ecdf185e28b9ae8457ef45ee82077a, everything works as expected. At HEAD, dhcp4 gets one reply, then fails to ever send packets on the network interface, resulting in the > DHCP lease expiring.

Haven’t investigated it any further, but perhaps this is sufficient to spot the mistake?

/cc @hugelgupf @stapelberg

mdlayher commented 5 years ago

Ping, any more word on this?

hugelgupf commented 5 years ago

I've got to test the renewals, I haven't got the chance yet.

mdlayher commented 5 years ago

No worries, just wanted to check in.

stapelberg commented 5 years ago

So I just took a few minutes to reproduce this. I tested in the following scenarios:

  1. stracing dhcp4 when booting router7. Because the link is not fully ready yet, the first DHCP request always fails. Because of the bug we’re seeing, subsequent requests are never sent.
  2. stracing dhcp4 when interactively logging into router7 (after boot). The first request works, but when the time for renewal comes, no packets are sent. I can’t even see a sendto syscall in strace! My guess would be that the runtime doesn’t think it can send packets to the interface?

I have emailed strace logs of all 4 scenarios to @hugelgupf (working/broken, boot/interactive), where the only difference between working and broken is whether commit f434146a17ecdf185e28b9ae8457ef45ee82077a is reverted/present.

isi-lincoln commented 5 years ago

Apologies if this is not related, but I cannot compile the code any longer. The code in question is related to https://github.com/mdlayher/raw/commit/f434146a17ecdf185e28b9ae8457ef45ee82077a

go version go1.11.5 linux/amd64

../../vendor/github.com/mdlayher/raw/raw_linux.go:89:14: f.SyscallConn undefined (type *os.File has no field or method SyscallConn)
stapelberg commented 5 years ago

@isi-lincoln That’s unrelated. You need Go 1.12, as documented in the README file.

stapelberg commented 5 years ago

Finally found a way to reliably reproduce the issue:

% docker run --privileged -t -i debian:sid
# apt update && apt install -y golang-1.12-go git dnsmasq
# /usr/lib/go-1.12/bin/go get -t github.com/rtr7/router7/integration/dhcpv4
# /usr/lib/go-1.12/bin/go test -v github.com/rtr7/router7/integration/dhcpv4
# cat > /tmp/patch <<'EOT'
--- a/integration/dhcpv4/dhcpv4_test.go
+++ b/integration/dhcpv4/dhcpv4_test.go
@@ -114,6 +114,7 @@ func TestDHCPv4(t *testing.T) {
                if err := c.Err(); err != nil {
                        t.Fatal(err)
                }
+               time.Sleep(10 * time.Second)
        }

        // Renew once more, but with a new client object (simulating a dhcp4 process
EOT
# cd ~/go/src/github.com/rtr7/router7
# patch -l -p1 < /tmp/patch
# /usr/lib/go-1.12/bin/go test -v github.com/rtr7/router7/integration/dhcpv4
stapelberg commented 5 years ago

…and here is a more stand-alone test program which reproduces the problem:

package main

import (
    "log"
    "net"
    "syscall"
    "time"

    "github.com/google/gopacket"
    "github.com/google/gopacket/layers"
    "github.com/mdlayher/raw"
)

func pkt() []byte {
    discover := &layers.DHCPv4{
        Operation:    layers.DHCPOpRequest,
        HardwareType: layers.LinkTypeEthernet,
        HardwareLen:  uint8(len(layers.EthernetBroadcast)),
    }
    buf := gopacket.NewSerializeBuffer()

    ip := &layers.IPv4{
        Version:  4,
        TTL:      255,
        Protocol: layers.IPProtocolUDP,
        SrcIP:    net.ParseIP("0.0.0.0"),
        DstIP:    net.ParseIP("255.255.255.255"),
    }

    udp := &layers.UDP{
        SrcPort: 68,
        DstPort: 67,
    }
    udp.SetNetworkLayerForChecksum(ip)

    gopacket.SerializeLayers(buf,
        gopacket.SerializeOptions{
            FixLengths:       true,
            ComputeChecksums: true,
        },
        ip,
        udp,
        discover,
    )
    return buf.Bytes()
}

func discover(conn net.PacketConn) (*layers.DHCPv4, error) {
    broadcast := &raw.Addr{HardwareAddr: layers.EthernetBroadcast}
    _, err := conn.WriteTo(pkt(), broadcast)
    if err != nil {
        log.Printf("conn.WriteTo: %v", err)
        return nil, err
    }
    conn.SetDeadline(time.Now().Add(5 * time.Second))
    return nil, nil
}

func logic() error {
    eth0, err := net.InterfaceByName("lo")
    if err != nil {
        log.Fatal(err)
    }

    conn, err := raw.ListenPacket(eth0, syscall.ETH_P_IP, &raw.Config{
        LinuxSockDGRAM: true,
    })
    if err != nil {
        return err
    }
    _, err = discover(conn)
    if err != nil {
        log.Fatal(err)
    }
    time.Sleep(5 * time.Second)
    _, err = discover(conn)
    if err != nil {
        log.Fatal(err)
    }

    return nil
}

func main() {
    if err := logic(); err != nil {
        log.Fatal(err)
    }
}

Notably, if I remove the SetDeadline call, the code works.

stapelberg commented 5 years ago

Now that I see it written like that, I think I know what’s going on: I’m using SetDeadline where I should be using SetReadDeadline. Most likely, raw previously had a bug where the deadline was not respected, and that’s now fixed, exposing a bug in my code.

stapelberg commented 5 years ago

Deployed and confirmed working. Sorry for the noise—this was on my end after all :)

mdlayher commented 5 years ago

No problem, and thanks for the investigative work!