icexin / eggos

A Go unikernel running on x86 bare metal
http://eggos.icexin.com
MIT License
2.24k stars, 113 forks

nil pointer in e1000 driver on AWS EC2 #81

Open aidansteele opened 3 years ago

aidansteele commented 3 years ago

Hi,

This is a very cool project - thank you for building it! I tried running the helloworld example on AWS EC2 and got the following error. The output below was captured with the "debug" log level.

[trap] tid:2
[syscall] tid:4
[pci] no pci device found for e1000
[inet] begin dhcp
panic: nil pointer or invalid memory access
goroutine 1 [running]:
github.com/icexin/eggos/kernel.pageFaultPanic()
    /Users/aidan/dev/oss/eggos/kernel/trap.go:73 +0x2a
github.com/icexin/eggos/drivers/e1000.(*driver).Transmit(0x68056140, 0x6808c200, 0x3a29de, 0x6)
    /Users/aidan/dev/oss/eggos/drivers/e1000/e1000.go:224 +0x3b
github.com/icexin/eggos/inet.(*endpoint).WritePacket(0x680561e0, 0x68024cdc, 0x4, 0x3ff958, 0x4, 0x680249e8, 0x6, 0x0, 0x0, 0x300000800, ...)
    /Users/aidan/dev/oss/eggos/inet/endpoint.go:89 +0x67
gvisor.dev/gvisor/pkg/tcpip/link/nested.(*Endpoint).WritePacket(...)
    /Users/aidan/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20210716193733-566c23a60eea/pkg/tcpip/link/nested/nested.go:117
gvisor.dev/gvisor/pkg/tcpip/link/ethernet.(*Endpoint).WritePacket(0x68056230, 0x68024cdc, 0x4, 0x3ff958, 0x4, 0x680249e8, 0x6, 0x0, 0x0, 0x300000800, ...)
    /Users/aidan/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20210716193733-566c23a60eea/pkg/tcpip/link/ethernet/ethernet.go:66 +0x12d
gvisor.dev/gvisor/pkg/tcpip/stack.(*nic).writePacket(0x680ee000, 0x68024cdc, 0x4, 0x3ff958, 0x4, 0x680249e8, 0x6, 0x0, 0x0, 0x300000800, ...)
    /Users/aidan/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20210716193733-566c23a60eea/pkg/tcpip/stack/nic.go:368 +0xfe
gvisor.dev/gvisor/pkg/tcpip/stack.(*nic).writePacketBuffer(0x680ee000, 0x68024cdc, 0x4, 0x3ff958, 0x4, 0x680249e8, 0x6, 0x0, 0x0, 0x300000800, ...)
    /Users/aidan/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20210716193733-566c23a60eea/pkg/tcpip/stack/nic.go:314 +0xd0
gvisor.dev/gvisor/pkg/tcpip/stack.(*nic).enqueuePacketBuffer(0x680ee000, 0x6808a960, 0x800, 0x404800, 0x6808c200, 0x6805822a, 0x8, 0x8)
    /Users/aidan/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20210716193733-566c23a60eea/pkg/tcpip/stack/nic.go:329 +0x25d
gvisor.dev/gvisor/pkg/tcpip/stack.(*nic).WritePacket(0x680ee000, 0x6808a960, 0x800, 0x6808c200, 0x1c, 0x0)
    /Users/aidan/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20210716193733-566c23a60eea/pkg/tcpip/stack/nic.go:307 +0x53
gvisor.dev/gvisor/pkg/tcpip/network/ipv4.(*endpoint).writePacket(0x6806ca00, 0x6808a960, 0x6808c200, 0x6808a900, 0x0, 0x0)
    /Users/aidan/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20210716193733-566c23a60eea/pkg/tcpip/network/ipv4/ipv4.go:495 +0x375
gvisor.dev/gvisor/pkg/tcpip/network/ipv4.(*endpoint).WritePacket(0x6806ca00, 0x6808a960, 0x4000000011, 0x6808c200, 0x0, 0x0)
    /Users/aidan/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20210716193733-566c23a60eea/pkg/tcpip/network/ipv4/ipv4.go:445 +0x179
gvisor.dev/gvisor/pkg/tcpip/stack.(*Route).WritePacket(0x6808a960, 0x4000000011, 0x6808c200, 0x680b0458, 0xd2d0)
    /Users/aidan/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20210716193733-566c23a60eea/pkg/tcpip/stack/route.go:462 +0xad
gvisor.dev/gvisor/pkg/tcpip/transport/udp.(*udpPacketInfo).send(0x6845e930, 0x405290, 0x680639e0, 0x68022340)
    /Users/aidan/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20210716193733-566c23a60eea/pkg/tcpip/transport/udp/endpoint.go:874 +0x3f9
gvisor.dev/gvisor/pkg/tcpip/transport/udp.(*endpoint).write(0x680a4f00, 0x405290, 0x680639e0, 0x68022340, 0x0, 0x35f520, 0x1, 0x680639e0)
    /Users/aidan/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20210716193733-566c23a60eea/pkg/tcpip/transport/udp/endpoint.go:581 +0x176
gvisor.dev/gvisor/pkg/tcpip/transport/udp.(*endpoint).Write(0x680a4f00, 0x405290, 0x680639e0, 0x68022340, 0x0, 0x36c720, 0x3381e0, 0x680562d0)
    /Users/aidan/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20210716193733-566c23a60eea/pkg/tcpip/transport/udp/endpoint.go:431 +0x68
gvisor.dev/gvisor/pkg/tcpip/adapters/gonet.(*UDPConn).WriteTo(0x680562d0, 0x680cc100, 0xfa, 0xfa, 0x406f28, 0x68063860, 0x0, 0x0, 0x0)
    /Users/aidan/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20210716193733-566c23a60eea/pkg/tcpip/adapters/gonet/gonet.go:651 +0x1eb
github.com/icexin/eggos/inet/dhcp.(*Client).Request(0x680ec090, 0x409328, 0x6804c660, 0x0, 0x0, 0x0, 0x0)
    /Users/aidan/dev/oss/eggos/inet/dhcp/client.go:150 +0x5e5
github.com/icexin/eggos/inet.dodhcp(0x680249e8, 0x6, 0x6, 0x68056230)
    /Users/aidan/dev/oss/eggos/inet/stack.go:87 +0x147
github.com/icexin/eggos/inet.Init()
    /Users/aidan/dev/oss/eggos/inet/stack.go:50 +0x234
github.com/icexin/eggos.kernelInit()
    /Users/aidan/dev/oss/eggos/eggos.go:34 +0x1dc
github.com/icexin/eggos.init.0()
    /Users/aidan/dev/oss/eggos/eggos.go:38 +0x25

That nil pointer panic is from this code:

https://github.com/icexin/eggos/blob/971efada741e1d984cd284d591f15adaba03c77b/drivers/e1000/e1000.go#L223-L224

A successful run of the same kernel in qemu has different logs:

[trap] tid:2
[video] can't found video info from bootloader, video disabled
[syscall] tid:4
[pci] found 8086:100e for e1000, irq:43

[e1000] enable bus master
[e1000] mmap for bar0 0xfebc0000
[e1000] begin reset
[e1000] reset done
[e1000] link up
[e1000] begin read mac
[e1000] mac:525400123456
[inet] begin dhcp
[dhcp] offer done
[dhcp] offer ip:10.0.2.15 server:10.0.2.2
[dhcp] lease:24h0m0s
[inet] dhcp done
[inet] addr:10.0.2.15
[inet] gateway:10.0.2.2
[inet] mask:255.255.255.0
[inet] dns:10.0.2.3
hello eggos
[syscall] write(1)(0x1, 0x681a4020, 0xc, 0x20, 0xc, 0x4) = 12

I understand that this is unlikely to be enough detail for you to diagnose the problem. I am happy to try to assist, but I might need some direction from you. Let me know what details you need and I will try to provide them.
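
The EC2 log shows `[pci] no pci device found for e1000` followed immediately by `[inet] begin dhcp`, so the transmit path appears to run against a driver that was never initialized. Below is a minimal sketch of one way to guard against that. All names here (`nic`, `probe`, `initNetwork`, `errNoDevice`) are hypothetical and are not the eggos API.

```go
package main

import (
	"errors"
	"fmt"
)

// nic is a hypothetical stand-in for the driver state; it is not the
// real eggos e1000 driver type.
type nic struct {
	mac [6]byte
}

// errNoDevice is a hypothetical sentinel for a PCI scan that matched nothing.
var errNoDevice = errors.New("no supported network device found")

// probe mimics driver initialization. found reports whether the PCI scan
// matched a supported device.
func probe(found bool) (*nic, error) {
	if !found {
		return nil, errNoDevice
	}
	// MAC taken from the qemu log in this thread, for illustration only.
	return &nic{mac: [6]byte{0x52, 0x54, 0x00, 0x12, 0x34, 0x56}}, nil
}

// initNetwork starts DHCP only when a device was actually initialized,
// instead of transmitting through an uninitialized driver.
func initNetwork(found bool) string {
	dev, err := probe(found)
	if err != nil {
		return "network disabled: " + err.Error()
	}
	return fmt.Sprintf("begin dhcp on %x", dev.mac[:])
}

func main() {
	fmt.Println(initNetwork(true))
	fmt.Println(initNetwork(false))
}
```

With a guard of this kind, the kernel could still reach `hello eggos` on hardware without a supported network card, just without networking.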

aidansteele commented 3 years ago

Some more details: I logged the PCI devices discovered in both qemu and ec2.

PCI devices in qemu

```
[pci] found devices: ([]*pci.Device) (len=6 cap=8) {
 (*pci.Device)(0x68024560)({ Ident: (pci.Identity) { Vendor: (uint16) 32902, Device: (uint16) 4663 }, Addr: (pci.Address) { Bus: (uint8) 0, Device: (uint8) 0, Func: (uint8) 0 }, Class: (uint8) 6, SubClass: (uint8) 0, IRQLine: (uint8) 0, IRQNO: (uint8) 32 }),
 (*pci.Device)(0x68024570)({ Ident: (pci.Identity) { Vendor: (uint16) 32902, Device: (uint16) 28672 }, Addr: (pci.Address) { Bus: (uint8) 0, Device: (uint8) 1, Func: (uint8) 0 }, Class: (uint8) 6, SubClass: (uint8) 1, IRQLine: (uint8) 0, IRQNO: (uint8) 32 }),
 (*pci.Device)(0x68024580)({ Ident: (pci.Identity) { Vendor: (uint16) 32902, Device: (uint16) 28688 }, Addr: (pci.Address) { Bus: (uint8) 0, Device: (uint8) 1, Func: (uint8) 1 }, Class: (uint8) 1, SubClass: (uint8) 1, IRQLine: (uint8) 0, IRQNO: (uint8) 32 }),
 (*pci.Device)(0x68024590)({ Ident: (pci.Identity) { Vendor: (uint16) 32902, Device: (uint16) 28947 }, Addr: (pci.Address) { Bus: (uint8) 0, Device: (uint8) 1, Func: (uint8) 3 }, Class: (uint8) 6, SubClass: (uint8) 128, IRQLine: (uint8) 9, IRQNO: (uint8) 41 }),
 (*pci.Device)(0x680245a0)({ Ident: (pci.Identity) { Vendor: (uint16) 4660, Device: (uint16) 4369 }, Addr: (pci.Address) { Bus: (uint8) 0, Device: (uint8) 2, Func: (uint8) 0 }, Class: (uint8) 3, SubClass: (uint8) 0, IRQLine: (uint8) 0, IRQNO: (uint8) 32 }),
 (*pci.Device)(0x680245b0)({ Ident: (pci.Identity) { Vendor: (uint16) 32902, Device: (uint16) 4110 }, Addr: (pci.Address) { Bus: (uint8) 0, Device: (uint8) 3, Func: (uint8) 0 }, Class: (uint8) 2, SubClass: (uint8) 0, IRQLine: (uint8) 11, IRQNO: (uint8) 43 })
}
```

PCI devices in EC2

```
[pci] found devices: ([]*pci.Device) (len=6 cap=8) {
 (*pci.Device)(0x68024554)({ Ident: (pci.Identity) { Vendor: (uint16) 32902, Device: (uint16) 4663 }, Addr: (pci.Address) { Bus: (uint8) 0, Device: (uint8) 0, Func: (uint8) 0 }, Class: (uint8) 6, SubClass: (uint8) 0, IRQLine: (uint8) 0, IRQNO: (uint8) 32 }),
 (*pci.Device)(0x68024560)({ Ident: (pci.Identity) { Vendor: (uint16) 32902, Device: (uint16) 28672 }, Addr: (pci.Address) { Bus: (uint8) 0, Device: (uint8) 1, Func: (uint8) 0 }, Class: (uint8) 6, SubClass: (uint8) 1, IRQLine: (uint8) 0, IRQNO: (uint8) 32 }),
 (*pci.Device)(0x68024570)({ Ident: (pci.Identity) { Vendor: (uint16) 32902, Device: (uint16) 28947 }, Addr: (pci.Address) { Bus: (uint8) 0, Device: (uint8) 1, Func: (uint8) 3 }, Class: (uint8) 0, SubClass: (uint8) 0, IRQLine: (uint8) 9, IRQNO: (uint8) 41 }),
 (*pci.Device)(0x68024580)({ Ident: (pci.Identity) { Vendor: (uint16) 7439, Device: (uint16) 4369 }, Addr: (pci.Address) { Bus: (uint8) 0, Device: (uint8) 3, Func: (uint8) 0 }, Class: (uint8) 3, SubClass: (uint8) 0, IRQLine: (uint8) 0, IRQNO: (uint8) 32 }),
 (*pci.Device)(0x68024590)({ Ident: (pci.Identity) { Vendor: (uint16) 7439, Device: (uint16) 32865 }, Addr: (pci.Address) { Bus: (uint8) 0, Device: (uint8) 4, Func: (uint8) 0 }, Class: (uint8) 1, SubClass: (uint8) 8, IRQLine: (uint8) 11, IRQNO: (uint8) 43 }),
 (*pci.Device)(0x680245a0)({ Ident: (pci.Identity) { Vendor: (uint16) 7439, Device: (uint16) 60448 }, Addr: (pci.Address) { Bus: (uint8) 0, Device: (uint8) 5, Func: (uint8) 0 }, Class: (uint8) 2, SubClass: (uint8) 0, IRQLine: (uint8) 0, IRQNO: (uint8) 32 })
}
```

icexin commented 3 years ago

Thank you very much for your report.

The network card driver in eggos targets Intel's e1000 series. From the PCI list you provided, the network card used by EC2 is the one with vendor 7439 and device 60448 (the only entry reporting Class 2; device 32865 reports Class 1), which is an Amazon-specific model, see https://www.pcilookup.com/?ven=1d0f&dev=&action=submit. I will try to support the virtio driver later, so that the network module can be used on cloud servers. Does EC2 support virtio network cards?

I'm also curious: how did you make eggos into an EC2 image?

Thanks again for sharing your attempt to run eggos on a cloud server.
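
To cross-check the decimal IDs in the dumps against a PCI lookup site, they can be converted to hexadecimal and paired with their base class code. In the EC2 dump the only device reporting Class 2 (network controller) is 7439:60448, while 7439:32865 reports Class 1. A small sketch of that conversion follows. The `device` type and the helper names are illustrative and are not the eggos `pci` package.

```go
package main

import "fmt"

// device mirrors only the dump fields needed here (illustrative type).
type device struct {
	Vendor, Device  uint16
	Class, SubClass uint8
}

// className maps a few standard PCI base class codes to readable names.
func className(c uint8) string {
	switch c {
	case 0x01:
		return "mass storage controller"
	case 0x02:
		return "network controller"
	case 0x03:
		return "display controller"
	case 0x06:
		return "bridge"
	}
	return "other"
}

// describe renders a device as vendor:device in hex plus its class name.
func describe(d device) string {
	return fmt.Sprintf("%04x:%04x %s", d.Vendor, d.Device, className(d.Class))
}

func main() {
	// Values copied from the dumps in this thread.
	devices := []device{
		{32902, 4110, 2, 0}, // qemu NIC
		{7439, 4369, 3, 0},  // EC2
		{7439, 32865, 1, 8}, // EC2
		{7439, 60448, 2, 0}, // EC2
	}
	for _, d := range devices {
		fmt.Println(describe(d))
	}
}
```

The first entry prints `8086:100e network controller`, which matches the `[pci] found 8086:100e for e1000` line in the qemu log.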

aidansteele commented 3 years ago

Hi @icexin, thanks for your quick feedback! I realised the next day that you are indeed correct: EC2 does not use the e1000, so I understand why eggos couldn't find it 😄

I did some more research and the network hardware for EC2 is actually quite complex. There are at least three options that I am aware of:

The ENA drivers are open source and seem to be well documented, but I am not enough of an expert to replicate them in Go: https://github.com/amzn/amzn-drivers/tree/master/kernel/linux/ena. This would be the best-performing driver to port, but I assume it's complex.

The older instances (the ones without ENA) are based on Xen and have the "Xen Platform Device" PCI device attached. It has vendor ID 0x5853 and device ID 0x0001. This appears to be documented here (ctrl+F for xen_platform_pci=1). The network driver on Linux is here.
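
If eggos later gains more network drivers, selection could be keyed on the vendor and device IDs mentioned in this thread. This is a rough sketch only: the `pciID` type, the `knownDevices` table, and `pickDriver` are hypothetical, and labelling 0x1d0f:0xec20 as ENA is an assumption based on the Class 2 entry in the EC2 dump.

```go
package main

import "fmt"

// pciID identifies a device by vendor and device ID (hypothetical type).
type pciID struct{ Vendor, Device uint16 }

// knownDevices is a hypothetical lookup table built from IDs in this thread.
var knownDevices = map[pciID]string{
	{0x8086, 0x100e}: "e1000",               // from the qemu log
	{0x1d0f, 0xec20}: "ena",                 // assumed: Class 2 device in the EC2 dump
	{0x5853, 0x0001}: "xen-platform-device", // IDs quoted above for older instances
}

// pickDriver returns the label of the first recognized device, if any.
func pickDriver(found []pciID) (string, bool) {
	for _, id := range found {
		if name, ok := knownDevices[id]; ok {
			return name, true
		}
	}
	return "", false
}

func main() {
	ec2 := []pciID{{0x1d0f, 0x1111}, {0x1d0f, 0x8061}, {0x1d0f, 0xec20}}
	if name, ok := pickDriver(ec2); ok {
		fmt.Println("would use driver:", name)
	} else {
		fmt.Println("no supported network device")
	}
}
```

Returning an explicit `false` when nothing matches lets the caller skip network setup rather than proceed without a device.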

Regarding eggos -> EC2 image: I can submit a PR sometime in the next few days with a script to do it. But the approximate process is:

Sorry it is not more detailed. It is on my other computer. I will submit a PR soon.