vivier / qemu-m68k


Guest memory corrupted by incoming network packets #46

Open fthain opened 4 years ago

fthain commented 4 years ago

QEMU symptoms look like "DOUBLE MMU FAULT" or a SIGABRT.

qemu-crash-dp8393x-DEBUG.log

I'm fairly sure that the same bug sometimes shows up as a Linux kernel Oops in skb_release_data. It can also show up as a non-responsive network interface, followed by a crash.

qemu-crash-macsonic-skb_release_data.log

Reproducing these symptoms requires heavy network traffic. Ping flooding can do it.

One way to reproduce the crash is with a few of these, after the network goes silent:

    echo macsonic > /sys/bus/platform/drivers/macsonic/unbind
    echo macsonic > /sys/bus/platform/drivers/macsonic/bind
    echo macsonic > /sys/bus/platform/drivers/macsonic/unbind
    echo macsonic > /sys/bus/platform/drivers/macsonic/bind

I found and fixed some bugs in the macsonic driver (see my github repo) but it didn't make a lot of difference.

vivier commented 4 years ago

Which version of QEMU do you use?

I think my fix to sonic buffer exhaustion (0a45280c9fa4 "dp8393x: fix receiving buffer exhaustion") introduces this problem because it is not correct.

Could you try QEMU from the original QEMU repo with your fixed driver?

fthain commented 4 years ago

On Mon, 9 Dec 2019, Laurent Vivier wrote:

Which version of QEMU do you use?

This was an i686 build of the latest q800-dev branch (15f8c9af8c). I can also reproduce these crashes with a build from the q800-dev branch that I made on 2017-05-29 (I don't know what tree it was).

    $ /opt/qemu.2017-05-29/bin/qemu-system-m68k --version
    QEMU emulator version 2.8.92 ()

I think my fix to sonic buffer exhaustion (0a45280c9fa4 "dp8393x: fix receiving buffer exhaustion") introduces this problem because it is not correct.

Could you try QEMU from the original QEMU repo with your fixed driver?

OK. I've now tried q800-dev (15f8c9af8c) with commit 0a45280c9f reverted. Packet rx stops working under load but tx keeps going. Chip reset doesn't help. No obvious memory corruption in debug log:

    sonic: Receive packet at 1fe27910
    sonic: Receive packet at 1fe27200
    sonic: Receive packet at 1fe26af0
    sonic: Receive packet at 1fe263e0
    sonic: Receive packet at 1fe25cd0
    sonic: Receive packet at 1fe255c0
    sonic: Receive packet at 1fe24eb0
    sonic: Receive packet at 1fe247a0
    sonic: Receive packet at 1fe24090
    sonic: Receive packet at 1fe23980
    sonic: Receive packet at 1fe23270
    sonic: Receive packet at 1fe22b60
    sonic: Receive packet at 1fe22450
    sonic: Receive packet at 1fe21d40
    sonic: Receive packet at 1fe21630
    sonic: Receive packet at 1fe20f20
    sonic: Receive packet at 1fe2f910
    sonic: Receive packet at 1fe2f200
    sonic: Receive packet at 1fe2eaf0

I'll take a closer look at dp8393x.c to see what's going on with the Receive Buffers Exhausted interrupt.

fthain commented 4 years ago

I have put some patches for hw/net/dp8393x.c at https://github.com/fthain/qemu-m68k/commits/sonic but it's still not right. A ping flood now produces an error from the macsonic driver, "eth0: rx desc without RCR_LPKT. Shouldn't happen !?" I'll keep debugging...

fthain commented 4 years ago

I believe the memory corruption bug is now fixed on my "sonic" branch (9b15378a8f). Shall I send a pull request to you? Does upstream have a copyright assignment policy for patch submission?

vivier commented 4 years ago

Thank you for all this work.

The Q800 machine is now merged upstream, and the dp8393x is shared with another machine (MIPS) and is not maintained by me.

So the process here is to send your series to the qemu-devel mailing list. (add some description in the commit message and check the style with scripts/checkpatch.pl...)

Process and policy are the same as for the Linux kernel (we need a Signed-off-by line).

vivier commented 4 years ago

Patch 63f5e289f251 "Clean up endianness handling" must be removed: I tried this too in the past and it breaks NetBSD on MIPS (that's why it's written like that).

See c744cf78791e ("dp8393x: fix dp8393x_receive()")

vivier commented 4 years ago

Patch 63f5e28 "Clean up endianness handling" must be removed: I tried this too in the past and it breaks NetBSD on MIPS (that's why it's written like that).

See c744cf7 ("dp8393x: fix dp8393x_receive()")

Original change is: 409b52bfe199 ("net/dp8393x: correctly reset in_use field")

fthain commented 4 years ago

On Wed, 11 Dec 2019, Laurent Vivier wrote:

Patch 63f5e289f251 "Clean up endianness handling" must be removed: I tried this too in the past and it breaks NetBSD on MIPS (that's why it's written like that).

See c744cf78791e ("dp8393x: fix dp8393x_receive()")

Have you tested that? My patch was meant to be a cleanup; it shouldn't change behaviour.

The problem with the old code was the byte count passed to address_space_rw():

    dp8393x_put(s, width, 0, 0); /* in_use */
    address_space_rw(&s->as, dp8393x_crda(s) + sizeof(uint16_t) * 6 * width,
        MEMTXATTRS_UNSPECIFIED, (uint8_t *)s->data, sizeof(uint16_t), 1);

Commit c744cf78791e did fix that, but it's the wrong fix IMHO.
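To make the byte-count concern concrete, here is a minimal standalone sketch of how a field's offset and the number of bytes written relate when descriptor fields are spaced `width` 16-bit words apart. It is not QEMU code: `field_offset` and `clear_field` are hypothetical helpers, the descriptor is a plain byte array rather than guest memory reached through address_space_rw(), and the field index 6 is simply taken from the snippet above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of descriptor field `index` when each field occupies
 * `width` 16-bit words (assumed: 1 in 16-bit mode, 2 in 32-bit mode). */
static size_t field_offset(int index, int width)
{
    assert(width == 1 || width == 2);
    return sizeof(uint16_t) * (size_t)index * (size_t)width;
}

/* Zero `nbytes` bytes at the field's offset inside `desc`.
 * The write stays within the field's own slot only if
 * nbytes <= sizeof(uint16_t) * width. Returns 0 on success, or -1 if
 * the write would spill into the next field or past the buffer. */
static int clear_field(uint8_t *desc, size_t desc_len,
                       int index, int width, size_t nbytes)
{
    size_t off = field_offset(index, width);
    size_t slot = sizeof(uint16_t) * (size_t)width;

    if (nbytes > slot || off + nbytes > desc_len) {
        return -1;
    }
    memset(desc + off, 0, nbytes);
    return 0;
}
```

With `width == 1`, field 6 starts at byte 12 and only a 2-byte write fits; with `width == 2` it starts at byte 24 and the slot is 4 bytes. A byte count that does not match the slot either leaves part of the field untouched or overwrites its neighbour.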

fthain commented 4 years ago

On Wed, 11 Dec 2019, Laurent Vivier wrote:

Patch 63f5e28 "Clean up endianness handling" must be removed: I tried this too in the past and it breaks NetBSD on MIPS (that's why it's written like that).

See c744cf7 ("dp8393x: fix dp8393x_receive()")

Original change is: 409b52bfe199 ("net/dp8393x: correctly reset in_use field")

Thanks for the link. Could this be a NetBSD bug? Or maybe the MIPS CPU is running in little endian mode and the sonic in big endian mode.

I guess my patch will have to be dropped regardless. Pity about that. This is the kind of issue that can't be resolved without access to the hardware.
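As a general illustration of why a byte-order mismatch matters here (not a diagnosis of the NetBSD failure), the sketch below shows where the low 16 bits of a value land inside a 32-bit slot under each byte order. The helpers `low_half_offset`, `store32` and `load16` are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset, within a 32-bit slot, of the least significant 16 bits. */
static size_t low_half_offset(int big_endian)
{
    return big_endian ? sizeof(uint16_t) : 0;
}

/* Store a 32-bit value into 4 bytes using the given byte order. */
static void store32(uint8_t *p, uint32_t v, int big_endian)
{
    assert(p != NULL);
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        p[i] = (uint8_t)(v >> shift);
    }
}

/* Load a 16-bit value from 2 bytes using the given byte order. */
static uint16_t load16(const uint8_t *p, int big_endian)
{
    assert(p != NULL);
    return big_endian ? (uint16_t)((p[0] << 8) | p[1])
                      : (uint16_t)((p[1] << 8) | p[0]);
}
```

If one side writes a 32-bit slot in big-endian order and the other reads a 16-bit field at offset 0, it sees the high half rather than the value it expects. Code that touches a 16-bit field inside a wider slot therefore has to agree on the byte order when choosing the offset.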

fthain commented 4 years ago

I have rebased this branch on mainline QEMU and added a new patch to properly handle the "deaf sonic" problem. https://github.com/fthain/qemu/commits/sonic I will post this patch series to qemu-devel after I've put it through checkpatch.pl.