vintagepc / MINI404

Like MK404... with an appendectomy :smile: (Simulates MK3.x, MK4, MINI and XL)
https://vintagepc.github.io/MINI404/

[BUG] Can't run on Windows 11 #145

Closed BlueFyre closed 6 months ago

BlueFyre commented 6 months ago

Describe the bug

./qemu-system-buddy.exe -machine prusa-mk3-35 -kernel MK3.5_firmware_5.2.1.bbf -icount auto
# FIXME: OTP and engineering bytes should be split.
**
ERROR:../hw/core/gpio.c:108:qdev_get_gpio_in_named: assertion failed: (n >= 0 && n < gpio_list->num_in)
Bail out! ERROR:../hw/core/gpio.c:108:qdev_get_gpio_in_named: assertion failed: (n >= 0 && n < gpio_list->num_in)

To Reproduce

Steps to reproduce the behavior:

  1. Build from source using MSYS2
  2. Run the built executable

Desktop (please complete the following information):

This also seems to happen with the latest Mini404-dev-w64.zip (though I had to manually copy over a bunch of missing .dll files)

vintagepc commented 6 months ago

Thanks for the report.

A couple of things to unpack here:

What's your skill level? Are you comfortable working in a debugger and setting breakpoints? If so, you might be able to tell me which IRQ is throwing that error. For some silly reason QEMU elects not to print the actual name of the item that failed :(

BlueFyre commented 6 months ago

I'll reply to the other bugs respectively to contain the discussions. I'll try to set up the debugger, haven't worked with this particular code base but I can try to sort my way through it. I saw some instructions about VS Code so I'll try that. Any particular location you'd like me to place a breakpoint?

I'm not entirely sure I built it correctly, to be honest. I built it under MINGW64 using: ./configure --target-list=buddy-softmmu --enable-gtk --enable-libusb --disable-werror

Seems like the same problem with the MK4:

./qemu-system-buddy -machine prusa-mk4-027c -kernel MK4_MK3.9_firmware_5.1.3.bbf -icount auto
# FIXME: OTP and engineering bytes should be split.
**
ERROR:../hw/core/gpio.c:108:qdev_get_gpio_in_named: assertion failed: (n >= 0 && n < gpio_list->num_in)
Bail out! ERROR:../hw/core/gpio.c:108:qdev_get_gpio_in_named: assertion failed: (n >= 0 && n < gpio_list->num_in)

BlueFyre commented 6 months ago

I couldn't figure out how to launch it in debug mode. If you have any ideas, that would be helpful.

I managed to get something going in WSL2 (Debian) instead. I can at least launch the Mini with an older firmware (4.4.1):

image

Is it a current limitation that we can't run any of the newer firmware? Ideally I was hoping to look at the MK3.5 firmware

vintagepc commented 6 months ago

Your configure sounds good. If you want to be really sure, there are a handful of pre-defined tasks under VS Code (Ctrl+Shift+B) that handle the configure arguments correctly depending on the different features of the build. There's also a set of launch/debug entries, though these require the "Native Debug" extension to be enabled to support GDB.

The place to start looking is here, in hw/core/gpio.c:

image

GDB should automatically stop when that assert is hit, allowing you to inspect the "name" string to see which thing it was trying to connect when it failed.

> Is it a current limitation that we can't run any of the newer firmware? Ideally I was hoping to look at the MK3.5 firmware

I've been running (non-bbf) firmware builds of 5.2.0 and more recent firmware lately while trying to move forward on XL support and fixing a few MK4 issues. I don't think this is a systemic problem, just specific to the MK3.5 and something that isn't playing nice on Windows builds. Once we figure out the cause I suspect that's going to be a trivial fix.

Also, something to check: does your MSYS build have the following libraries? They are not required, but if they are missing, certain parts of the code are disabled, and there might be a bug in the handling of those connections when GL support is disabled:

BlueFyre commented 6 months ago

I couldn't ever seem to get the executable to stop when I tried -s -S for this bug. Though I suppose I won't need to do anything further given your PR

vintagepc commented 6 months ago

Well, there is still the outstanding issue of that assert that is being hit.

Clarification: we are talking about debugging the QEMU binary itself. The -S and -s options are for debugging the running firmware inside QEMU.

The former is achieved by starting qemu-system-buddy via GDB, and it should be built with --enable-debug to facilitate this.

BlueFyre commented 6 months ago

Oh that's why I'm confused. Okay I'll try that out

vintagepc commented 6 months ago

If you keep having issues, you can try adding this right before that assert; it will print the name and pin of the failing GPIO to the console before it crashes:

    if (!(n >= 0 && n < gpio_list->num_in))
    {
        printf("Failed to retrieve GPIO %s pin %d\n", name, n);
    }
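
(To try the idea outside the QEMU tree first, here is a minimal standalone sketch of the same log-before-assert pattern. `FakeGpioList` and `gpio_index_in_range` are hypothetical stand-ins, not QEMU types or functions; only the range condition is taken from the assert message above.)

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a named GPIO list; only the fields needed here. */
typedef struct {
    const char *name; /* name the inputs were registered under */
    int num_in;       /* number of inputs registered under that name */
} FakeGpioList;

/*
 * Applies the same range condition as the failing assert, but reports which
 * name and pin were requested instead of aborting without context.
 */
static bool gpio_index_in_range(const FakeGpioList *list, const char *name, int n)
{
    bool ok = (n >= 0 && n < list->num_in);
    if (!ok) {
        /* stderr is not fully buffered, so the message survives a later abort. */
        fprintf(stderr, "Failed to retrieve GPIO %s pin %d (list has %d inputs)\n",
                name, n, list->num_in);
    }
    return ok;
}
```

With `FakeGpioList none = { "clock-change", 0 };`, calling `gpio_index_in_range(&none, "clock-change", 0)` prints the message and returns false, which is the shape of failure you would expect from a device that never registered such an input.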

BlueFyre commented 6 months ago

I thought I was going a bit crazy, but apparently it doesn't happen at every launch; sometimes it launches successfully. Anyhow, after rerunning it a bunch of times, here's one run where it fails:

image

vintagepc commented 6 months ago

:thinking: That's... very weird. There's nothing non-deterministic about that particular connection at all. In fact, it's part of some common boilerplate code used for all stm32 implementations in my QEMU fork to initialize the SoC and its internal peripherals according to their maps.

https://github.com/vintagepc/MINI404/blob/e9f167df58e8ef0c32d17dde8d5d3179e8a6b40c/hw/arm/prusa/stm32_common/stm32_common.c#L289

I have a really hard time imagining how that could be absent sometimes and not others unless there's a weird initialization delay.

Are you able to look "up" the call stack into the stm32_soc_realize_peripheral() function to see what the value of id is when this happens, and whether it is always the same one? That would tell me what peripheral is being problematic.

BlueFyre commented 6 months ago

Seems to be id 46 when I tried it. It takes a lot more attempts under the debugger before I can get it to fail, whereas from the command line it seems to fail more often (although apparently I wasn't launching it enough times to see it succeed too...)

vintagepc commented 6 months ago

That definitely smells like a race condition of some kind; debug builds naturally run slower and differently than release builds since they have fewer/no compiler optimizations.

Let's try adding some more safety checks around that function. The PWR peripheral is not of the correct type to have a clock input IRQ, but that macro should also assert if there's a problem casting to the new type...

Can you replace this (~line 287 of hw/arm/prusa/stm32_common/stm32_common.c):

    if (id > STM32_P_RCC && stm32_rcc_if_has_clk(STM32_PERIPHERAL(s->perhiperhals[id])))
    {
        stm32_rcc_if_set_periph_clk_irq(STM32_PERIPHERAL(s->perhiperhals[id]), qdev_get_gpio_in_named(s->perhiperhals[id],"clock-change",0));
    }

with:

    STM32Peripheral *p = STM32_PERIPHERAL(s->perhiperhals[id]);
    if (id > STM32_P_RCC && p != NULL && stm32_rcc_if_has_clk(p))
    {
        stm32_rcc_if_set_periph_clk_irq(p, qdev_get_gpio_in_named(s->perhiperhals[id],"clock-change",0));
    }

and let me know if it behaves better?

BlueFyre commented 6 months ago

Didn't seem to help. stm32_rcc_if_has_clk(p) seems to return 171 so the check passes in this case

vintagepc commented 6 months ago

OK. Let me try a more explicit/complex solution. I'll have something for you in a few minutes.

vintagepc commented 6 months ago

If you could try out the vintagepc/145-fix-msys-crash branch and let me know if that fixes it for you that'd be awesome. Thanks!

BlueFyre commented 6 months ago

Unfortunately it still seems to be failing. I put in the printf code you had above just to be sure I recompiled it correctly: Failed to retrieve GPIO clock-change pin 0

The id is now 67 if that means anything to you

vintagepc commented 6 months ago

Progress! I missed blacklisting the external interrupt handler since it's not in the hw/arm/prusa folder and so I didn't find it in the first pass. Pushed a fix for that entry. I hope that's all of them but there may be one or two more where we're using the "native" QEMU implementations that also need blacklisting.
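
(For readers following along: a minimal sketch of what such a skip list could look like. This is not the code from the branch. The ids 46 and 67 are the values reported earlier in this thread; the enum names and the `skip_clock_irq` helper are hypothetical and chosen only for illustration.)

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical peripheral ids. 46 and 67 are the values reported in this
 * thread; the names are guesses for illustration, not the real enum.
 */
enum {
    EXAMPLE_P_PWR  = 46,
    EXAMPLE_P_EXTI = 67,
};

/* Peripherals assumed to have no "clock-change" input, so they are skipped. */
static const int no_clock_input[] = { EXAMPLE_P_PWR, EXAMPLE_P_EXTI };

/* Returns true when the clock-change IRQ should NOT be wired up for this id. */
static bool skip_clock_irq(int id)
{
    size_t count = sizeof(no_clock_input) / sizeof(no_clock_input[0]);
    for (size_t i = 0; i < count; i++) {
        if (no_clock_input[i] == id) {
            return true;
        }
    }
    return false;
}
```

Under these assumptions the realize code would test `!skip_clock_irq(id)` before asking for the "clock-change" input, so a device without that input never reaches the assert.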

BlueFyre commented 6 months ago

Just grabbed the latest code. The issue above doesn't seem to happen after trying it a bunch of times (outside of the debugger just to be sure). Though it doesn't seem to start up: qemu: fatal: Lockup: can't escalate 3 to HardFault (current priority -1)

vintagepc commented 6 months ago

Great news!

The hardfault error usually happens if you are missing a "-kernel" argument and the CPU has no code to execute. Is that the case here?

It's usually a "soft" error in that it's a problem with the firmware running in the simulated environment (Buddy FW), rather than a problem with the hosting software (i.e. QEMU) - it means the simulated CPU encountered a software fault and has no appropriate fault handler in software, so it locks in an error state.

BlueFyre commented 6 months ago

Hmm, I'm running:

qemu-system-buddy -machine prusa-mk3-35 -kernel MK3.5_firmware_5.2.1.bbf -icount auto

I've got MK3.5_firmware_5.2.1.bbf and Prusa_Mk3v5_Boot.bin in the build folder as well.

The Prusa_Mk3v5_Boot.bin is just a renamed file from mini_release_noboot.zip (is that expected?)

I think last time I saw something "running" it said "Looking for bbf" in the GUI but it doesn't get there anymore

vintagepc commented 6 months ago

> The Prusa_Mk3v5_Boot.bin is just a renamed file from mini_release_noboot.zip (is that expected?)

That may be the cause. It should be the 128k bootloader.bin file, renamed, from the zip file referenced in bootstrap.py.
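
(A quick way to sanity-check the renamed file is to compare its size against the expected 128k. The sketch below assumes "128k" means 128 KiB, i.e. 131072 bytes; `file_size` and `looks_like_bootloader` are hypothetical helper names, not part of MINI404.)

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumption: "128k" means 128 KiB, i.e. 131072 bytes. */
#define EXPECTED_BOOTLOADER_SIZE (128L * 1024L)

/* Returns the file size in bytes, or -1 if the file cannot be opened or measured. */
static long file_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0) {
        size = ftell(f);
    }
    fclose(f);
    return size;
}

/* True when the file exists and has the size expected of the bootloader image. */
static bool looks_like_bootloader(const char *path)
{
    return file_size(path) == EXPECTED_BOOTLOADER_SIZE;
}
```

If `looks_like_bootloader("Prusa_Mk3v5_Boot.bin")` returns false, the wrong file was probably renamed. A true result only confirms the size, not the contents.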

BlueFyre commented 6 months ago

Progress I suppose! Back to that problem state I mentioned to you previously

image

vintagepc commented 6 months ago

Great!

The "looking for BBF" means it's trying to load the xflash data (languages, images) from BBF on USB. You can attach a simulated storage device with the following arguments:

-drive id=usbstick,readonly=on,file=fat:usb_folder -device usb-storage,drive=usbstick

This will map a folder called usb_folder to the fake USB drive. Make sure this folder contains a copy of the .bbf file.

If you haven't already found it, be sure to check out the Command line Helper tool that helps (somewhat) explain and generate the various command line options for QEMU.

BlueFyre commented 6 months ago

Thanks, I'm trying that now. Seems to take quite a while but I think it's updating. Will let you know once it works out but I think your fix seems to be working at the very least

vintagepc commented 6 months ago

Glad to hear it. Note it'll be a lot faster to boot (and run) if you reconfigure/recompile without --enable-debug. Adding -global STM32F4xx-usb.disable_sof_interrupt=true as an additional argument can also help with performance.

BlueFyre commented 6 months ago

Definitely a lot faster recompiling it without debug and disabling that. Looks like it's working now:

image

Thanks for your help

vintagepc commented 6 months ago

Awesome. I'll get that branch merged to close out this issue.

I haven't worked with the 3.5 very much yet so don't hesitate to reach out if you find more problems. I'll file a new bug for that fan issue as well.