qmk / qmk_firmware

Open-source keyboard firmware for Atmel AVR and Arm USB families
https://qmk.fm
GNU General Public License v2.0

[Bug] Space65 R3 deadlock on system reboot #19707

Open ForsakenRei opened 1 year ago

ForsakenRei commented 1 year ago

Describe the Bug

I found this issue earlier. Today I pulled the latest develop branch and flashed it again to make sure the firmware is up to date. I tested on two different machines, a desktop PC and my laptop. When I restart the machine, the board just goes into deadlock: none of the keys respond and the RGB animation stops (I use a default color spectrum effect). Reconnecting the cable solves the problem.

I tried different boards on both of those machines but could not reproduce the issue. So it seems to be a board (or firmware) issue rather than a host USB issue, but the board otherwise works totally fine as my daily driver. I searched the issues, but it seems I might be the only one who has this problem?

The uncommitted changes reported by qmk doctor should be from my doc-editing branch.

Keyboard Used

gray_studio/space65r3

Link to product page (if applicable)

No response

Operating System

Windows 10 22H2

qmk doctor Output

Ψ QMK Doctor is checking your environment.
Ψ CLI version: 1.1.1
Ψ QMK home: C:/Users/Shigure/qmk_firmware
Ψ Detected Windows 10 (10.0.19045).
Ψ QMK MSYS version: 1.7.2
Ψ Git branch: develop
Ψ Repo version: 0.19.10
⚠ Git has unstashed/uncommitted changes.
Ψ - Latest develop: 2023-01-29 02:42:44 +1100 (b727434391) -- Remove commented out backlight config & stray "backlight levels" (#19703)
Ψ - Latest upstream/master: 2023-01-29 00:41:50 +0800 (981f3c316c) -- Additional handedness by EEPROM examples (#19686)
Ψ - Latest upstream/develop: 2023-01-28 21:16:59 +0000 (80cc6ad187) -- Fix 19701 merge
Ψ - Common ancestor with upstream/master: 2023-01-26 11:34:27 -0500 (3823046712) -- new keyboard: edinburgh41 (#19569)
Ψ - Common ancestor with upstream/develop: 2023-01-29 02:42:44 +1100 (b727434391) -- Remove commented out backlight config & stray "backlight levels" (#19703)
Ψ All dependencies are installed.
Ψ Found arm-none-eabi-gcc version 10.1.0
Ψ Found avr-gcc version 8.5.0
Ψ Found avrdude version 7.0
Ψ Found dfu-programmer version 0.7.2
Ψ Found dfu-util version 0.11
Ψ Submodules are up to date.
Ψ Submodule status:
Ψ - lib/chibios: 2023-01-03 19:29:26 +0000 --  (0062927e3)
Ψ - lib/chibios-contrib: 2023-01-11 16:42:27 +0100 --  (a224be15)
Ψ - lib/googletest: 2021-06-11 06:37:43 -0700 --  (e2239ee6)
Ψ - lib/lufa: 2022-08-26 12:09:55 +1000 --  (549b97320)
Ψ - lib/vusb: 2022-06-13 09:18:17 +1000 --  (819dbc1)
Ψ - lib/printf: 2022-06-29 23:59:58 +0300 --  (c2e3b4e)
Ψ - lib/pico-sdk: 2022-09-19 18:02:44 +0200 --  (8d56ea3)
Ψ - lib/lvgl: 2022-04-11 04:44:53 -0600 --  (e19410f8)
Ψ QMK is ready to go, but minor problems were found

Is AutoHotKey / Karabiner installed

Other keyboard-related software installed

N/A

Additional Context

My keymaps and other files used to compile firmware: https://github.com/ForsakenRei/qmk-space65-r3

ForsakenRei commented 1 year ago

Just noticed that if I restart the PC/laptop it goes into deadlock, but if I shut the PC down completely (turn off the PSU switch) and then cold boot, everything is fine.

It seems that if I suspend/resume my VM the board goes into deadlock as well, which is quite confusing...

ForsakenRei commented 1 year ago

Static electricity will also kill it until I replug the USB cable. Probably some part of the upper case is touching the PCB, which is not a QMK issue.

ForsakenRei commented 1 year ago

After some extended testing, including disassembly, it does not seem to be related to the board itself or to USB. Compiling the default keymap yields the same result on reboot, so it should not be my own keymap that broke it. The only firmware I can find that doesn't have this issue is the stock one provided by Gray Studio...

Skynet011 commented 1 year ago

Where can I find the default firmware? I updated to the new firmware from the VIA website and my keyboard is having the same problem. When I turn my computer on, it does not register any keystrokes, but the LEDs show that the keyboard is connected. It only starts registering keystrokes after I unplug the keyboard and plug it in again.

ForsakenRei commented 1 year ago

Where can I find the default firmware? I updated to the new firmware from the VIA website and my keyboard is having the same problem.

Don't use the one from VIA; those are usually outdated. If you don't need a lot of fancy functions, Gray Studio has a Vial-compatible firmware in their Discord, which works fine based on my tests.

"Default" here means the default keymap: you will need to compile the firmware with that keymap and then flash it to your board. Kinda glad to see I'm not the only one who has this issue.

Skynet011 commented 1 year ago

I'm glad too. I just assembled the board and I was thinking that I broke something. I went to their Discord but I can't seem to find any firmware there.

Skynet011 commented 1 year ago

I sent an email to graystudio, they sent me their default firmware and all is working now. ;)

ForsakenRei commented 1 year ago

I sent an email to graystudio, they sent me their default firmware and all is working now. ;)

Yeah, the stock firmware they provide will work, though not all QMK features are supported even with Vial. That's basically why I compile my own firmware for my boards. I will bring this issue to the Discord when I have more free time from my current work lol.

dmitrii-nik commented 1 year ago

Having same issue. I'm on Ubuntu 22.04. Issue appears during reboot and the board is unresponsive due to some deadlock. Here is a log from dmesg:

[ 376.135773] usb 3-2.1.1: new full-speed USB device number 23 using xhci_hcd
[ 376.215982] usb 3-2.1.1: device descriptor read/64, error -32
[ 376.404066] usb 3-2.1.1: device descriptor read/64, error -32
[ 376.591890] usb 3-2.1.1: new full-speed USB device number 24 using xhci_hcd
[ 376.672084] usb 3-2.1.1: device descriptor read/64, error -32
[ 376.860034] usb 3-2.1.1: device descriptor read/64, error -32
[ 376.968256] usb 3-2.1-port1: attempt power cycle
[ 377.571863] usb 3-2.1.1: new full-speed USB device number 25 using xhci_hcd
[ 377.572176] usb 3-2.1.1: Device not responding to setup address.
[ 377.780159] usb 3-2.1.1: Device not responding to setup address.
[ 377.987860] usb 3-2.1.1: device not accepting address 25, error -71
[ 378.067848] usb 3-2.1.1: new full-speed USB device number 26 using xhci_hcd
[ 378.068162] usb 3-2.1.1: Device not responding to setup address.
[ 378.276048] usb 3-2.1.1: Device not responding to setup address.
[ 378.483850] usb 3-2.1.1: device not accepting address 26, error -71
[ 378.484019] usb 3-2.1-port1: unable to enumerate USB device

After this only unplugging/replugging cable helps.

I tried two quick fixes so far:

  1. #define USB_SUSPEND_WAKEUP_DELAY 5000 in config.h; it didn't help.

  2. Adding NO_USB_STARTUP_CHECK = yes to rules.mk removes the issue, but then the LEDs no longer go to sleep.

ForsakenRei commented 1 year ago

I haven't tried NO_USB_STARTUP_CHECK. How do you make your LEDs sleep? Here's what I'm using in my keymap.c:

static uint32_t key_timer;           // timer to track the last keyboard activity, use 32bit value and function to make longer idle time possible
static void refresh_rgb(void);       // refreshes the activity timer and RGB, invoke whenever activity happens
static void check_rgb_timeout(void); // checks if enough time has passed for RGB to timeout
bool is_rgb_timeout = false;         // store if RGB has timed out or not in a boolean

void refresh_rgb(void)
{
    key_timer = timer_read32(); // store time of last refresh
    if (is_rgb_timeout)
    {
        is_rgb_timeout = false;
        rgblight_wakeup();
    }
}
void check_rgb_timeout(void)
{
    if (!is_rgb_timeout && timer_elapsed32(key_timer) > RGBLIGHT_TIMEOUT)
    {
        rgblight_suspend();
        is_rgb_timeout = true;
    }
}
/* Then, call the above functions from QMK's built in post processing functions like so */
/* Runs at the end of each scan loop, check if RGB timeout has occurred */
void housekeeping_task_user(void)
{
#ifdef RGBLIGHT_TIMEOUT
    check_rgb_timeout();
#endif
}
/* Runs after each key press, check if activity occurred */
void post_process_record_user(uint16_t keycode, keyrecord_t *record)
{
#ifdef RGBLIGHT_TIMEOUT
    if (record->event.pressed)
        refresh_rgb();
#endif
}
/* Runs after each encoder tick, check if activity occurred */
void post_encoder_update_user(uint8_t index, bool clockwise)
{
#ifdef RGBLIGHT_TIMEOUT
    refresh_rgb();
#endif
}

dmitrii-nik commented 1 year ago

I don't have any specific processing for sleep; there is just #define RGBLIGHT_SLEEP in config.h. What it normally does is this: if I suspend the laptop, the keyboard LEDs turn off and the keyboard suspends (and then it wakes up, and wakes the system, on a key press). But if I'm using NO_USB_STARTUP_CHECK, the LEDs are constantly on during laptop suspend. Moreover, as far as I can see, NO_USB_STARTUP_CHECK causes the keyboard to just not suspend anymore.

ForsakenRei commented 1 year ago

I don't have any specific processing for sleep; there is just #define RGBLIGHT_SLEEP in config.h. What it normally does is this: if I suspend the laptop, the keyboard LEDs turn off and the keyboard suspends (and then it wakes up, and wakes the system, on a key press). But if I'm using NO_USB_STARTUP_CHECK, the LEDs are constantly on during laptop suspend. Moreover, as far as I can see, NO_USB_STARTUP_CHECK causes the keyboard to just not suspend anymore.

Ah I see, it's the LED turn-off on suspend. The code I posted above turns off the RGB after some idle time (not sure what will happen if I use NO_USB_STARTUP_CHECK though). Based on what I read in the docs, NO_USB_STARTUP_CHECK disables the suspend check, so yes, it will not suspend anymore. It's mainly useful for split keyboards. We will probably need to bring this question to QMK/Gray Studio since we already have multiple reports of the same problem.
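
For reference, the options discussed up to this point could be collected roughly as follows. This is only a sketch of the config.h side, with the rules.mk setting shown as a comment; the names and the value 5000 are the ones quoted in the comments above, and none of them is confirmed here as a fix.

```c
/* config.h (sketch): options mentioned in this thread, not a verified fix */

/* Tried with 5000; reported above as not helping with the reboot hang. */
#define USB_SUSPEND_WAKEUP_DELAY 5000

/* Reported above to turn the keyboard LEDs off while the host is suspended. */
#define RGBLIGHT_SLEEP

/* rules.mk counterpart (a make variable, not a C define):
 *   NO_USB_STARTUP_CHECK = yes
 * Reported above to avoid the hang, but the keyboard then no longer
 * suspends, so the LEDs stay on while the host is suspended. */
```

The trade-off reported so far is that the one setting that avoids the hang also disables the suspend behavior RGBLIGHT_SLEEP relies on.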

drashna commented 1 year ago

What about WAIT_FOR_USB = yes?

dmitrii-nik commented 1 year ago

Tried this; it looks like WAIT_FOR_USB is not used if added to rules.mk (tried on QMK 0.20.4). Anyway, I enabled WAIT_FOR_USB by undefining directly in tmk_core/protocol/chibios/chibios.c. It didn't help.

I noticed one more thing: when I shut down and then power the laptop on, the keyboard works on the OS selection screen, but it stops working once the system starts to boot (I can see the same 'unable to enumerate device' message). On a reboot, the keyboard is unresponsive right away (the LEDs turn on, but it doesn't react to any keys, the LED toggle doesn't work, etc.).

ForsakenRei commented 1 year ago

~I will give WAIT_FOR_USB=yes a try tonight, but I don't have much hope for it after reading the comment lol.~ Didn't work on my board either...

The problem with laptops is that even when they are shut down, most of the time the USB ports still have power. I used my desktop PC for the test, and it seems a full cold boot doesn't have the issue. Otherwise, on reboot the keyboard works for a few seconds during POST and then is totally dead; the LEDs are still on but stop moving (I use the spiral as the startup effect).

P.S. It might not be related, but mine also disconnects from USB about once every few days. The symptom is the same: the keyboard does not respond and the RGB stops moving but stays on.

dmitrii-nik commented 1 year ago

I asked Gray Studio support which QMK version the original firmware is based on: it is 0.16.8. I tried 0.16.8 and it works after reboot. So some change between 0.16.8 and 0.20.4 is causing the deadlock.

Upd: Checked a bit more; it breaks when switching from 0.16.9 to 0.17.0. Not sure where yet.

ForsakenRei commented 1 year ago

0.17.0 is the May 2022 breaking changes release, https://github.com/qmk/qmk_firmware/blob/master/docs/ChangeLog/20220528.md, though I cannot spot anything relevant in the changelog.

ForsakenRei commented 1 year ago

@dmitrii-nik it seems 0.20.6 fixed the issue for me, if you can confirm it I will close this issue.

dmitrii-nik commented 1 year ago

I've tried the 0.20.7 today and behavior was the same, board hangs on reboot. Did you change something in the config from when you tried initially?

ForsakenRei commented 1 year ago

Hmm, I didn't make any changes except applying my own changes to config.h and rules.mk. But I only did a single restart. I will try 0.20.7 with multiple restarts later to see if I just got lucky with that single success.

ForsakenRei commented 1 year ago

I've tried the 0.20.7 today and behavior was the same, board hangs on reboot. Did you change something in the config from when you tried initially?

Tried 0.20.8/0.21.0 on both Win and Manjaro but neither of them hangs on reboot...? Do you have any special settings?

ForsakenRei commented 11 months ago

It seems no one still has this issue; closing as completed.

AndreasBackx commented 8 months ago

Could this issue be reopened again or should I create a new issue? @ForsakenRei

When my Space65 R3 gets powered on while my PC is booting, the LED turns red and the board is unresponsive, as described in the first post of this issue. It also happens if I replug it during that phase: once the PC goes from my bootloader to Windows, the keyboard becomes unresponsive again. I've broken a Space65 R3 daughterboard because I've replugged my keyboard so often that the USB connection started to fail. I'm on my last daughterboard and of course want to avoid breaking my keyboard entirely and resolve this in general.

Is there anything I can do to help find out what the issue is? I can only reproduce it when my PC is booting as replugging the keyboard fixes it and doesn't reproduce it for me.

I just flashed 0.23.8 to it where it still reproduces.

ForsakenRei commented 8 months ago

Is there anything I can do to help find out what the issue is? I can only reproduce it when my PC is booting as replugging the keyboard fixes it and doesn't reproduce it for me.

I just flashed 0.23.8 to it where it still reproduces.

Usually isolating the issue is the first thing. Have you tried a different machine? Maybe also try the default keymap and see if it is the same. And disable RGB in the firmware. When it happened to me with the default firmware, the RGB would freeze at the colors it was showing, not really red, but things might have changed.

And... not a real solution, but for a fragile daughterboard, using a detachable cable might be helpful if you need to reset the keyboard this way a lot. Or, if you have booted into the OS, something like USBTreeView can restart the USB port.

AndreasBackx commented 8 months ago

Usually isolating the issue is the first thing. Have you tried a different machine?

Yes, this occurs on multiple machines.

Maybe also try the default keymap and see if it is the same.

I have a custom keymap that only slightly differs from default, I'll give default a try and report back.

And disable RGB in the firmware. When it happened to me with the default firmware, the RGB would freeze at the colors it was showing, not really red, but things might have changed.

Will give that a try.

And... not a real solution, but for a fragile daughterboard, using a detachable cable might be helpful if you need to reset the keyboard this way a lot. Or, if you have booted into the OS, something like USBTreeView can restart the USB port.

Hah, yeah. Would want to avoid solutions like this. Though, I might get a detachable cable for another reason and might just buy 2 while I'm at it, depending on how sturdy and strong the magnetic connection is.

Is there a way I could possibly replicate the bug somehow and then run a bisect over the code with the repro case to find the exact commit that triggered it?

ForsakenRei commented 8 months ago

Hah, yeah. Would want to avoid solutions like this. Though, I might get a detachable cable for another reason and might just buy 2 while I'm at it, depending on how sturdy and strong the magnetic connection is.

I personally use a cable with a YC8 connector. It's not magnetic and the connection is good, but don't expect it to be as easy as detaching a MagSafe cable from a MacBook; you will need both hands.

Is there a way I could possibly replicate the bug somehow and then run a bisect over the code with the repro case to find the exact commit that triggered it?

I have no clue. It happens during system boot, so collecting logs or using the debug console is not really feasible, I guess. Probably someone from the dev team will know better.

AndreasBackx commented 7 months ago

@ForsakenRei apologies for the late reply. Disabling the RGB light fixes the issue indeed. I think it's still worth it to keep the issue open for investigation possibly now that we know the root cause?

ForsakenRei commented 7 months ago

@ForsakenRei apologies for the late reply. Disabling the RGB light fixes the issue indeed. I think it's still worth it to keep the issue open for investigation possibly now that we know the root cause?

We can keep this open. I only have one RGB effect enabled, with all the others disabled, and I don't have the same issue as before. Maybe you can give it a try and leave only the one RGB effect you actually use? See this line for reference: https://github.com/ForsakenRei/qmk-space65-r3/blob/e52fc52f4bc5eeade431ce7c2361176a793a29c7/gray_studio/space65r3/info.json#L21

AndreasBackx commented 7 months ago

Currently cannot give it a shot, but will report back when I can. I did experience a deadlock recently again even with RGB disabled so it seems to still happen, just less frequently.

PeterMortensen commented 3 months ago

Re "the machine the board will just go into deadlock": Is it a real deadlock or does it just take a very long time?

What happens if you restart immediately after power-up? For example, if the process takes 2 minutes (2 minutes from when the keyboard got power until some point in the restart process), will it come out of deadlock after about 2 minutes? If you wait to restart until 40 minutes after the keyboard got power, will it come out of deadlock after about 40 minutes?

That could be the case if the software (implicitly) assumes the tick counter (timer_read32) is always increasing. If the tick counter is reset to zero for some reason and the software is using a sample of an old value of the tick counter (e.g., to wait for some time to pass) it would wait for a very long time (until the tick counter has increased past the sampled value).

I have observed such a reset of the tick counter in a different circumstance (wakeup after keyboard sleep for a Keychron K Pro series keyboard (in wireless mode)). The reset apparently happens intermittently, though it can't yet be ruled out it depends on the exact way the keyboard is operated.
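
The failure mode hypothesized above can be illustrated with a small standalone C sketch. It is not QMK code, and nothing in this thread establishes that any QMK code path performs the naive comparison; the function names and the 40-minute scenario are made up for illustration. It only shows why a wait that compares absolute tick values stalls for roughly the previous uptime after a counter reset, while a wait based on the unsigned difference does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Naive check (hypothetical): compares absolute tick values, so it silently
 * assumes the counter never goes backwards. */
static bool naive_wait_done(uint32_t now, uint32_t start, uint32_t delay_ms) {
    return now >= start + delay_ms;
}

/* Difference-based check (hypothetical): unsigned subtraction wraps modulo
 * 2^32, so a counter that jumped backwards yields a very large "elapsed"
 * value instead of a long stall. */
static bool diff_wait_done(uint32_t now, uint32_t start, uint32_t delay_ms) {
    return (uint32_t)(now - start) >= delay_ms;
}

/* Assumed scenario: a 100 ms wait started 40 minutes after power-up, then
 * the counter was reset to zero and has only reached 10 ms. */
static void tick_reset_demo(void) {
    const uint32_t start = 40u * 60u * 1000u; /* 2,400,000 ms */
    const uint32_t delay = 100u;
    const uint32_t now   = 10u;

    /* Not done: the counter must climb back past 2,400,100 ms (~40 min). */
    assert(!naive_wait_done(now, start, delay));

    /* Done: finishes early rather than stalling. */
    assert(diff_wait_done(now, start, delay));
}
```

Calling tick_reset_demo() passes both assertions. If the keyboard really did recover after a delay matching its previous uptime, that would point toward a comparison of the first kind somewhere in the startup or suspend path.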