qmk / qmk_firmware

Open-source keyboard firmware for Atmel AVR and Arm USB families
https://qmk.fm
GNU General Public License v2.0
17.99k stars 38.66k forks source link

[Bug] interconnect on infinity ergodox broken in latest firmware #19420

Open typorian opened 1 year ago

typorian commented 1 year ago

Describe the Bug

When using the altest firmware on an input club ergodox infinity split keyboard, the secondary half stops working after a few seconds. When plugging both parts into the computer seperately they work, but layer shifting across sides does not work in that case, as expected.

Keyboard Used

input_club/ergodox_infinity

Link to product page (if applicable)

No response

Operating System

GNU/Linux Fedora

qmk doctor Output

No response

Is AutoHotKey / Karabiner installed

Other keyboard-related software installed

No response

Additional Context

the interconnect works with firmware compiled with qmk 0.14.X and with the original firmware.

sergei-dyshel commented 1 year ago

confirming - interconnect broken starting from version 0.19.1. Previous version (0.18.17) works.

andresilva commented 1 year ago

FWIW this is the commit that broke it 4020674163. It still works properly if both sides are connected directly to the computer.

fauxpark commented 1 year ago

No, it's not. That's just a merge commit.

andresilva commented 1 year ago

@fauxpark Yeah sorry, I posted the wrong commit, I updated my comment in the meantime. Although it's just the commit that merges the 0.19.0 changes from develop, so the information is probably not that useful anyway (I'd have to bisect on the develop branch).

Hylian commented 1 year ago

I had the same issue and bisected down to the offending commit:

6f13a76530165bb1ad723ab0270c9eb908ca3a8c is the first bad commit
commit 6f13a76530165bb1ad723ab0270c9eb908ca3a8c
Author: Stefan Kerkmann <karlk90@pm.me>
Date:   Sun Oct 2 15:35:33 2022 +0200

    [Core] ChibiOS: Fix USB bus disconnect handling (#18566)

 tmk_core/protocol/chibios/usb_main.c | 3 ++-
 tmk_core/protocol/chibios/usb_util.c | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)
sigprof commented 1 year ago

18566 has changes in 3 places:

  1. Add usbStop() into init_usb_driver(). This part looks questionable — it is an initialization function, and the USB driver should be stopped at that point (although ChibiOS should allow duplicate usbStop() calls, the LLD may have bugs in that area). @KarlK90, does that part really do something for RP2040?

  2. Swap the order of usbStop() and usbDisconnectBus() in the default implementation of restart_usb_driver(). This should not change anything for Ergodox Infinity, because that keyboard has BOARD = IC_TEENSY_3_1, and the corresponding board file replaces restart_usb_driver() with a function that does nothing (apparently the USB drivers for all Kinetis chips are slightly broken and do not survive the restart).

  3. Add usbDisconnectBus() to usb_disconnect(). That function is called when the half is determined to be the slave one.

Can you try reverting the changes 1 and 3 one by one in the latest sources, and then trying the resulting code on your hardware to determine which part caused the problem? Please try them separately — e.g., remove that usbStop() call from init_usb_driver(), try to compile and flash that firmware; then restore the code in init_usb_driver() and try removing usbDisconnectBus() from usb_disconnect(); although if changing only one of those places does not fix the problem, you can then try reverting both of them at once.

sigprof commented 1 year ago

The underlying LLD implementation of usbStop() in the Kinetis driver looks buggy:

void usb_lld_stop(USBDriver *usbp) {
  /* TODO: If in ready state then disables the USB clock.*/
  if (usbp->state == USB_STOP) {
#if KINETIS_USB_USE_USB0
    if (&USBD1 == usbp) {
#if KINETIS_USB0_IS_USBOTG
      nvicDisableVector(USB_OTG_IRQn);
#else /* KINETIS_USB0_IS_USBOTG */
      nvicDisableVector(USB_IRQn);
#endif /* KINETIS_USB0_IS_USBOTG */
    }
#endif /* KINETIS_USB_USE_USB0 */
  }
}

The problem is that when usbStop() is called after the driver has been started (so it's the USB_{READY,SELECTED,ACTIVE,SUSPENDED}USB_STOP transition), usbp->state still has the original state at that point (usbStop() sets it to USB_STOP after usb_lld_stop() returns), therefore this implementation of usb_lld_stop() just does nothing; however, ChibiOS then assumes that the USB hardware has been stopped, and proceeds to reset the endpoint configuration, even though the hardware may still have some operations queued, and even interrupts are not disabled.

andresilva commented 1 year ago

@sigprof I tested your suggestions and reverting change 3) fixed the problem. I first tried just reverting change 1) and it didn't fix it.

sigprof commented 1 year ago

The actual implementation of usbDisconnectBus() is this (HAL does not add any code of its own there):

/* Writing to USB0->CONTROL causes an unhandled exception when USB module is not clocked. */
#if KINETIS_USB0_IS_USBOTG
#define usb_lld_disconnect_bus(usbp) if(SIM->SCGC4 & SIM_SCGC4_USBOTG) {USB0->CONTROL &= ~USBx_CONTROL_DPPULLUPNONOTG;} else {}
#else /* KINETIS_USB0_IS_USBOTG */
#define usb_lld_disconnect_bus(usbp) if(SIM->SCGC4 & SIM_SCGC4_USBFS) {USB0->CONTROL &= ~USBx_CONTROL_DPPULLUPNONOTG;} else {}
#endif /* KINETIS_USB0_IS_USBOTG */

The clock enable bit that is tested here is set in usb_lld_start() and then never gets reset, so the problem is probably caused not by the USB0->CONTROL write itself, but by some of its side effects (e.g., it results in the USB controller generating an interrupt after some delay, and with the broken usb_lld_stop() that actually does not stop anything that results in the interrupt handler doing some wrong things).

Unfortunately, that usbDisconnectBus() is the main part of the fix from #18566, so just removing it effectively brings the RP2040 bug back.

One possible workaround is adding this to some config.h file for the Ergodox Infinity:

#define usb_lld_disconnect_bus(usbp) do {} while(0)

(the driver actually has an #if !defined(usb_lld_disconnect_bus) around its implementation, so it's possible to override it with a custom one).

Hylian commented 1 year ago

Just checking in to say that the workaround of redefining usb_lld_disconnect_bus() has been working great for me on tip of master.

I did have one small issue where when my keyboard was plugged into the USB3.1 port of my motherboard, the Linux kernel would hang for a while on shutdown with a "maybe bad USB cable?" message. Switching the keyboard over to a normal USB2.0 port fixed it. I haven't fully validated the issue, so not sure if it's related to this change, but figured I'd note it.

andresilva commented 1 year ago

@Hylian https://github.com/qmk/qmk_firmware/issues/6713 for the issue related to USB3, please comment there so we can hopefully reopen that issue.

Hylian commented 1 year ago

Sorry, haven't been able to triage this issue further yet after the holidays. I will say that with the workaround, I have still been getting some flakiness, which seems like it might be occurring during computer reboot/shutdown. The right half will occasionally freeze after shutdown, with the right LCD backlight staying lit when it should timeout. I also occasionally see the right half freeze up while the computer is running, fixed by an unplug/replug of just the right half.

For the "bad USB cable" message, I'm observing it on a USB3.1 port on the motherboard, as well as on a USB 3.1 hub. When plugged into a USB 2.0 port, I don't see it. If I manage to get some more info on it, I'll re-open #6713.

I'll try rolling back to before 6f13a76530165bb1ad723ab0270c9eb908ca3a8c to see if the issues go away. I suspect a proper fix may need to be more involved than just undefining usb_lld_disconnect_bus.

phantomwhale commented 1 year ago

First time flashing my Ergodox Infinity in several years - no joy with all the "friendly" tools (QMK configurator), dropped back to the source code, things didn't seem to work cleanly, and then stumbled over this issue.

Not sure my C is up to much right now, so stepped back to version 0.18.17 (as noted above), moved over my keymap.c and ran qmk flash -bl dfu-util-split-left to flash the left side, and qmk flash -bl dfu-util-split-right to flash the right side (no idea if that's how it needs to be done?!) and things look like they are finally working.

Subscribing here, keen to see if things get fixed up on master (assume that would mean the QMK configurator would become a working option again? Although if the two differing flash commands are required, perhaps I still need to use the source code for flashing, as QMK configurator just dumps one .bin file - all a bit out of my depth on this part currently!)

limitedreality commented 1 year ago

Having the same issue. Tried doing:

git checkout 133fe1c

But now when I attempt to build using: make input_club/ergodox_infinity:default:dfu-util-split-left

I'm getting:

Compiling: keyboards/input_club/ergodox_infinity/ergodox_infinity.c In file included from platforms/chibios/platform_deps.h:18, from quantum/quantum.h:18, from keyboards/input_club/ergodox_infinity/ergodox_infinity.h:3, from keyboards/input_club/ergodox_infinity/ergodox_infinity.c:1: ./lib/chibios/os/hal/include/hal.h:136:2: error: #error "obsolete or unknown configuration file" 136 #error "obsolete or unknown configuration file" ^~~~~ [ERRORS]

make[1]: *** [builddefs/common_rules.mk:359: .build/obj_input_club_ergodox_infinity_default/keyboards/input_club/ergodox_infinity/ergodox_infinity.o] Error 1 Make finished with errors

jbecca commented 1 year ago

I also tried checking out 133fe1c and have the same issue with getting QMK to build anything from it

 make input_club/ergodox_infinity:default:dfu-util-split-left
QMK Firmware 0.20.6
WARNING: Some git submodules are out of date or modified.
 Please consider running make git-submodule.

Making input_club/ergodox_infinity with keymap default and target dfu-util-split-left

platforms/chibios/platform.mk:102: lib/chibios-contrib/os/common/startup/ARMCMx/compilers/GCC/mk/startup_k20x7.mk: No such file or directory
platforms/chibios/platform.mk:103: lib/chibios/os/common/ports/ARMCMx/compilers/GCC/mk/port_v7m.mk: No such file or directory
platforms/chibios/platform.mk:104: lib/chibios/os/hal/ports/KINETIS/K20x/platform.mk: No such file or directory
platforms/chibios/platform.mk:255: lib/chibios/os/hal/hal.mk: No such file or directory
platforms/chibios/platform.mk:259: lib/chibios/os/rt/rt.mk: No such file or directory
platforms/chibios/platform.mk:261: lib/chibios/os/hal/lib/streams/streams.mk: No such file or directory
make[1]: *** No rule to make target `lib/chibios/os/hal/lib/streams/streams.mk'.  Stop.
make: *** [input_club/ergodox_infinity:default:dfu-util-split-left] Error 1
Make finished with errors
(base) jeffreybecca@jbeccaMBP qmk_firmware % make git-submodule
QMK Firmware 0.20.6
Synchronizing submodule url for 'lib/chibios'
Synchronizing submodule url for 'lib/chibios-contrib'
Synchronizing submodule url for 'lib/googletest'
Synchronizing submodule url for 'lib/lufa'
fatal: No url found for submodule path 'lib/lvgl' in .gitmodules
make: *** [git-submodule] Error 128

Any tips would be appreciated. I just want my keyboard to work again.

fauxpark commented 1 year ago

WARNING: Some git submodules are out of date or modified. Please consider running make git-submodule.

Make sure there are no modifications to the submodules with git status. If there are, remove those folders and try again.

jbecca commented 1 year ago

I was not able to get an old commit working with the QMK CLI tool, but I did (hopefully) fix my keyboard by doing a hard reset on my repo and adding the work around mentioned above to my config.h. If additional testing help is needed before a fix for this makes it to master, I am available.

limitedreality commented 10 months ago

Seeing several commits being pushed lately related to this bug that has one of my keyboards bricked. If it's ready to be tested, let me know and I'm happy to.