qmk / qmk_firmware

Open-source keyboard firmware for Atmel AVR and Arm USB families
https://qmk.fm
GNU General Public License v2.0

raw_hid intermittently crashes when connected directly to USB, but doesn't when connected through a hub? #16757

Closed SteffanDonal closed 1 year ago

SteffanDonal commented 2 years ago

Hey there!

I'm implementing a computer-controlled RGB matrix custom effect, essentially a music visualiser. I'm using the Canvas CTRL keyboard by Massdrop, and for the most part, implementing this has been smooth.

I'm now pretty close to publishing my work, along with the companion application that handles audio processing and communication with the keyboard. I want to make sure it's robust, but I've been running into an intermittent crash.

My changes are published on my fork: https://github.com/SteffanDonal/qmk_firmware/tree/master/keyboards/massdrop/ctrl/keymaps/ruirize

As a side note, if anyone is interested in trying out the visualizer for themselves and/or helping me debug, contact me on Discord (Ruu#9999) and I'll share the companion application, which currently supports only Windows.

So my current theory is that even though I'm now using a USB port that can keep up with the power demand, the spikiness of the power draw causes some kind of brownout that GCR is unable to catch/protect against.
I would not write off my motherboard/USB controller/power supply as possible causes, but I don't know of many easy ways to practically debug this.

At this point, I'm trying to check if my own code is crashing when handling the HID messages, by validating the message length and setting the keyboard to display red when a malformed message is detected. So far though, I've not learned anything new.
I'm also going to run tests where I lower the max brightness by one step at a time to see if that impacts the behaviour at all.
I'll also try other USB ports and see if that changes anything.
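
For anyone following along, the length check described above could look roughly like the sketch below. The packet layout (command byte, payload-length byte, payload), the 32-byte report size, and every name in it are assumptions made for illustration; none of it is taken from the fork. It is plain C with no QMK dependencies, so it can be compiled and exercised on a PC before being wired into the firmware's raw HID callback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet layout (an assumption, not the fork's real format):
 * byte 0 = command id, byte 1 = payload length, bytes 2.. = payload. */
#define PACKET_SIZE  32   /* assumed raw HID report size */
#define HEADER_SIZE  2
#define CMD_SPECTRUM 0x01 /* hypothetical command id */

typedef enum {
    PACKET_OK,
    PACKET_BAD_LENGTH,
    PACKET_BAD_COMMAND,
    PACKET_BAD_PAYLOAD
} packet_status_t;

/* Check a received report before any byte of the payload is used. */
static packet_status_t validate_packet(const uint8_t *data, size_t length) {
    if (data == NULL || length != PACKET_SIZE) return PACKET_BAD_LENGTH;
    if (data[0] != CMD_SPECTRUM) return PACKET_BAD_COMMAND;
    /* The declared payload must fit in what remains after the header. */
    if (data[1] > PACKET_SIZE - HEADER_SIZE) return PACKET_BAD_PAYLOAD;
    return PACKET_OK;
}

/* Latched flag that the render code could poll to paint the board red. */
static bool malformed_seen = false;

/* Stand-in for the body of a raw HID receive callback. */
static void handle_packet(const uint8_t *data, size_t length) {
    if (validate_packet(data, length) != PACKET_OK) {
        malformed_seen = true; /* keep the evidence, drop the packet */
        return;
    }
    /* A well-formed packet would be handed to the visualizer here. */
}
```

Latching the flag instead of clearing it on the next good packet means a single bad report stays visible, which matters when the failure only shows up once every few hours.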

Really all I'm trying to work out is if this is a problem with my PC / motherboard / USB controllers or if it's a genuine code crash that I can fix.

Any ideas/advice?

SteffanDonal commented 2 years ago

Update: The keyboard still crashes even when the LED brightness is set to minimum, so I'm fairly confident that rules out power consumption/delivery issues. I think RGB_TOG probably disables matrix animation entirely when no LEDs are enabled, which would explain why I initially wasn't getting a crash even after many hours.

The messages received by the keyboard are never malformed as far as I can tell with my debug code, so I think that now points towards an error in my visualizer's math: https://github.com/SteffanDonal/qmk_firmware/blob/master/keyboards/massdrop/ctrl/keymaps/ruirize/visualizer.h#L36

I think I'm going to start very slowly commenting out parts of it to determine what causes it to lock up.

SteffanDonal commented 2 years ago

Further update: I suspect it's a combination of issues. When the visualiser's math is simplified, the crash happens less frequently, but it still happens.

I'm currently thinking that it's likely to be an issue with raw HID message handling, perhaps a timing/threading issue in the keyboard itself that causes a crash...
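
One way to test the timing theory is to make the packet handler do nothing but copy bytes and let the effect code pick the data up once per frame. The sketch below assumes the handler and the renderer can interleave, which has not been confirmed for this board, and all names and the band count are hypothetical. It only separates receiving from rendering; it is not an atomic hand-off, and on real hardware the copy in `visualizer_latch` might still need interrupts masked around it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BAND_COUNT 16 /* hypothetical number of spectrum bands */

static uint8_t pending[BAND_COUNT]; /* written by the packet handler */
static uint8_t active[BAND_COUNT];  /* read by the effect code */
static volatile bool pending_ready = false;

/* Called from the packet handler: copies bytes and nothing else. */
static void visualizer_store(const uint8_t *payload, uint8_t count) {
    if (count > BAND_COUNT) count = BAND_COUNT; /* never write past the buffer */
    pending_ready = false;                      /* buffer is being rewritten */
    memcpy(pending, payload, count);
    memset(pending + count, 0, (size_t)(BAND_COUNT - count));
    pending_ready = true;
}

/* Called once per frame before rendering. Returns true if new data arrived. */
static bool visualizer_latch(void) {
    if (!pending_ready) return false;
    memcpy(active, pending, BAND_COUNT);
    pending_ready = false;
    return true;
}
```

If the crash rate is unchanged with the handler reduced to a copy like this, the handler itself becomes a less likely suspect, and attention can shift to the USB transport or the effect math.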

SteffanDonal commented 2 years ago

Okay, update time!

So, running out of ideas, I tried plugging my keyboard into my PC through an unpowered, no-name USB 3.0 hub, and it's no longer crashing, no matter how long I'm sending HID messages. Absolutely bizarre.

When I have the keyboard plugged directly into any USB port:

It never crashes when I have the keyboard plugged in through this hub.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 90 days. It will be closed in the next 30 days unless it is tagged properly or other activity occurs. For maintainers: Please label with bug, in progress, on hold, discussion or to do to prevent the issue from being re-flagged.

github-actions[bot] commented 1 year ago

This issue has been automatically closed because it has not had activity in the last 30 days. If this issue is still valid, re-open the issue and let us know. // [stale-action-closed]