qmk / qmk_firmware

Open-source keyboard firmware for Atmel AVR and Arm USB families
https://qmk.fm
GNU General Public License v2.0

[Bug] Looks like 0.11.0 broke something for jj40 #11389

Open Artheg opened 3 years ago

Artheg commented 3 years ago

Hi,

I wanted to alter the keymap for my jj40 board. When I flashed the firmware, keys started to behave strangely. Sometimes there was no output at all, sometimes they would 'stick' (e.g. I type 'g' and the output goes 'gggggg...' until I hit 'g' again), and sometimes there would be a delay after I hit the key. Backlight Underglow wouldn't work at all. I've tried compiling the firmware both locally and on the website, with the same effect.

With the help of guys in discord (spidey3, Dasky) it was figured out that the last firmware that was working was compiled with 0.10.54. Everything after that just didn't work.

spidey3 commented 3 years ago

I believe that @zvecr planned to take a look at this...

Artheg commented 3 years ago

I'm on the latest firmware now (0.11.53). Looks like 'sticking' and delay are gone. Backlight Underglow still doesn't work.

spidey3 commented 3 years ago

Can you describe the backlight issue in more detail? Do the underglow LEDs work? Does Raise+Lower+S or Raise+Lower+D change anything? What about Raise+Lower+X or Raise+Lower+C?

Artheg commented 3 years ago

I'm sorry, I used the wrong words here. What I meant by backlight was RGB Lighting (underglow). I've tried the combinations you suggested (default keymap), but unfortunately nothing happens.

spidey3 commented 3 years ago

Recapping: OP does not have backlight LEDs installed. The remaining issue is to diagnose the difficulty enabling the RGB Lighting (underglow).

benthepoet commented 3 years ago

I can confirm I'm having this issue with the 0.11.53 release. After flashing, the board feels sluggish, often not responding to several key presses, and it will start endlessly repeating a character if you roll your fingers along the top row quickly. Flashed with 0.10.54, as @Artheg mentioned, the board works fine (my underglow works also).

dmesg in Linux shows some weird reset low-speed USB messages. These don't show up when using 0.10.54.

[11025.087805] usb 5-1: New USB device found, idVendor=4b50, idProduct=0040, bcdDevice= 2.00
[11025.087815] usb 5-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[11025.087820] usb 5-1: Product: JJ40
[11025.087824] usb 5-1: Manufacturer: KPrepublic
[11025.117297] input: KPrepublic JJ40 as /devices/pci0000:00/0000:00:13.0/usb5/5-1/5-1:1.0/0003:4B50:0040.0013/input/input49
[11025.120741] usb 5-1: ctrl urb status -62 received
[11025.173358] hid-generic 0003:4B50:0040.0013: input,hidraw0: USB HID v1.01 Keyboard [KPrepublic JJ40] on usb-0000:00:13.0-1/input0
[11025.189945] input: KPrepublic JJ40 System Control as /devices/pci0000:00/0000:00:13.0/usb5/5-1/5-1:1.1/0003:4B50:0040.0014/input/input50
[11025.246311] input: KPrepublic JJ40 Consumer Control as /devices/pci0000:00/0000:00:13.0/usb5/5-1/5-1:1.1/0003:4B50:0040.0014/input/input51
[11025.246559] hid-generic 0003:4B50:0040.0014: input,hidraw1: USB HID v1.01 Device [KPrepublic JJ40] on usb-0000:00:13.0-1/input1
[11026.692777] usb 5-1: reset low-speed USB device number 8 using ohci-pci
[11028.906366] usb 5-1: reset low-speed USB device number 8 using ohci-pci
[11031.116836] usb 5-1: reset low-speed USB device number 8 using ohci-pci
[11033.336801] usb 5-1: reset low-speed USB device number 8 using ohci-pci
[11035.200201] usb 5-1: reset low-speed USB device number 8 using ohci-pci
[11037.367141] usb 5-1: reset low-speed USB device number 8 using ohci-pci
[11038.073905] usb 5-1: reset low-speed USB device number 8 using ohci-pci
[11039.960724] usb 5-1: reset low-speed USB device number 8 using ohci-pci
[11040.647408] usb 5-1: reset low-speed USB device number 8 using ohci-pci
[11042.430964] usb 5-1: reset low-speed USB device number 8 using ohci-pci
[11044.287781] usb 5-1: reset low-speed USB device number 8 using ohci-pci
[11046.401295] usb 5-1: reset low-speed USB device number 8 using ohci-pci
[11048.441467] usb 5-1: reset low-speed USB device number 8 using ohci-pci

Artheg commented 3 years ago

@benthepoet Are you sure that you're having 'sticking' issues with 0.11.53? I'm on 0.11.53 with my JJ40 and sticking is gone, although underglow is not working.

Also, my underglow doesn't work at all. I've tried to flash my board with all versions from 0.11.0 to 0.10.54 and since then it just wouldn't work. All layers are working properly. I've tried to reset EEPROM. I've also tried to map RGB keys to the layer 0. Is my underglow broken for good?

benthepoet commented 3 years ago

Yep, I'm definitely getting the sticking/delay issues with 0.11.53. Cleared EEPROM and flashed several times with the same result.

One strange thing is that if I plug the keyboard into a USB hub then the sticking/delay issues don't occur but if I plug the board directly into the USB ports on my motherboard that's when it starts acting up. Using 0.10.54 I can plug directly into my motherboard without any issues.

Anytime I've flashed the board my underglow has been working, usually defaults to solid red when I use the default layout.

spidey3 commented 3 years ago

One strange thing is that if I plug the keyboard into a USB hub then the sticking/delay issues don't occur but if I plug the board directly into the USB ports on my motherboard that's when it starts acting up. Using 0.10.54 I can plug directly into my motherboard without any issues.

This could be a USB device state issue. Sometimes hubs behave differently. I wonder - can you try the following versions:

I'd like to figure out the version where the problem started...

benthepoet commented 3 years ago

I did some testing and it looks like the problems start with 0.11.0. The board only registers keypresses for a little bit and then it becomes almost completely unresponsive.

tzarc commented 3 years ago

Reproduced this with my Mysterium and 0.11.57. Left it sitting, connected directly to onboard USB. Connecting through a hub didn't trigger it.

tzarc commented 3 years ago

Seems like I've come to the same conclusion -- 0.11.0 is broken, 0.10.54 is not.

MajorKoos commented 3 years ago

I've got the same issue with a port I'm working on for leeku pcbs (atmega32a + bootloadhid). Type a single "." and I get a row of "............................" and then it hangs or sometimes even disconnects. 10.54 is stable, but 11.x is not.

sowbug commented 3 years ago

I'm experiencing similar behavior on a custom keyboard (sowbug/68keys). This is a Blue Pill board. Problems began when I rebased to master around February 10 and flashed the resulting firmware after problem-free usage of firmware built around March 2020. Note that the issue seems to happen only on a Google Pixelbook, but not on my personal Linux machine. I had assumed it was thus a problem with a recent Chrome OS update, but I'm now starting to suspect the QMK update. I'll try rolling back to last year's firmware and see if the issue goes away.

sowbug commented 3 years ago

I updated to yesterday's master, rebuilt, and flashed. So far I haven't seen the problem again.

jhbruhn commented 3 years ago

I am having the same problem with a Discipad (atmega328p as well) and the most recent master. The problem is also present in 0.11.0, but not in 0.10.54. The only relevant change I could find was in tmk_core/protocol/vusb/main.c running the housekeeping tasks, but reverting that did not fix it. Interestingly enough, it works on a USB3 port, but not via a USB2.0 hub, while my ATMega32 Discipline works on that same hub.

spidey3 commented 3 years ago

Is this still happening in 0.12.0 and later?

jhbruhn commented 3 years ago

Yes. I tested with the latest master and a couple of versions between 0.10.54 and now.

jhbruhn commented 3 years ago

I assume the underlying problem is some kind of performance regression, possibly also the avr-gcc compiler version I am using? Using 0.12.15, I set the USB_POLLING_INTERVAL_MS to 40 instead of the default 10 and the "reset low-speed USB" message count was heavily reduced (0 at the moment). I additionally set OPT to 2 instead of s because the ATMega328 seems to have enough memory.
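
For anyone who wants to try the same workaround, a minimal sketch of the config.h side follows. It assumes the define goes in the keyboard's or keymap's config.h; the value 40 is simply what was tested above, not a recommended setting.

```c
/* config.h: sketch of the polling-interval workaround described above.
 * Assumption: this lives in the keyboard's or keymap's config.h. */

/* Ask the host to poll every 40 ms instead of the default 10 ms. */
#undef USB_POLLING_INTERVAL_MS
#define USB_POLLING_INTERVAL_MS 40
```

The OPT change is a separate build setting (OPT set to 2 instead of s); where exactly it is set, for example the keyboard's rules.mk, is an assumption here.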

daskygit commented 3 years ago

I was helping someone on discord with this issue using a sesame and a thinkpad dock, version 0.10.54 worked fine.

jeffjewiss commented 3 years ago

I was having the above issue of repeated keys or my keyboard locking up with my Discipline65.

A couple days ago I pulled down master and reflashed the keyboard and I haven't had repeated characters or a lockup since (fingers crossed). I think this PR is what contained the fix: https://github.com/qmk/qmk_firmware/pull/12576

fauxpark commented 3 years ago

@jeffjewiss definitely not, the Discipline65 (and the JJ40 as well) runs an ATmega32A, so QMK will use V-USB rather than ChibiOS or LUFA. I would be interested to know what it was, though - I don't think there's been any meaningful changes to the V-USB code recently.

jeffjewiss commented 3 years ago

@fauxpark fair enough, thanks for explaining. I just guessed at what the fix might have been from looking at the commits over the last 2-3 weeks.

My discipline went from being basically unusable since it would lockup or repeat characters multiple times an hour to not having any issues at all. The only thing I did was pull qmk master and reflash.

I could try to git bisect to find what commit added the fix, but otherwise I'm not familiar with the internals of qmk at all.

MajorKoos commented 3 years ago

I can't find the specific PR at the moment, but when I did some digging I saw that one of the PRs related to adding the extra USB endpoint mentioned that it added some additional latency but should still be in spec for devices slower than 16 MHz. I wonder if that's it, since some folks have mentioned that lowering the polling rate (a longer polling interval) helps.

kosmiciatakuja commented 3 years ago

I'm having the exact same issue with Keebio Nyquist with ATmega32u4 on board. Also it's a split keyboard and strangely after flashing firmware the slave half is not functional for about 10-20 seconds and slowly starts to register keys (at first it's like 1 registered key for 10 pressed and then it gets faster and faster). I'm on the latest master (f7c6d68b3443e242cd658e2913d16db8b2318e03). I only managed to observe that switching some of NO_USB_STARTUP_CHECK, AUTO_SHIFT_ENABLE, CONSOLE_ENABLE, LTO_ENABLE or COMMAND_ENABLE seems to influence how soon the keyboard starts exhibiting problems after reset. But they appear sooner or later. I'm going to try flashing 10.54 tonight. But for sure the problem is present in the latest FW, at least on Nyquist.

Edit: sorry, I described the problem with the slave half taking some time to become functional but it's tangential. The main problem is the same as described in this issue, key repeats and freezes.

MajorKoos commented 3 years ago

This is the PR I was thinking of: https://github.com/qmk/v-usb/pull/1

Could it be that some 12 MHz devices don't work well with the change?

jhbruhn commented 3 years ago

Right now, the current version (84883d340045c50ce6c200c9087461c1db853898) works for my Discipline (ATmega32), but not for my Discipad (ATmega328p), which is weird in my opinion. I even turned off the third endpoint by setting MOUSEKEY_ENABLE = no, but that didn't improve it. Increasing USB_POLLING_INTERVAL_MS to 50 makes it somewhat usable.

kb-elmo commented 3 years ago

This is the PR I was thinking of: qmk/v-usb#1

Could it be that some 12 MHz devices don't work well with the change?

I'm having the same issue on a Sesame keyboard, which runs at 16 MHz. So does the Discipline.

And the board worked fine for quite some time after the third V-USB endpoint was added. So it's unlikely that this caused the issue. I only recently started to have this problem on my board so it has to be a more recent change.

drinckes commented 3 years ago

I have a jj50 board, and just flashed it this morning (using the online configurator).

In my case it works perfectly when directly connected, but through a powered hub (dell monitor) it repeats and drops keys:

Unfortunately I can't see any indication of what version the configurator uses?

fauxpark commented 3 years ago

The Configurator uses latest master.

If someone could capture this happening in Wireshark, that could potentially be very helpful.

kosmiciatakuja commented 3 years ago

@fauxpark is there any guide or at least pointers on how to capture this with wireshark? I'm still having the same problems with my keyboard on the latest master as of today. I don't want to be stuck on 10.54 forever :) I have some wireshark experience but only with capturing network packets, so I'm a bit stumped when it comes to capturing USB traffic...

fauxpark commented 3 years ago

@kosmiciatakuja on Windows you can install USBPcap: https://desowin.org/usbpcap/ For macOS you have to disable SIP, then manually enable the USB interface eg. sudo ifconfig XHC20 up: https://developer.apple.com/forums/thread/95380

kosmiciatakuja commented 3 years ago

I managed to capture some traffic on 10.54, this just shows the keyboard working. I also captured a period of time on the latest version, 13.34. I'm attaching both pcap dumps. The weird thing is that I had to leave the keyboard for a while and when I got back it wasn't working at all (completely stalled). I had to unplug it and plug it again to bring it back. It was all during the wireshark dump so hopefully something got there... I can't identify the device so I had to capture on devmon0 which just monitors all devices, which means that there's mouse traffic. As far as I could check the keyboard is device 2, so it can be filtered easily. There's no capture filter in wireshark for USB as far as I could check. 13.34.pcapng.gz 10.54.pcapng.gz

Edit: This is on keebio nyquist. And I think it may be important that the keyboard just hangs with time even without touching any keys.

kosmiciatakuja commented 3 years ago

I started bisecting the repo for the problem and so far I managed to initially focus on the transition between 0.11.69 and 0.12.0. Everything seems to work on 11.69 and 12.0 causes the keyboard to lock up with time. I'm still testing this and if it is indeed that I'm going to test every commit between 11.69 and 12.0 to see what causes the problem, fortunately there's only a few of them. Once I find out it should be possible to reverse-patch it onto the current version (if nothing else is changed in that area). I'll report back soon.

kosmiciatakuja commented 3 years ago

I managed to track this down to this specific commit. When I burn the previous commit (804d5c1c5d) it works but with this one (1581ea48dc) compiled and burned the keyboard hangs within maybe 30 minutes and must be replugged to work. The only thing I don't understand is that this commit only changes a bunch of *.py files I believe responsible for the CLI, no .c files with any serious code in them. Which is strange and I'm not sure how to proceed because of that.

sigprof commented 3 years ago

The commits 804d5c1c5d59d9a12c1d793289ccbd59cb650ec2 and 1581ea48dcd48d0d3f42cc09b388c468aedec45d are not consecutive, however — there are 645 commits between them (or 144, if you ignore merge commits). Maybe you need to do another bisect round just between these points, or you just pasted a wrong commit ID.

kosmiciatakuja commented 3 years ago

I'm confused then, this is how it looks in my git log:

git log screenshot

I circled both commits in yellow. Both are from February 27 so they should be close, I guess. It may make a difference that one is on master and the other is on develop, but as seen in the screenshot the second one (from develop) was merged to master just one commit later, so there shouldn't be many differences...

sigprof commented 3 years ago

This graph is somewhat misleading — the commit 804d5c1c5d59d9a12c1d793289ccbd59cb650ec2 was made in the master branch just before the February 27 breaking changes merge, while the commit 1581ea48dcd48d0d3f42cc09b388c468aedec45d was made in the develop branch, again just before that merge. So the difference between those commits is basically the whole content of the develop branch that accumulated over ≈3 months since the previous breaking changes round, and your result basically says “something that was added in the February 27 breaking changes merge broke things”, which is actually a lot of code.

You probably should use the commit 3cc7d22732e201d5fd83931e5cfee21f83fd2352 as the base instead — it is the point where that incarnation of the develop branch was forked from master. The history from 3cc7d22732e201d5fd83931e5cfee21f83fd2352 to 1581ea48dcd48d0d3f42cc09b388c468aedec45d contains both commits to develop and merges from master.

Also be sure to run make git-submodule before every compile — the develop branch contains some changes to submodules, and if you miss updating them, you won't be testing the correct code. (Although this particular commit range seems to have only chibios-related changes, which won't affect your board.)

kosmiciatakuja commented 2 years ago

Okay, understood, I think (about the commits). For me it would be simplest to just stick to one branch (master) and go commit by commit there. In that way, the commit for 12.0 breaks everything. But I have a slight breakthrough in this case. Since my keyboard is a split one (Keebio Nyquist) it has two USB ports, on the left and right hand. I'd been using the left port since the beginning as it is closer. Now I switched to the right port and poof - all my problems are gone. Keyboard works for over a day now on a later version and no hangs, absolutely 100% good behavior. I tried this same version with the left hand port and it hangs as usual. I just need to burn qmk again after switching ports. Can anybody confirm that a) the problems they were experiencing were on split keyboards, and b) if they are gone after switching USB ports to the other keyboard half?

sigprof commented 2 years ago

Hmm, looks like we are discussing your problem with the Keebio Nyquist in the wrong place then (and I did not notice that you are writing about a completely different keyboard). This issue is about problems with jj40, which is a V-USB board; you are experiencing problems with a board based on ATmega32U4 with a native USB interface, therefore your problem is probably caused by something completely different.

Please open a separate issue about your problem.

And “the commit for 12.0 breaks everything” is unfortunately not very useful — running a bisect over the develop changes in that range could pinpoint a single problematic commit, however, which would be really appreciated. Although your findings that the problem is linked with using a specific half as master may also mean that you have some hardware issues with one of the halves (assuming that you always reflashed both halves with the same firmware when testing).

sweetsuicide commented 2 years ago

Hi, I tried flashing my jj40 using qmk toolbox 0.1.1 and modified and compiled my firmware in the web page (I have no idea what version it is). I have the very same issue as the one described in the first ticket. I am available to help analyse the issue

ollien commented 2 years ago

I'm still experiencing this with a freshly built Sesame keyboard. I managed to make it work with firmware 0.10.54.

Anyway - I flashed at what's currently at master (c03e18f728a8c56bbe49d2c319ae96decc3e48bb) and experienced this problem. I also collected a capture of me pressing "p" and "o", with "p" eventually getting stuck. If I'm honest, I'm not experienced enough to know what I'm looking at here, but hopefully I was able to isolate it enough.

pcap.zip

I'm going to try my hand at bisecting this and will report back if I find anything.

ollien commented 2 years ago

I found it, I think! 75a18e69f9d3b6dfa470d0a7dbd78408d6a1c496 breaks in exactly the way described, but 69d8bbf1f4620bbde6abc552efa748324aec9b91 (its parent) works totally fine. I unfortunately do not have the experience necessary to see a problem (nor to know if any of the intermediate commits in the original pull request, #10491, are safe to flash without harming my keyboard).

Please let me know if there's anything else I can provide to help track down this problem.

jhbruhn commented 2 years ago

Nice work! My guess is: The timing of the vusb implementation is broken through these new atomics disabling interrupts, which leads to the USB endpoint failing.

I do not know how critical these are, but can they be disabled by doing a #define IGNORE_ATOMIC_BLOCK?
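
To make the guess concrete, here is a minimal host-side sketch of what an atomic block does compared with a do-nothing variant. It is not QMK's code: the interrupt flag is simulated with a plain bool, and every name in it (sim_irq_enabled, SKETCH_ATOMIC_BLOCK, SKETCH_NOOP_BLOCK and the helper functions) is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the AVR global interrupt enable flag. */
static bool sim_irq_enabled = true;

/* Mask "interrupts" and report whether they were enabled before. */
static uint8_t sim_irq_save_and_disable(void) {
    uint8_t was_enabled = sim_irq_enabled ? 1 : 0;
    sim_irq_enabled = false;
    return was_enabled;
}

/* Put the flag back the way it was before the block. */
static void sim_irq_restore(uint8_t was_enabled) {
    assert(!sim_irq_enabled); /* only meaningful while masked */
    sim_irq_enabled = (was_enabled != 0);
}

/* Atomic variant: the body runs once with interrupts masked.
 * Leaving the body with break would skip the restore in this sketch. */
#define SKETCH_ATOMIC_BLOCK                                              \
    for (uint8_t saved_ = sim_irq_save_and_disable(), todo_ = 1; todo_; \
         todo_ = 0, sim_irq_restore(saved_))

/* "Ignored" variant: the body still runs once, but nothing is masked. */
#define SKETCH_NOOP_BLOCK for (uint8_t todo_ = 1; todo_; todo_ = 0)

/* Each function returns true if interrupts stayed enabled inside the block. */
bool irq_enabled_inside_atomic(void) {
    bool seen = true;
    SKETCH_ATOMIC_BLOCK { seen = sim_irq_enabled; }
    return seen;
}

bool irq_enabled_inside_noop(void) {
    bool seen = false;
    SKETCH_NOOP_BLOCK { seen = sim_irq_enabled; }
    return seen;
}
```

While the body of the first block runs, a pending USB interrupt would have to wait; in the second it would not, which is the effect I would expect from defining IGNORE_ATOMIC_BLOCK.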

ollien commented 2 years ago

@jhbruhn Yep - I applied this patch to 75a18e69f9d3b6dfa470d0a7dbd78408d6a1c496 and I'm typing this comment on it now...

diff --git a/quantum/quantum.h b/quantum/quantum.h
index 42e8c00091..c1320d2645 100644
--- a/quantum/quantum.h
+++ b/quantum/quantum.h
@@ -220,6 +220,7 @@ typedef ioline_t pin_t;
 #    define togglePin(pin) palToggleLine(pin)
 #endif

+#define IGNORE_ATOMIC_BLOCK
 // Atomic macro to help make GPIO and other controls atomic.
 #ifdef IGNORE_ATOMIC_BLOCK
 /* do nothing atomic macro */

jhbruhn commented 2 years ago

I currently can't test this myself, but as this only seems to happen for ATMEGA32 based keyboards (?), can we do a patch which disables the atomics implementation for that processor? Or maybe even only for the specific keyboards, in the associated config.h?

ollien commented 2 years ago

@jhbruhn I guess you could, but that would re-introduce the RGB bugs that the original PR aimed to fix. That said, maybe by luck none of the boards affected here were affected by that? I'd have to dig a bit to answer that.

fauxpark commented 2 years ago

I am not seeing this issue with either my JJ4x4 (32a) or my Plaid-Pad (328p) - but those are both 4x4 macropads, so perhaps it has something to do with matrix size (ie. larger matrix takes more time to process atomically).
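
For a rough sense of scale (arithmetic only, not a measurement): V-USB bit-bangs low-speed USB at 1.5 Mbit/s, so a 12 MHz part has 8 CPU cycles per bit. The helper below converts a stretch of masked interrupts into whole bit times. masked_bit_times and its inputs are hypothetical names, and the real cycle count of an atomic matrix scan would have to be measured on the board.

```c
#include <assert.h>
#include <stdint.h>

/* Low-speed USB signalling rate in bits per second (1.5 Mbit/s). */
#define USB_LOW_SPEED_BPS 1500000ULL

/* Whole low-speed bit times that pass while interrupts are masked for
 * masked_cycles CPU cycles on a part clocked at cpu_hz.
 * masked_cycles is a hypothetical input, not a measured QMK figure. */
uint64_t masked_bit_times(uint64_t cpu_hz, uint64_t masked_cycles) {
    assert(cpu_hz != 0); /* guard the division below */
    return (masked_cycles * USB_LOW_SPEED_BPS) / cpu_hz;
}
```

With these numbers, 80 masked cycles span 10 bit times at 12 MHz and 7 whole bit times at 16 MHz, so a longer atomic section eats into the USB timing budget faster on the slower clock.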

ollien commented 2 years ago

It seems that a couple of keyboards already disable atomics as a workaround for matrix delay (see: https://github.com/qmk/qmk_firmware/blob/master/keyboards/massdrop/alt/config.h#L43-L44, which may very well be due to this issue, though that board doesn't use an ATMEGA32). I've opened a PR for the Sesame, which is the only affected keyboard I can test. It doesn't have LEDs, so it isn't affected by the issue that https://github.com/qmk/qmk_firmware/pull/10491 was addressing.

zvecr commented 2 years ago

The Drop boards do this for a different reason. Mainly that waitInputPinDelay is not implemented, and there is no benefit to having the matrix interactions be atomic, where its inclusion throws off the expected timings.

Setting IGNORE_ATOMIC_BLOCK has nothing to do with running the keyboard "without matrix delay".