qmk / qmk_firmware

Open-source keyboard firmware for Atmel AVR and Arm USB families
https://qmk.fm
GNU General Public License v2.0
18.24k stars 39.3k forks source link

[Bug] Debug matrix print crashes with columns > 16 #7946

Closed zvecr closed 4 years ago

zvecr commented 4 years ago

Describe the Bug

With debug_matrix=true, on a f072 prototype with 17 columns, the first keypress outputs to hid_listen:

matrix scan frequency: 715

r/c 0123456789ABCDEF0123456789ABCDEF
00: 00000000000000000000000000000000
01: 00000000000000000000000000000000
02: 00000000000000000000000000000000
03: 00100000000000000000000000000000
30303031: 00000000000000000000000000000000
30303032: 00000000000000000000000000000000
30303033: 00000000000000000000000000000000
30303034: 00000000000000000000000000000000
30303035: 00000000000000000000000000000000
30303036: 00000000000000000000000000000000
30303037: 00000000000000000000000000000000
30303038: 00000000000000000000000000000000
30303039: 00000000000000000000000000000000
3030303A: 00000000000000000000000000000000
3030303B: 00000000000000000000000000000000
3030303C: 00000000000000000000000000000000
3030303D: 00000000000000000000000000000000
3030303E: 00000000000000000000000000000000
3030303F: 00000000000000000000000000000000
30303040: 00000000000000000000000000000000
30303041: 00000000000000000000000000000000
30303042: 00000000000000000000000000000000
30303043: 00000000000000000000000000000000
30303044: 00000000000000000000000000000000
30303045: 00000000000000000000000000000000
30303046: 00000000000000000000000000000000
30303047: 00000000000000000000000000000000
30303048:

Then no other key output happens, though lsusb still shows the device.

Removing 1 column, and the issue disappears.

System Information

Additional Context

There seems to be only 2 other boards in the repo that fit the criteria:

keyboards/candybar/config.h:#define MATRIX_COLS 17
keyboards/converter/siemens_tastatur/config.h:#define MATRIX_COLS 19

Only the candybar is f072, so I presume this is not a common enough setup for it to be noticed....

fauxpark commented 4 years ago

LTO?

EDIT: Tried the following with LTO both on and off, on a Proton C, prints as expected in the Toolbox.

bool process_record_user(uint16_t keycode, keyrecord_t *record) {
    if (record->event.pressed) {
        switch (keycode) {
            case KC_CUST:
                print_hex8(17);
                print(": ");
                print_bin_reverse32((uint32_t) 0x00000000);
                print("\n");
                return false;
        }
    }

    return true;
}
zvecr commented 4 years ago

LTO was off in my case.

Unless someone gets there first, I need to sit down with my box of hardware at the weekend to work out whats going on. f072 has some other gremlins around backlight, so I wouldnt be surprised if this issue is only reproducible on that chip.

wiml commented 4 years ago

Just ran into this myself. The problem is a fixed-size buffer in the printf() implementation.

What's happening: The print_bin_reverse16 and print_bin_reverse32 macros call xprintf("%016b", ...) (or %032b); this resolves to the printf implementation in tmk_core (not the one in ChibiOS, which doesn't even support the %b specifier); and since the output is longer than the buffer, it smashes the call stack.

The buffer is 12 bytes long, which is exactly enough for octal output of a 32-bit quantity (traditional printf doesn't have any bases smaller than octal), but when binary-formatted was added to this printf implementation, this became a bug.

In my case the visible result was a local variable in a calling function getting mysteriously mutated, but an outright crash is also likely. You can see that your loop counter is getting overwritten also — the 303030 prefix is ASCII 0 getting written into the high bytes.

zvecr commented 4 years ago

@wiml nice detective work! Looks like i missed that when i was last in there, as my issue was diagnosed with a ortho_4x12 board...