jpconstantineau / BlueMicro_BLE

Keyboard Firmware for the Nordic nRF52 Series of Bluetooth SoC based on the Adafruit NRF52 Feather
http://bluemicro.jpconstantineau.com/
BSD 3-Clause "New" or "Revised" License

blue_wizard compatibility broken #234

Open wizarddata opened 3 years ago

wizarddata commented 3 years ago

Describe the bug Compiling and flashing the blue_wizard keyboard firmware causes the keyboard to continually connect to & disconnect from PC. Board continues to broadcast between disconnect/connect. No errors are produced during compiling.

To Reproduce Steps to reproduce the behavior: Working with PR228. Also works with PR185, PR214.

Not working with PR229. Also not working with PR186.

Expected behavior Expected behavior is for the keyboard to successfully broadcast, connect to the PC, and send keycodes.

Additional context

Keyboard uses nrf52832 MCU.

During compiling, arduino IDE reports 35% program space and 54% dynamic memory used.

Matrix on this keyboard is 15x7. This large size caused memory issues unless MAX_NO_LAYERS is reduced to 5.

It was also discovered that TG(Layer) only seems to function on layers 0 and 1. This may or may not be related.

wizarddata commented 3 years ago

I believe I've narrowed it down to between PR228 and PR229. I'm going to verify this in the morning.

wizarddata commented 3 years ago

I've been able to much better describe the behavior this morning with fresh eyes.

jpconstantineau commented 3 years ago

If the issue is related to connecting/disconnecting continuously, can you "forget" the keyboard on your computer and try connecting again? This will make the two renegotiate the pairing keys.

wizarddata commented 3 years ago

Yes, I've been doing a forget and re-pairing as part of my process for testing each pull request in case that was the issue. Just did that again to confirm.

jpconstantineau commented 3 years ago

One last thing: does it still happen if you reformat the nrf52 file system? Run this sketch (shown in the attached image) and re-flash the firmware. This will wipe out any keys and settings saved to file on the nrf52 as well.

wizarddata commented 3 years ago

After running the pictured sketch and reflashing firmware built with PR229, the connection resets the same way.

jpconstantineau commented 3 years ago

Ok. What about with 228? I doubt that the changes brought by 229 will impact much, perhaps except for memory usage.

wizarddata commented 3 years ago

After the format, with pr228 the connection is persistent and the keyboard functions correctly.

It is possible that the device is resetting when the connection is lost and I'm misinterpreting the broadcasting as being continuous because the device is booting quickly. I'm using the nRFConnect app on my phone and the display only refreshes about once a second.

jpconstantineau commented 3 years ago

That's good. Let's see... Let's try this commit: ee8c80aa71b0291ae3d254ae7d3aec73dbbfd41c It skips ahead a few commits I did to add a workflow on GitHub Actions and troubleshoot it. I expect that one to work the same... Let me know...

wizarddata commented 3 years ago

That commit does work correctly. Stepping to the next in line, https://github.com/jpconstantineau/BlueMicro_BLE/commit/f012b88b397e647c0ab3d404fd0e40968a253bfc brings back the disconnection behavior.

wizarddata commented 3 years ago

I've also flashed my board with the default 'Ergotravel' configuration. That produced the same results.

jpconstantineau commented 3 years ago

Ergotravel too? I wonder about the luddite or the 4x4Macropad. I have been looking too much at the 840 these days... I have a 832 macropad I have been using and that one is ok. I'll give a try to a few 832 boards I have around.

jpconstantineau commented 3 years ago

I assume you tried the one in between: 6f21844c84d66d10a07ff544f3c7d4dd85ca251c

wizarddata commented 3 years ago

My apologies, I'm jumping between the git software I use and the github web interface and I grabbed the wrong pr. The one you linked is the PR that brings about the problems.

jpconstantineau commented 3 years ago

6f has issues? And the one before didn't?

wizarddata commented 3 years ago

I've been experiencing some growing pains with the software I'm using, I was looking at the incorrect field. To summarize:

6f21844c84d66d10a07ff544f3c7d4dd85ca251c is working as intended.

f012b88b397e647c0ab3d404fd0e40968a253bfc brings about the connectivity / rebooting issues.

jpconstantineau commented 3 years ago

That's what I was afraid of. That's the commit where I did all that clang reformatting. I was hoping a big feature had introduced the issue, but this commit was all about formatting. Needle in a haystack, unfortunately.

wizarddata commented 3 years ago

using the 4x4 backpack firmware does result in a stable connection on f012b88b397e647c0ab3d404fd0e40968a253bfc

wizarddata commented 3 years ago

For now I'll just create a fork at an earlier commit and add a note to the project where to find it. Maybe we'll have an epiphany someday.

jpconstantineau commented 3 years ago

I just checked with my contra (4x12) on an 832 on the latest release (most up to date) and it compiled, flashed and connected fine. Issue is likely related to memory usage somewhere... Ergotravel memory footprint is much higher since it has to handle 2 BLE connections. Do you have a test keymap with a single layer? Might be worth giving that a try.

wizarddata commented 3 years ago

I'll create one and give it a try

jpconstantineau commented 3 years ago

Ok. Something as simple as the 4x4tutorial base keymap.

wizarddata commented 3 years ago

Bringing it down to one layer and reducing MAX_NO_LAYERS had the same behavior. Must be something to do with the matrix size? It's odd because the IDE reports plenty of space in memory.

jpconstantineau commented 3 years ago

Yes, it does report plenty of space, but the matrix is an array of arrays of keys, and each key contains two arrays of arrays, one for keycodes and the other for durations, each covering the different activations and layers. None of these arrays are vectors (dynamic size); they are all statically sized. That's a whole lot of memory for your 7x15. I would think that these are allocated at compile time.
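To make the sizing concern above concrete, here is a rough back-of-the-envelope sketch. The `KeySketch` layout, its element types, and the activation count `kActivations` are assumptions chosen for illustration, not the firmware's actual key class. Only the 7x15 matrix and the layer counts come from this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one key: two fixed-size 2-D arrays,
// one for keycodes and one for durations, indexed by [layer][activation].
// Element types and dimensions are assumptions, not taken from the firmware.
template <std::size_t Layers, std::size_t Activations>
struct KeySketch {
    std::uint16_t keycodes[Layers][Activations];
    std::uint8_t durations[Layers][Activations];
};

// Bytes needed for a rows x cols matrix of such statically sized keys.
template <std::size_t Layers, std::size_t Activations>
constexpr std::size_t matrixBytes(std::size_t rows, std::size_t cols) {
    return rows * cols * sizeof(KeySketch<Layers, Activations>);
}

// Assumed number of activation methods per key (illustrative only).
constexpr std::size_t kActivations = 5;
```

Under these assumptions, `matrixBytes<5, kActivations>(7, 15)` comes to several kilobytes, and every extra layer adds the same fixed cost for all 105 keys whether or not the keymap uses it. That is one way a large matrix could exhaust RAM even though the IDE summary looks comfortable.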

I wonder if it crashes in the setup or in the main loop. Perhaps we could add a simple LED turn on when we start the setup and turn it off when finishing the setup...

Since it's a custom board, do you have serial on board? I would like to check on memory space. There are a few commands I could setup to see the data...

jpconstantineau commented 3 years ago

If you have serial, you can get into the debug CLI and send the i command. On my contra, I get this:


Name   Addr 0x2000xxxx    Usage
Stack  0xF800 - 0xFFFF    808 / 2048 (39%)
Heap   0x9A58 - 0xF7FF    22212 / 23976 (92%)
Bss    0x3600 - 0x9A57    25688
SD     0x0000 - 0x35FF    13824

That's not a lot of heap left...
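The figures in that report can be cross-checked with a little arithmetic. The sketch below is only an illustration: the `Region` struct and helper names are made up for this example, and the truncating integer percentage is an assumption about how the report rounds. The addresses and byte counts are the ones from the contra output above.

```cpp
#include <cstdint>

// One row of the memory report: an inclusive offset range within 0x2000xxxx
// plus the number of bytes reported as used.
struct Region {
    std::uint32_t first;  // first offset in the region
    std::uint32_t last;   // last offset in the region (inclusive)
    std::uint32_t used;   // bytes in use
};

// Total size of the region in bytes.
constexpr std::uint32_t regionSize(const Region& r) { return r.last - r.first + 1; }

// Bytes still available in the region.
constexpr std::uint32_t freeBytes(const Region& r) { return regionSize(r) - r.used; }

// Integer percentage, truncated (assumed to match the report's rounding).
constexpr std::uint32_t percentUsed(const Region& r) {
    return r.used * 100u / regionSize(r);
}

// Values copied from the contra report above.
constexpr Region kHeap{0x9A58, 0xF7FF, 22212};
constexpr Region kStack{0xF800, 0xFFFF, 808};
```

With these numbers, `freeBytes(kHeap)` is 1764 bytes out of 23976, which is the small margin being pointed out.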

wizarddata commented 3 years ago

The only serial I broke out was for the CP2104, I don't immediately know exactly what I can do with that but I'll dig into it and see what I've got.

wizarddata commented 3 years ago

While I've got the serial monitor up, I get the nice bluemicro ascii art, but the board periodically resets and I'm not able to issue any commands.

wizarddata commented 3 years ago

This is on the working firmware.

wizarddata commented 3 years ago

Here we go, I timed it between resets.

Device ID      : 54797D3ECDAB831B

MCU Variant    : nRF52832 0x41414530
Memory         : Flash = 512 KB, RAM = 64 KB
Keyboard Name  : Blue Wizard
Keyboard Model : Blue Wizard
Keyboard Mfg   : awells

Device Power   : 0.000000
Filter RSSI    : -90

Type  RSSI  name
cent  0
prph  -61   DESKTOP-9ILB3J0
cccd  0

BSP Library    : 0.21.0
Bootloader     : s132 6.1.1
Serial No      : CDAB831B54797D3E

Name   Addr 0x2000xxxx    Usage
Stack  0xF800 - 0xFFFF    736 / 2048 (35%)
Heap   0xA638 - 0xF7FF    18016 / 20936 (86%)
Bss    0x3600 - 0xA637    28728
SD     0x0000 - 0x35FF    13824

Task     State  Prio  StackLeft  Num
loop     X      1     658        1
IDLE     R      0     26         3
Tmr Svc  B      2     144        4
BLE      B      3     1038       5
Callbac  B      2     687        2
SOC      B      3     163        6

jpconstantineau commented 3 years ago

It resets when connected and before you have a chance to send a command? This means it's in the loop; setup has passed.

jpconstantineau commented 3 years ago

Are you still on the adafruit bsp? That could perhaps make a difference... can you switch over to the Community BSP?

jpconstantineau commented 3 years ago

http://bluemicro.jpconstantineau.com/docs/tools Second url you need to add to the preferences and you can download the community BSP.

wizarddata commented 3 years ago

That did change behavior, but unfortunately it only resets faster. The community BSP seems to bring the connectivity problem back to 6f21844 as well.

wizarddata commented 3 years ago

There may be a problem with how I'm set up, I'm going to revisit that again in the morning.

jpconstantineau commented 3 years ago

I was able to replicate your issue with the latest on my contra (with the 7x15 config you have). It might not be a properly set up keyboard, but at least I see the reboot issue you have. I'll see if I can turn off things in a separate branch and make it work.

jpconstantineau commented 3 years ago

I did turn off the scanning timer and that resolved the reboot issue. I'll see if the timer task has enough memory.

jpconstantineau commented 3 years ago

I may have found something... Really not what I was thinking... It seems that it doesn't even make it to the end of setup before it crashes. As such, I'll re-arrange a few things in there to see if that helps. I'll send you a branch you can test with...

jpconstantineau commented 3 years ago

Didn't really find anything. I thought I had identified the problem area, but no success. I'll have to start removing stuff until it starts working. The frequency at which it reboots isn't really indicative of the severity of the problem... I moved all tasks into the same loop and the only thing that did was change the frequency of the crashes. Sometimes moving a couple of things around only moves the last line it runs before it crashes. I still suspect it's memory related, but hunting it down will be the challenge...

wizarddata commented 3 years ago

I'll continue to spend time working with it, I'll update as soon as I find anything useful.

jpconstantineau commented 3 years ago

I have been slowly going through the commits and looking at memory usage. I was at 99% heap usage with max layers at 5, and that made the board crash while fetching memory usage data. I brought it down to 4, and that helped bring heap usage down to 90%. It's definitely memory related. I'll have to do some research on C++ heap management...

jpconstantineau commented 3 years ago

Have a try at the nrf52832-revert branch. I reverted the problematic commit and changed the number of layers from 5 to 4. This appears to work OK here, but it really needs to be tested on your board.

wizarddata commented 3 years ago

Thanks for that, I'll give it a go. I've somehow got both of my test boards to a state where they crash-loop regardless of the firmware version I use, so I need to spend a minute to sort out how that happened.

wizarddata commented 3 years ago

Initial tests show that repo to be working correctly with MAX_NO_LAYERS set to 5. I'm going to revisit in the morning to ensure I haven't got something wrong on my end to give a false positive.

wizarddata commented 3 years ago

After a fullerase and flash, both my test boards are working correctly on nrf52832-revert. The board will still enter a crash loop if MAX_NO_LAYERS isn't set to one larger than the number of layers implemented, but that doesn't seem like a real issue. Did the difference end up being just the clang format or did you have to make other changes?
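The MAX_NO_LAYERS observation above could be turned into a compile-time guard. This is only a sketch: `hasSpareLayer`, `kMaxNoLayers`, and `kImplementedLayers` are hypothetical names that do not exist in the firmware, and treating "at least one larger" as acceptable is an assumption, since only "one larger" was tested.

```cpp
#include <cstddef>

// True when the configured maximum leaves at least one spare slot above the
// layers actually implemented, the condition reported above as avoiding the
// crash loop. Accepting more than one spare slot is an assumption.
constexpr bool hasSpareLayer(std::size_t maxLayers, std::size_t implementedLayers) {
    return maxLayers >= implementedLayers + 1;
}

// Illustrative values: four implemented layers with the maximum set to 5.
constexpr std::size_t kMaxNoLayers = 5;        // stands in for MAX_NO_LAYERS
constexpr std::size_t kImplementedLayers = 4;  // hypothetical constant

// Fails the build instead of producing firmware that crash-loops at runtime.
static_assert(hasSpareLayer(kMaxNoLayers, kImplementedLayers),
              "MAX_NO_LAYERS should be at least one more than the layers in use");
```

A check like this would not explain why the spare slot is needed, but it would surface the misconfiguration at build time rather than on the board.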

jpconstantineau commented 3 years ago

As far as differences, I didn't try to scan and sort out clang-format vs other changes. With so many file changes in that commit, I'll need to see if something else made it; however, I don't think so. It does worry me that a format change caused the issue; not something one would expect.

wizarddata commented 3 years ago

Just in case this helps narrow down the problem, the board will still occasionally crash when it's been asleep for more than 10 or 15 minutes. A reset solves the problem. This is with four layers and MAX_NO_LAYERS set to 5. I'm going to trim that down to 3 and see if that changes.

wizarddata commented 3 years ago

After further testing, I am still having an issue where, occasionally, after working with the nrf52832-revert branch, the board gets stuck in a state where older builds crash in the same way as the newer ones until I perform a fullerase and reflash.

jpconstantineau commented 3 years ago

It gets stuck in a bootloop? Is that with the number of layers at 4, or back to 5? I'll run a full clang-format on the previous commit and compare it with the problematic one to see if something else was added that causes the main issue.
I assume you don't use combos. I'll add some logic to not include these (that will free some space) and I'll probably refactor key.cpp to use a struct instead of 2 pairs. That should help free some more memory.

wizarddata commented 3 years ago

It has occurred with MAX_NO_LAYERS set to both 4 and 5. I wish I could describe the symptoms more concisely but the behavior has been inconsistent enough that I'm having trouble nailing it down.

I don't use combos myself, but I can't necessarily speak for the users of the other half dozen of these boards that are out there so far. At the end of the day though, I think most reasonable people would agree that stable operation takes priority over other features like combos.

jpconstantineau commented 3 years ago

I agree that stability is more critical than features that not everyone uses. On nrf52840s, the amount of RAM is so much larger that pretty much everything will fit. I'll probably put my 832 back on my luddite and see how much RAM I have left and whether it's stable or not.

With the max layers set at 5, I saw 99% heap usage. That's way too high for comfort...