TOTP hotfix: reduce memory usage

matheusmoreira commented 6 months ago

The TOTP face is working in the simulator but fails on the real hardware when loaded with lots of codes, just like the LFS version. This is likely caused by the recent refactoring of the TOTP face which introduced a declarative credential interface for ease of use. That's accomplished by decoding the secrets at runtime which increases the RAM requirements. Users are likely hitting memory limits.

In order to mitigate this, the algorithm is changed from decoding all of the secrets only once during initialization to on the fly decoding of the secret for the current TOTP credential only. This converts this face's dynamic memory usage from O(N) to O(1) at the cost of memory management when switching faces and credentials which could impact power consumption.

Due to variable key sizes, the memory cannot be statically allocated. Perhaps there's a maximum key size that can serve as worst case?

Also took this opportunity to restructure the code a bit. Also added code to check for memory allocation failure.

maxz commented 6 months ago

Due to variable key sizes, the memory cannot be statically allocated. Perhaps there's a maximum key size that can serve as worst case?

There is not. I looked at that before. There is no limit in the standard. The biggest in-the-wild key I could find was 128 bytes.

I'll repeat my comment from #384 here: Unless you find another solution to the possible repeated runtime allocation (on a quick glance it at least does not seem to call malloc for the same entry twice but to use memset) and decoding I guess I would prefer for the solution in the main branch to be the one from my PR #356 which has to deal with the explicit key array but has very little runtime overhead. Frankly I'd prefer a more complicated setup over increased battery usage. I will at least maintain my own patch for my watch if this behaviour remains.

@joeycastillo also spoke out against runtime heap allocation within faces in the past (But I can't find the PR right now, it was some PR by @theAlexes so they might remember which one it was.) In that particular case I think he accepted the allocation in the face's initialisier.

theAlexes commented 6 months ago

yeah, given how resource-constrained the RAM on this chip is, i think we should revert to the implementation with a static key array.

madhogs commented 6 months ago

I've tested this now on the sensor watch lite and it fixes the issue completely. I have checked the values too with my real codes and they are correct. To check the extreme I also tested a large number (100+) of 128 character secrets and that also works without any issues on the real watch.

To give my 2 cents on the above conversation: I think this new method of configuring the secrets is a great improvement on having to manually enter the byte array, it is also simpler and less error prone for new users to configure their totp codes for the watch. As the physical watch can handle this code ram wise even for a large number of very large secrets I don't personally see why that would be a reason to go back to the old implementation. In terms of power consumption, i don't know how much more this would use but I also would be surprised if it had any effect in the real world as users are unlikely to spend any significant amount of time cycling through the codes day to day (after setting up the watch of course).

All in all, unless there is something I fail to grasp I personally would prefer to keep this new implementation of managing credentials. If it is decided to go back to entering the raw bytes, I hope there is a way to keep an option for the users of adding the secrets directly, maybe through making similar changes to the lfs face? or having this as an optional configuration or a new totp face.

matheusmoreira commented 6 months ago

There is not. I looked at that before. There is no limit in the standard. The biggest in-the-wild key I could find was 128 bytes.

That's unfortunate. We could set up a maximum size that's overridable by preprocessor constant. That way we can have 128 bytes as a worst case and the user can override it if it turns out to not be enough, or if they know they won't need all those bytes.

This would completely eliminate the dynamic memory allocation at the cost of wasting something like ~90 bytes in the common case and maybe forcing the user to decide the maximum size of the variable, something that's likely to generate support requests in the discord.

Honestly this is something the compiler should be figuring out on its own. The C prepocessor is too limited...

on a quick glance it at least does not seem to call malloc for the same entry twice but to use memset

It uses malloc and memset once during watch initialization to allocate and zero the TOTP state structure. This memory is not freed and has essentially unlimited lifetime.

Then it uses malloc and free in the following patterns:

Face enters the foreground
- malloc(current_decoded_key)
Face leaves the foreground
- free(current_decoded_key)
User switches TOTP code
- free(current_decoded_key)
- current++
- malloc(current_decoded_key)

decoding I guess I would prefer for the solution in the main branch to be the one from my PR #356 which has to deal with the explicit key array but has very little runtime overhead.

Me too. The very next thing I'll do is work on getting the script you made merged and integrated into the build system. It's an excellent solution which has the added advantage of not requiring the user to edit the source code.

Frankly I'd prefer a more complicated setup over increased battery usage.

Me too... But we are programmers. Users might not be comfortable with munging bytes and editing them into source code.

The watch face's documentation also suggests using a web service to decode the bytes which could leak the secret. I really wanted to avoid the need for that.

I will at least maintain my own patch for my watch if this behaviour remains.

We should be able to resolve it soon. The static allocation approach is viable, and your Python script is the best solution.

@joeycastillo also spoke out against runtime heap allocation within faces In that particular case I think he accepted the allocation in the face's initialisier.

Yes, in the discord:

fwiw I think it’s a best practice to avoid dynamic memory allocation outside of the watch face setup function. As it stands now (at least in the case where the watch is on-wrist and not plugged into USB) memory is only allocated once, at boot, and that doesn’t change until the battery dies. this feels good for the stability of the thing when it has to run for, y’know, two years

I agree with him and I will work to eliminate these dynamic allocations.

matheusmoreira commented 6 months ago

@theAlexes

yeah, given how resource-constrained the RAM on this chip is, i think we should revert to the implementation with a static key array.

I'm confident I can remove the need for those allocations by allocating a static 128 byte array and making it user overridable. I'll push a new commit soon.

matheusmoreira commented 6 months ago

Dynamic memory allocation has been removed. Now there's a single 128 byte buffer that's allocated on face setup. It's also compile time overridable by the user. I refrained from using a static buffer because movement supports multiple instances of the same face and the static approach fails to be reeentrant.

The key lengths are checked on initialization and if any key is found to be too large to fit the buffer it is turned off and ERROR is displayed on the watch instead of the code. The base32 encoded secrets are decoded dynamically to the buffer at the following times:

Face enters the foreground
User switches TOTP code

Therefore, there is still some extra runtime overhead that can still be eliminated via the use of generator scripts. In the future, I will make it so there is no overhead at all.

@madhogs I would really appreciate it if you could test this new version on your watch too! I'll make sure you are credited in the commit message just like before!

madhogs commented 6 months ago

@matheusmoreira I've checked the latest code on my watch, no issues, all is working. Checked values of real codes again and the 100+ large secrets, both are fine. :+1:

The watch face's documentation also suggests using a web service to decode the bytes which could leak the secret. I really wanted to avoid the need for that.

I was really surprised by this too and think its a very bad thing to be suggesting users to do!

matheusmoreira commented 6 months ago

Great. Everything seems in order for the hotfix to be merged then.

matheusmoreira commented 6 months ago

@madhogs Are you available for some additional hardware testing?

I just pushed some additional bug fixes. There were some bugs in the error handling. Now the face should fail gracefully with an error message even if all keys are > 128 bytes. Can you please test that failure case?

madhogs commented 6 months ago

Just did the same checks as before on the physical watch and all working still :+1: Also added several very long keys (>> 128 characters) and it displays the word error as i assume is intended.

matheusmoreira commented 6 months ago

Thank you!!

@theAlexes The pull request is ready for your review!

madhogs commented 6 months ago

Just to update i've been using this on my physical watch and using the codes for almost a week now and no issues :smile:

theAlexes commented 6 months ago

Alright, going to merge this in :)

joeycastillo / Sensor-Watch

TOTP hotfix: reduce memory usage #385