nthdimtech / signet-base

Signet firmware and device interface library
https://www.crowdsupply.com/nth-dimension/signet
GNU General Public License v3.0

Flash memory wear protection #1

Closed dumblob closed 4 years ago

dumblob commented 6 years ago

I couldn't find any wear protection of the flash memory.

  1. How do you make sure the returned data (passwords) are correct?

  2. What do you do with passwords in case any checksum doesn't match?

nthdimtech commented 6 years ago

There is no CRC check on entries yet, but this will be added soon. I'll add a separate issue for it. If a CRC check fails, the client would be notified and would generate an alert. Once flash sectors start failing, it's probably best to treat that as the end of life of the device and recommend replacement, as other sectors are likely to fail soon too.
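As a rough illustration of what a per-entry CRC check could look like, here is a minimal sketch. The `struct entry` layout, its 60-byte payload size, and the function names are hypothetical and not taken from the Signet firmware; the CRC itself is the common reflected CRC-32 with polynomial 0xEDB88320.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
 * final XOR of 0xFFFFFFFF). Slow but small, which suits firmware. */
uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return ~crc;
}

/* Hypothetical entry layout (an assumption for illustration only):
 * an opaque payload followed by the CRC of that payload. */
struct entry {
    uint8_t  payload[60];
    uint32_t crc;
};

/* Compute and store the CRC before the entry is written to flash. */
void entry_seal(struct entry *e)
{
    e->crc = crc32_compute(e->payload, sizeof e->payload);
}

/* Return 1 if the stored CRC matches the payload, 0 otherwise. */
int entry_is_intact(const struct entry *e)
{
    return crc32_compute(e->payload, sizeof e->payload) == e->crc;
}
```

With something like this, the firmware would call `entry_seal` before programming an entry and `entry_is_intact` after reading it, reporting a failure to the client instead of returning the data silently.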

I don't plan on developing an explicit wear leveling scheme, as I think the software complexity would create more risk than it mitigates. I will limit wear indirectly by maintaining a database design where each sector is a self-contained unit. So if sector N has 10 passwords in it, then it would only see accesses if one of those 10 passwords is changed or read. There will be no "directory" type sectors that would tend to see a lot more writes or reads than others, so relatively even wear will be achieved by the law of averages.

dumblob commented 6 years ago

If a CRC check fails, the client would be notified and would generate an alert. ... ...as I think the software complexity would create more risk than it mitigates.

Ok. I would still advocate for at least some tiny correction scheme (see e.g. https://en.wikipedia.org/wiki/Error_detection_and_correction ) and not just detection. I definitely want to be alerted that the flash memory is showing signs of EOL, but I also don't want to lose my passwords at that very same moment (using a HW password manager usually implies very important passwords). A Hamming code or similar is very easy to implement securely.

nthdimtech commented 6 years ago

Well, I wouldn't deactivate the device or anything; it would just make some kind of suggestion. Looking at that Wikipedia article, the most relevant section I see is about ECC codes. My concern would be that the amount of redundancy needed for highly reliable recovery would cut into the device's utility for storing many kinds of data. I'll add a bug to add a Hamming code at the block level, as I see no harm in it.
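For readers unfamiliar with the Hamming code being discussed, here is a minimal sketch of the classic Hamming(7,4) code, which protects 4 data bits with 3 parity bits and corrects any single flipped bit. It is an illustration only: the function names are hypothetical, and a block-level code in the firmware would use a much longer codeword. Note that a plain Hamming code corrects only one bit per codeword; two flipped bits are miscorrected unless an extra overall parity bit is added to detect them.

```c
#include <stdint.h>

/* Bit at 1-based codeword position pos (position 1 is the least
 * significant bit of cw). */
static unsigned bit_at(uint8_t cw, unsigned pos)
{
    return (cw >> (pos - 1u)) & 1u;
}

/* Encode a 4-bit value into a 7-bit codeword.
 * Positions 1, 2 and 4 hold parity; positions 3, 5, 6 and 7 hold data. */
uint8_t hamming74_encode(uint8_t nibble)
{
    unsigned d0 = nibble & 1u;
    unsigned d1 = (nibble >> 1) & 1u;
    unsigned d2 = (nibble >> 2) & 1u;
    unsigned d3 = (nibble >> 3) & 1u;
    unsigned p1 = d0 ^ d1 ^ d3; /* covers positions 1, 3, 5, 7 */
    unsigned p2 = d0 ^ d2 ^ d3; /* covers positions 2, 3, 6, 7 */
    unsigned p4 = d1 ^ d2 ^ d3; /* covers positions 4, 5, 6, 7 */
    return (uint8_t)(p1 | (p2 << 1) | (d0 << 2) | (p4 << 3) |
                     (d1 << 4) | (d2 << 5) | (d3 << 6));
}

/* Decode a 7-bit codeword, correcting at most one flipped bit.
 * *corrected (if non-NULL) is set to 1 when a bit was repaired. */
uint8_t hamming74_decode(uint8_t cw, int *corrected)
{
    unsigned s1 = bit_at(cw, 1) ^ bit_at(cw, 3) ^ bit_at(cw, 5) ^ bit_at(cw, 7);
    unsigned s2 = bit_at(cw, 2) ^ bit_at(cw, 3) ^ bit_at(cw, 6) ^ bit_at(cw, 7);
    unsigned s4 = bit_at(cw, 4) ^ bit_at(cw, 5) ^ bit_at(cw, 6) ^ bit_at(cw, 7);
    /* The syndrome is the 1-based position of the flipped bit, or 0. */
    unsigned syndrome = s1 | (s2 << 1) | (s4 << 2);

    if (syndrome != 0u)
        cw ^= (uint8_t)(1u << (syndrome - 1u));
    if (corrected)
        *corrected = (syndrome != 0u);

    return (uint8_t)(bit_at(cw, 3) | (bit_at(cw, 5) << 1) |
                     (bit_at(cw, 6) << 2) | (bit_at(cw, 7) << 3));
}
```

The redundancy concern raised above is visible here: this toy code spends 3 bits to protect 4. Longer codewords are far cheaper in relative terms, at the price of correcting only one bit across a larger span.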

Most passwords are more problematic to leak than to lose. Web services generally allow you to reset your password, so all that you need to know to recover access is your email password. I plan on writing a "Getting started with Signet" tutorial, and in it I will recommend that email and other difficult-to-recover passwords be complex, ideally randomized, and memorized.

Nothing short of full redundancy should satisfy a user with very critical data, however. With Signet there are two ways to get this:

1) Download the encrypted database to your PC or to removable media periodically. The app can do this in a semi-automated fashion.

2) Securely transfer the data from a primary Signet to a backup Signet. This feature is planned, but I haven't started implementation yet.


nthdimtech commented 6 years ago

Turns out the flash memory has an internal ECC check. I just need to report the corrections and errors it finds back to the client. I created a new issue for this.

dumblob commented 6 years ago

Internal ECC support sounds good. But the question is what "level" of check it provides: can it only detect one, two, or three error bits, or can it also correct a one-bit, two-bit, or three-bit error? Personally I would prefer, for each octet, correction of 2 error bits together with detection of 3 error bits (a plain Hamming code corrects only 1 error bit, so this needs a stronger code). If this is not supported, then I would really recommend doing it manually in C in addition to using the internal ECC (which usually supports just one error bit detection).

Btw, what does the procedure for any password operation (addition, deletion, modification) look like? Something along the following lines?

  1. receive the password on the uC from a computer
  2. store the received password to the flash
  3. rewrite all the variables which had something to do with the password with random characters
  4. flush caches (if any on the uC) - e.g. by "mounting/unmounting" the flash memory internally
  5. read the saved password from the flash
  6. send the read password to the computer
  7. compare the password from uC with password, which was initially sent to the uC
  8. if the comparison shows no difference, the whole procedure was successful
  9. if the comparison does show any difference, the flash or whatever is corrupted and the password has to be written again (I would do it 3x and then just notify the user that they need to buy a new dongle)

nthdimtech commented 6 years ago

It provides 1 bit of correction and 2 bits of detection for every 64 bits of memory. I'm satisfied with this level of protection myself. Keep in mind that passwords and similar data don't tend to change frequently, so it's hard to imagine a sector seeing the 10k+ write cycles at which failures could start to appear in any short time period. All the usage models I can think of predict a 10+ year lifespan, even if all records are relatively small and change individually every month.
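The lifespan estimate can be checked with back-of-the-envelope arithmetic. The sketch below assumes that every change to an entry costs exactly one erase/program cycle of the sector holding it; that is a simplification for illustration, not a statement about how the firmware actually writes. The function name and parameters are hypothetical.

```c
/* Estimated years until a sector reaches its rated endurance.
 * Assumption: each entry change costs one erase/program cycle of the
 * sector that holds it. Returns -1.0 if there are no writes at all. */
double sector_lifetime_years(double endurance_cycles,
                             double entries_per_sector,
                             double changes_per_entry_per_year)
{
    double writes_per_year = entries_per_sector * changes_per_entry_per_year;
    if (writes_per_year <= 0.0)
        return -1.0; /* lifetime is not bounded by wear */
    return endurance_cycles / writes_per_year;
}
```

Under that assumption, a sector rated for 10,000 cycles that holds 10 entries, each changed monthly, sees 120 writes per year and lasts roughly 83 years. The same sector would need to hold about 83 monthly-changing entries before the estimate drops to about 10 years.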

On your second point: let's just call them entries, not passwords. The device itself isn't actually aware of what it stores; it just encrypts and decrypts binary blobs on behalf of the client. Each blob has a bit-mask indicating which bytes are "meta-data" and can be transmitted to the client without a button press, and which bytes are "secret" and can only be revealed by a read operation on a specific entry. This masking technique allows the client to produce a list of entries on startup without having any of the secret data transmitted to it initially.
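The masking idea can be sketched as follows. This is only an illustration: the mask polarity (a set bit marks a meta-data byte), the one-bit-per-byte packing, and the function names are assumptions, not the actual Signet blob format.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if byte i of the blob is marked as meta-data.
 * Assumption: bit (i % 8) of mask[i / 8] is set for meta-data bytes. */
static int byte_is_metadata(const uint8_t *mask, size_t i)
{
    return (mask[i / 8] >> (i % 8)) & 1u;
}

/* Copy a decrypted blob into out, keeping meta-data bytes and
 * replacing every secret byte with zero. This is the view that could
 * be sent to the client without a button press. */
void blob_redact(const uint8_t *blob, const uint8_t *mask,
                 size_t len, uint8_t *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = byte_is_metadata(mask, i) ? blob[i] : 0;
}
```

For an 8-byte blob with mask `0x0F`, the first four bytes pass through unchanged and the last four come back as zeros, so a client could build its entry list from the redacted view alone.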

There are already CRC checks in the USB standard, and the USB hub will automatically retry on a CRC error, so there is no need for the kind of end-to-end error checking you describe. I can't say I understand exactly how individual flash blocks tend to fail. I'll add a bug to do a read-back after write, but I suspect that flash sectors do not always fail immediately after a write. As I understand it, the built-in ECC is triggered on any read, so either the aforementioned read-back would catch the failure or it would be caught on the next attempt to access the data.
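A read-back after write could look roughly like the sketch below. Everything here is hypothetical: the function-pointer interface, the RAM-backed `sim_flash` stand-in, and the retry count (borrowed from the "3x" suggestion earlier in the thread). Real flash would also need an erase before each retry, which the `program` callback is assumed to handle.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical driver hooks; both return 0 on success. */
typedef int (*flash_program_fn)(uint32_t addr, const uint8_t *src, size_t len);
typedef int (*flash_read_fn)(uint32_t addr, uint8_t *dst, size_t len);

#define VERIFY_CHUNK 64u

/* Program len bytes, read them back in chunks, and compare.
 * Returns the attempt number that succeeded (1..max_attempts),
 * or -1 if every attempt failed verification. */
int write_verified(flash_program_fn program, flash_read_fn read_back,
                   uint32_t addr, const uint8_t *src, size_t len,
                   int max_attempts)
{
    uint8_t check[VERIFY_CHUNK];

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        int ok = (program(addr, src, len) == 0);
        for (size_t off = 0; ok && off < len; off += VERIFY_CHUNK) {
            size_t n = (len - off < VERIFY_CHUNK) ? len - off : VERIFY_CHUNK;
            ok = read_back(addr + (uint32_t)off, check, n) == 0 &&
                 memcmp(check, src + off, n) == 0;
        }
        if (ok)
            return attempt;
    }
    return -1;
}

/* RAM-backed stand-in for flash, for demonstration only. */
uint8_t sim_flash[256];
int sim_bad_writes; /* number of upcoming writes to corrupt */

int sim_program(uint32_t addr, const uint8_t *src, size_t len)
{
    if (addr > sizeof sim_flash || len > sizeof sim_flash - addr)
        return -1;
    memcpy(&sim_flash[addr], src, len);
    if (sim_bad_writes > 0 && len > 0) {
        sim_flash[addr] ^= 0x01; /* simulate a stuck or flipped bit */
        sim_bad_writes--;
    }
    return 0;
}

int sim_read(uint32_t addr, uint8_t *dst, size_t len)
{
    if (addr > sizeof sim_flash || len > sizeof sim_flash - addr)
        return -1;
    memcpy(dst, &sim_flash[addr], len);
    return 0;
}
```

As noted above, this only catches failures that show up immediately after programming; a sector that degrades later would still have to be caught by the ECC or a CRC on a subsequent read.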

You mention zeroing out or randomizing the password, or what I would generically call "secret" data. As I see it, this doesn't make that much sense in the context of Signet. If Signet is decrypting the data, then it is simply going to give it to the client, so the secret has already been given. If Signet is encrypting it, then the data was generated on the host, and malware/spyware on the host would have had a chance to intercept it. The real security mechanism is the scoping of data: malware can't access data beyond what you authorize with button presses. Thus, you should limit what data you operate on according to your level of trust in the system you are using.

All that said, I'll add a bug to audit data sent in outgoing buffers to make sure any extra bytes are zeroed, to minimize the value of spying on the device.
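The kind of fix such an audit would lead to can be sketched like this: whenever a fixed-size outgoing buffer is filled, the unused tail is explicitly cleared so that stale bytes from an earlier transfer cannot leak. The 64-byte `PACKET_SIZE` and the function name are assumptions for illustration, not the firmware's actual buffer handling.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PACKET_SIZE 64u /* hypothetical fixed outgoing packet size */

/* Copy payload into a fixed-size packet and zero the unused tail.
 * Returns 0 on success, -1 if the payload does not fit. */
int packet_fill(uint8_t packet[PACKET_SIZE], const uint8_t *payload, size_t len)
{
    if (len > PACKET_SIZE)
        return -1;
    memcpy(packet, payload, len);
    memset(packet + len, 0, PACKET_SIZE - len);
    return 0;
}
```

Routing every outgoing transfer through one helper like this makes the audit simpler, since there is a single place where padding bytes are produced.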

dumblob commented 6 years ago

Ok, that all sounds good. Thanks a lot!

You mention zeroing out or randomizing the password or what I would call generically "secret" data. As I see it this doesn't make that much sense in the context of Signet. ...

Well, another point was that this "randomizing" should help trigger cache invalidation in the uC, so that a real read from the flash memory has to be done to read the recently written entry (password).

nthdimtech commented 6 years ago

Ok, that all sounds good. Thanks a lot!

No problem.

Well, another thing was, that this "randomizing" shall help to trigger cache invalidation...

Ah I see. I just checked the flash memory section on this to be sure. It automatically invalidates the data cache after a programming operation so that's covered.

nthdimtech commented 4 years ago

Going to drop this one, as it's essentially fixed if defined narrowly enough. There is an explicit CRC check in the firmware and built-in flash ECC to let us know when the flash is failing. I think doing wear levelling in software is not a good risk-to-value proposition. I don't think I have the time to do enough testing to ensure that any wear levelling code doesn't have a bug that could damage users' data when there isn't any damage to the flash to start with.