tweedegolf / sequential-storage

A crate for storing data in flash memory with minimal need for erasing pages
Apache License 2.0
99 stars 10 forks source link

Shutdown strengthening #16

Closed diondokter closed 9 months ago

diondokter commented 9 months ago

Using the fuzzer to simulate a power shutdown and adding a repair function that can solve it.

diondokter commented 9 months ago

Not sure if anybody's watching this space... But if you are, what do you think of the repair function? Should it be something that should be done automatically?

Ideally yes, but that'd be very annoying to implement... Leaving it to the user is way easier.

lulf commented 9 months ago

@diondokter I like the repair function, very nice improvement. I think leaving it up to the user is better and feels more in line with the rest of the API to be honest.

Perhaps a rustdoc comment on the Corrupted enum variant pointing to the try_repair would be sufficient?

diondokter commented 9 months ago

@diondokter I like the repair function, very nice improvement. I think leaving it up to the user is better and feels more in line with the rest of the API to be honest.

Perhaps a rustdoc comment on the Corrupted enum variant pointing to the try_repair would be sufficient?

Ok, cool. I'll do that then.

But I got snagged by multiple writes on flash. Map requires only single-write flash, but now its repair function requires flash that can write twice... Queue requires twice-write flash, but now its repair function requires flash that can write three times...

So... I'm gonna have to spend some time to fix that, if that's even possible.

diondokter commented 9 months ago

Cool! I was able to fix it without doing major changes :D

diondokter commented 9 months ago

Turns out it's actually useful to fuzz in CI! Otherwise I'd already have merged it :P

lulf commented 9 months ago

@diondokter I've been testing the repair function a bit now, and I wanted to ask about a behavior I'm seeing in my initialization process, sssuming I've written gibberish to the queue partitions before startup:

The error in particular is "Corrupted: All pages are closed", which suggests it haven't realy repaired/erased anything.

lulf commented 9 months ago

In particular it's caused by the find_youngest_page at the end of the function:

    // All pages are closed... This is not correct.
    #[cfg(feature = "defmt")]
    defmt::error!("Corrupted: All pages are closed");

    Err(Error::Corrupted {
        #[cfg(feature = "_test")]
        backtrace: std::backtrace::Backtrace::capture(),
    })

I'm wondering if there is a condition here that the repair doesn't fully fix. Perhaps it should check the for this condition as well and return an error in that case?

This would make the use of the API a lot smoother, since you can trust try_repair to give you an error if something is wrong rather than trying to use the flash.

diondokter commented 9 months ago

Ah I see.

I designed the repair function so it can repair the state if it got reasonably corrupted from a previous correct state. If you start with gibberish, then there's not really anything that can be repaired. It likely also won't be able to repair effects from random bitrot.

So right now the repair function is only repairing things that result from an unexpected shutoff with the assumption that bytes are written from start to end every transaction.

In your case, if you follow the documentation, you should end up with erasing the flash yourself so you start fresh again.

I guess that checking the flash for errors in the repair function would be nice for users, but there are so many things that could go wrong it'd be difficult to check them all.

lulf commented 9 months ago

Alright, thanks for explaining, that makes sense!