rust-embedded-community / embedded-storage

An Embedded Storage Abstraction Layer
Apache License 2.0
63 stars 21 forks source link

Design discussion regarding usability issues #23

Open ia0 opened 2 years ago

ia0 commented 2 years ago

Hi all,

I'd like to list a few issues for those trying to write libraries on top of the abstractions provided by this crate. I'll only focus on nor_flash::ReadNorFlash and nor_flash::NorFlash since those are the only traits I need and thus have looked at. I'd be happy to propose an alternative to those traits with a demonstration as to how to use them (i.e. write a generic library on top of it) as well as how to implement them (i.e. how a chip can provide the trait). But before I start this effort, I would like to know if there are any strong reasons as to why those choices have been made.

Some term definition to be sure we understand ourselves:

Why read(&mut self) instead of read(&self)?

I couldn't find anything about it in the documentation and code. Because of this design, the user can't share the object for read-only operations which is quite counter-intuitive. One would need interior mutability to work around it. But then why shouldn't the implementation do this instead of the user?

Note that this is related to direct reads as having read(&mut self) prevents reading until the slice of the previous read is dropped.

Do we need READ_SIZE?

There's already some discussion in #19. I would argue the same way as for read(&mut self) and say that the implementation should take care of that, not the user. Some helper functions could be provided by this crate to help implementations translate from a coarse read to a fine read. I can write this helper as part of the demonstration.

Do we need write() to be WRITE_SIZE-aligned?

(Note that we still need to expose WRITE_SIZE.)

Similar to the point above, the implementation could take care of this. And it's a user error to write more than WRITE_COUNT times to a WRITE_SIZE-aligned region (see taxonomy below), because the implementation can't possibly track/enforce this for all types of flash without shadowing the whole flash.

Flash taxonomy

This is not a usability issue per se, but I believe this to be critical to design those traits. I'm thinking about a markdown in this crate that could look like:

Technology:

Read:

NorWrite (Nor only):

Add other specifications here.

Examples (all units are in bytes):

ia0 commented 2 years ago

There is another minor point I forgot.

Why mix u32 and usize?

Offsets are expressed as u32 while sizes (e.g. WRITE_SIZE or capacity()) are expressed as usize. This forces a fair amount of casts. It seems more natural and coherent to always use usize. By definition, we have offset: u32capacity(): usize, so any valid offset should fit usize, rendering the usage of u32 awkward.

The alternative of using u32 everywhere seems a bit arbitrary. Why would u32 be more fit than usize which is related to the architecture and thus the flash?

I guess the exceptions (and thus interesting cases to consider) would be flashes that are not memory-mapped. I don't know any of those, but I also don't know many flashes. I would be interested to get some pointers and see if they can actually address more flash than memory (and should we still consider such huge flashes as embedded?).

Dirbaio commented 2 years ago

Why read(&mut self) instead of read(&self)?

External flashes need exclusive access to the underlying SPI/QSPI peripheral to do the operation.

What's the usecase for "sharing" readonly access to the same flash? Is it to give different "partitions" to different "components"? For that IMO allowing "splitting" a flash into disjoint address ranges would work better, it could allow read/write access to the different partitions, while read(&self) would work only for reading. This "splitter" adapter could be part of the implementor, or it could be a wrapper that internally shares a single flash with RefCell.

Do we need READ_SIZE? .. the implementation should take care of that, not the user

The motivation was nrf's QSPI peripheral which can only DMA read at addresses multiple of 4.

IMO I prefer for NorFlash impls to be "low level, no automatic magic". We could have a wrapper that turns a NorFlash<READ_SIZE=4> into a NorFlash<READ_SIZE=1> doing the required "magic", so that it's opt-in. I think in the nRF case this "magic" would require reading into an intermediate buffer from an aligned address, then copying to the final user's buffer, which is not very nice.

Do we need write() to be WRITE_SIZE-aligned?

nrf's QSPI DMA needs addr to be multiple of 4. Also, same as above, we could add wrappers that turn NorFlash<WRITE_SIZE=4> into a NorFlash<WRITE_SIZE=1>, I'd rather not force the overhead onto everyone.

Flash taxonomy

100% agree we need to specify this more, and perhaps make it more fine-grained... Currently NorFlash is either write-once or write-infinite-times (multiwrite) which doesn't accurately model reality...

The nrf52832 512, 181 thing is super cursed... why did they do that!?!?

Why mix u32 and usize?

Do we want to support >4gb devices? 16gb, 32gb SD cards are ubiquitous and dirt cheap and it's quite likely you want to use them in an embedded 32bit device...

usize addresses would be awkward because you could support 64bit addrs only in 64bit targets. On the other hand, mixing is weird though.

Perhaps addresses and capacity should be u64, and ERASE_SIZE, READ_SIZE, WRITE_SIZE can stay as usize? These will never be as big as the whole capacity and it's common to use them as array sizes which would reduce the need for casts.

ia0 commented 2 years ago

External flashes need exclusive access to the underlying SPI/QSPI peripheral to do the operation.

Maybe external flashes need a different trait, see below.

What's the usecase for "sharing" readonly access to the same flash?

A generic key-value library, e.g. something with the following hypothetical signature:

pub struct Store<F: NorFlash> { flash: F }
impl<F: NorFlash> Store<F> {
    // Notice the &self instead of &mut self.
    fn get(&self, key: Self::Key) -> Self::Result<Option<Self::Value>>;
    // Here it's &mut self as expected.
    fn insert(&mut self, key: Self::Key, value: Self::Value) -> Self::Result<()>;
    // ...
}

One cannot implement this signature if NorFlash::read() takes &mut self without wrapping F into RefCell or Mutex, which the implementor cannot choose. So this must be left to the user (of the key-value library) to choose the appropriate wrapper. The implementor of NorFlash on the other hand has knowledge about the chip (if it's not an external flash, see above) and knows which of RefCell or Mutex makes sense, assuming read can't be implemented with &self in the first place which is probably rarely the case for embedded (or is it called internal?) flashes.

The motivation was nrf's QSPI peripheral which can only DMA read at addresses multiple of 4.

Is the QSPI peripheral to access external or embedded flash? This could be another argument to have different traits (embedded flash could always be seen as external flash though, but not the other way around).

IMO I prefer for NorFlash impls to be "low level, no automatic magic".

This is fair and I understand why this is important. I still need to continue progress on my library to see if the wrappers I add on top of NorFlash are essentially necessary by any library trying to write a generic data-structure on top of NorFlash or not. Depending on the answer, I would suggest the following options (which actually applies to all other points too):

100% agree we need to specify this more

Actually an idea I had in the meantime would be a crate documentation instead of a markdown. Flash devices would be structs and properties would be trait. This way we could use cargo doc to render the taxonomy and use existing tools to edit the taxonomy (rust-analyzer for renaming/refactoring, rustc to check for typos, etc). It could look like:

// Traits
/// Write only flips selected bits in region from 1 to 0. Erase flips all bits in region to 1. Write region divides erase region.
pub trait Nor {
    const WRITE_SIZE: usize;
    const ERASE_SIZE: usize;
}
/// Count restriction on Nor writes. A write region can only be written a given amount of time before erase.
pub trait WriteCount: Nor {
    const MAX_COUNT: usize;
}
/// Count restriction on Nor writes. Only a given amount of write regions can be written in a row region. Write regions divide row regions which divide erase regions.
pub trait WriteRow: Nor {
    const ROW_SIZE: usize;
    const MAX_COUNT: usize;
}

// Generic impls
impl<F: WriteCount> WriteRow for F {
    const ROW_SIZE: usize = <F as Nor>::WRITE_SIZE;
    const MAX_COUNT: usize = <F as WriteCount>::MAX_COUNT;
}

// Flash devices
pub struct Nrf52840;
impl Nor for Nrf52840 {
    const WRITE_SIZE: usize = 4;
    const ERASE_SIZE: usize = 4096;
}
impl WriteCount for Nrf52840 {
    const MAX_COUNT: usize = 2;
}

Do we want to support >4gb devices? 16gb, 32gb SD cards are ubiquitous

This sounds again like external flash devices. Another reason to distinguish between embedded and external flashes.

usize addresses would be awkward because you could support 64bit addrs only in 64bit targets

This probably only applies to external flash. For embedded flash, it's probably expected to only support 64-bits addresses on 64-bits targets. I would be surprised to see a chip with more flash that it can address (but I agree that it's theoretically possible, the closest I know is 1M flash versus the 4G addressable in which RAM and registers need to fit too). But then, I'm also assuming that embedded flash devices are memory-mapped. I don't know counter-examples yet. They should go to the taxonomy.

Perhaps addresses and capacity should be u64, and ERASE_SIZE, READ_SIZE, WRITE_SIZE can stay as usize? These will never be as big as the whole capacity and it's common to use them as array sizes which would reduce the need for casts.

Why not. This sounds like the safest option. Although it only looks required for external flash.

chemicstry commented 2 years ago

To complicate things even more, STM32F4 series have different sector sizes for different regions (reference page 77). So the ERASE_SIZE is dependent on the address. Would it make sense to have const ERASE_SIZE: &'static [Page]?

Moreover, the sector layout is also dependent on runtime bank configuration: single bank or dual bank. However, to reduce complexity in traits, I think this could be solved by typestates in HAL: two different flash types with different trait implementations.

EDIT: Thinking about this more, the &'static [Page] approach would not work for large memories with small page sizes as the array would get too big. Maybe a dynamic iterator approach would work here?

Dirbaio commented 2 years ago

This was discussed before, see https://github.com/rust-embedded-community/embedded-storage/issues/9#issuecomment-788695162

tldr the conclusion was to not support non-uniform sector sizes, it's way too cursed. The HAL can expose an impl covering the whole range with 128kb sector size (merging the smaller sectors), and an impl covering just 0x0800 0000 - 0x0800 FFFF with 16kb sector size, and 0x0800 0000 - 0x0801 FFFF with 64kb sector size

chrysn commented 2 years ago

way too cursed

Agreed. I've seen the flash abstraction in RIOT that does try for numbered (erase) pages with possibly different sizes, and it's a mess to work with, and some users just rely on a uniform size to be provided anyway. Having per-type size characteristics is a good approach, and whoever really uses inhomogenous memory in a single application can use their own useful abstractions.

chrysn commented 2 years ago

There are more properties that may be worth describing:

(Sorry, I don't find the example after having read too many things in the last days, I hope to add them when I find them or if they become more relevant).

chrysn commented 2 years ago

It may also be useful to separate the names from the technologies. We're talking of NOR devices in this thread, and the current distinction in embedded-storage is between the full abstraction (that may even perform RMW operations silently) and the NOR flashes. However, there is some ground inbetween: SPI NANDs that behave like the current RmwMultiwriteNorFlashStorage, or like a Read::Direct, NorWrite::Row(1, 1, Infinity).

It may make sense to use a dedicated type for the Infinity variant, but I suggest that either we use more generic terminology than the technologies (given vendors do things like embedding one but emulating the other), or at least declare them as "trait ReadX ... with the semantics typically associated with X technology, i.e. property and property" to avoid the confusing situation where something is actually NAND memory but used though a NOR trait.

(This is, I think, orthogonal to the question of whether reads through a shared reference are useful in a device on a shared bus).

ia0 commented 2 years ago

I agree that naming with properties is more useful than naming with technologies (that was my initial intention).

As I'm making progress on my library (still not presentable but will definitely ping this thread when done), I start to believe that we should maybe not try to over-specify the NorFlash trait. The main reason being that flash devices are too different to fit under the same specification without significantly limiting what the actual device can do.

For more details, here is the approach I'm currently pursuing:

A:    Binary or higher-level library (user)
   ------------------------------------------- Public API of B (me)
B:        Generic storage library (me)
   ------------------------------------------- Internal flash abstraction (me)
C: Generic or custom flash implementation (me)
   ------------------------------------------- NorFlash (embedded-storage)
D:   Flash device implementations (xxx-hal)

Instead of using NorFlash directly, I add another layer of abstraction with the properties that are relevant for my library. I also provide implementations for this abstraction:

The library needs to provide features (or another constant mechanism) to select the properties of the actual flash device. This adds work on the user but can be made minimal by just having to select from a list.

The advantage of using NorFlash may seem small but it's actually there. By providing a unique API to access a flash device it factors a lot of code out of the library and leaves only the "configuration" part to the library.

chrysn commented 2 years ago

I don't think we should leave any information for out-of-band -- this is where Rust's type system shines, and it can do all we've described so far.

When overhauling the abstractions I'm working on I've hoped to get away with less, but to my understanding the bare minimum is:

  1. Write granularity -- which blocks need to be set in one go.
  2. Whether one may overwrite
  3. How often one may overwrite (and how overwrites are counted)

I had hoped to get away without 3, but even simple use cases like the riotboot bootloader need, like, one overwrite for (in)validation of partitions. For RIOT I'm thinking something simplified like "is one overwrite per page allowed?" (because there is no type stating); in Rust the full NorWrite::Row(a, b, c) can be usable easily.

In particular, what we can provide even in the library (as https://github.com/rust-embedded-community/embedded-storage/issues/23#issuecomment-1023475266 suggested) all the tools to get from, say, a Row(4, 4, 4) to a Row(1, 1, 1) -- I wouldn't want to implement the required buffering for all drivers individually.

ia0 commented 2 years ago

I agree with the sentiment. The way I see it the trait can range from full parametric (FP) to common denominator (CD):

I guess a useful trait (there's no perfect solution, just useful ones) would probably lie in between those 2 extremes. To find that point, I think both taxonomies are needed (what can flash devices provide and what do library developers need). And both taxonomies can be adapted: for flash devices we can decide to support less, and for libraries we can improve the algorithms to support more devices.

To start the library taxonomy, here are the properties I rely on (for now, it's not finished, and I will adapt as the flash taxonomy grows to be sure to cover as much reasonable devices as possible):

chrysn commented 2 years ago

The nice thing is that we can build upward from CD -- but if CD is to be really common, we already have to know what a reasonable expectation is, because CD API will represent the least usable, strictest but at the same time most exotic device.

For example, you mention non-power-of-2 sizes of erase and write units -- it wouldn't even have occurred to me that such may exist. If such exist, CD would describe a device that can be written to in some device-defined units that may not even be power-of-2. (This may be a bad example because I know of such flash, it would be highly weird, and you're not saying they do, just that it's a limitation you place; I hope we can constrain CD to power-of-2 already).

We can then add traits toward FP as we need them. (Not necessarily as flash chips provide them: they may provdie all kinds of weird properties, but what should guide us toward FP is what applications can meaningfully use -- or we wind up with all kinds of practically-vendor-specific parameters).

In particular, I need the maximum number of times a region can be erased [...]

That's a property that is good to know, but does this make much practical sense? AIU these cycle counts are minimum counts (and practically easily a lot larger than advertised), and at least in developent situations you can't rely on the minimum either (for it may have been reprogrammed before).

I think a usable tiering might be (where only those with a * matter initially):

(I'm assuming here that our granularities can indeed always be power-of-2 -- if not, a non-starred item would envelope the power-of-2 ones)

In addition to those I consider essential, those marked with + would correspond to your check list. To satisfy your "and at most X/Y", these could be done with cost generic constraints. (As we can't say X::EraseSize: < 4096 yet, you'd probably demand X::EraseSize: 4096, and use wrapper types that do upconversions -- or just a static / const assert, which would also help with the 8 write regions).

While the more exotic traits are brewing, they may not even need to reside in the original storage crate (although eventually I think they should, but anyone can define their own extensions and impl them for some storages that provide them).

ia0 commented 2 years ago

[Max erase is] a property that is good to know, but does this make much practical sense?

The way I use it is to provide the library user with lifetime tracking, such that they can notify their own user when the device is close to becoming unpredictable to plan for a replacement (but the device could continue to work with higher risks, it's up to the library user to decide). This is indeed useless for development and only aimed at production devices.

[property hierarchy]

This essentially matches my vision too yes. My concern with such a trait is the complexity/expressivity trade-off. If the trait is too complicated, users might not use it. But apart from that, I would be satisfied with such a trait (assuming it indeed covers many flash devices).

Actually I forgot some other properties I rely on:

those marked with + would correspond to your check list

That's correct.

these could be done with cost generic constraints

Currently I do my checks at runtime because Rust is not yet very convenient to use for static assertions, in particular when they involve generic constants and in particular when they are associated constants. But I'm fine with this limitation for now.

While the more exotic traits are brewing, they may not even need to reside in the original storage crate

Yes, that's where I'm currently going. I define the trait I would like to have in my library and convert from NorFlash with out-of-band knowledge. Ideally, if all libraries using NorFlash have similar abstractions internally, it means NorFlash could be upgraded to match them. The advantage of doing it so, is that there are proofs that it's usable.

chrysn commented 2 years ago

[property hierarchy]

My concern with such a trait is the complexity/expressivity trade-off.

Each of these lines would probably be a single trait (depending on its parents). Users would only see methods they demand in their type requirements, and for selection we'd need a good overview doc page like the one in std::collections. (Possibly also traits may be split in modules -- essentials like read, write and erase could sit in a module distinct from those around atomicity, which would only be pointed to).

  • It is possible to resume an interrupted write operation by writing the same content to the same region (i.e. repeating the write operation). This doesn't consume an extra write cycle.

That'd be neat, is that a property storages actually have? (I suppose, then, that the to-be-written content doesn't need to be exactly the original one, but only needs to have at least the previously-written zeros).


So, how do we go on from here? Sketching a demonstration PR with the more granular traits that subdivide what NorFlash and MultiwriteNorFlash now express?

What are the stability expectations of embedded-storage, esp. w/rt generalized names? (Trait aliases are not stable yet, so we can't use these to carry things over). I don't see a large enough number of dependents on crates.io that would warrant doing something like embedded-hal-compatibility does.

ia0 commented 2 years ago

Each of these lines would probably be a single trait

The problem of doing this is explained in this comment: https://github.com/rust-embedded-community/embedded-storage/pull/22#issuecomment-1008300181. Ideally, there would be a single trait to enforce users to write generic libraries. I don't think it's very realistic though. Because even with a single trait like NorFlash, users could require WRITE_SIZE to be 4 (or whatever is the value for the boards they have) for simplicity. We should probably find another way to force users to care about portability.

That'd be neat, is that a property storages actually have?

I don't know. I didn't see any flash device making any useful claims about power interruptions. In particular it's all about probabilities, it may also depend on the usage temperature, etc. But for SLC chips I think it's quite reasonable to assume that.

the to-be-written content doesn't need to be exactly the original one

That on the other hand I'm pretty sure is wrong. I think the stm32l432 specifies this. This is due to the ECC they use. If you write a different content (that is valid with respect to flipping bits from 1 to 0), then the ECC doesn't necessarily respect that property. A simple example is if you try to write 0000 with ECC 0 which end up writing 0111 with ECC 0 before interruption and then decides to write 0010 with ECC 1, you might only be able to write 0010 with ECC 0 which looks like the write failed due to the ECC.

chrysn commented 2 years ago

I'm not sure the argument of #22 applies for having split traits. It certainly does apply for memory-mapped readability (so that should not be an extra trait, but could be a provided method of the read trait as you suggest there IIUC).

For multiwrite (and also the atomicity guarantees), these would be marker traits with no methods of their own, just maybe with some associated constants. Unlike memory mapped reading, I don't see how those would be selected based on laziness and/or as a premature optimization. And really, if it were to happen, selection for these properties could just as well or even easier happen in a const-generic-typestated setup where one would impl their code for M where M: WritableMemory<Multiwrite> rather than the generic for M, MWV where M: WritableMemory<MMV>, MWV: MultiwriteVariant -- as opposed to those needing it implementing for M where M: WritableMemory whereas those requiring multiwrites would do for M where M: WritableMemory + Multiwrite (or Multiwrite<2, 512>).

That on the other hand I'm pretty sure is wrong.

I understand now why that's wrong and how it would fail -- but then, how is the property still useful? The program would need to reconstruct precisely what it did when it attempted that write. Sure, that works for cases when the crash happened during some kind of defragmentation or journal replay (and makes sense there), but for most of the write cases I that data would be lost (after all, most data is written to flash because the program has it in RAM now and wants to be able to get it back later). So yes, I do see use cases, but they are so niche they'd probably need explaining in the docs.

chrysn commented 2 years ago

Note to self for further investigation: There are applications that make guarantees on sequential writing nffs's property 2, which indicates there could be embedded storage that requires this. The only piece of hardware I'm currently aware of that behaves like this is SMR (new large hard disks), I doubt that is what was on their minds.

ia0 commented 2 years ago

sequential writing

Very interesting, thanks for the concept. I wasn't aware of this. It's good to know, I can easily adapt my library design for this. I already write in a sequential manner except for one block (my terminology for a write region), but I'll move it at the end of the page then (my terminology for an erase region).

chrysn commented 2 years ago

the to-be-written content doesn't need to be exactly the original one

I've now come across a chip that has this property (STM32WB), but it has a neat extra: It does allow one value (0x0000000000000000 -- yes, it has 64bit flash words) to be written no matter what was previously written. That makes it way more useful. Is that true for the flash you are using as well? In that case, the property of the flash could be "overwrites need to be exactly const OverwriteValue" (which I as a user would constrain to [0; _]), and maybe there is no need to model "can overwrite but only to exactly same values" at all (and rather consider those chips can-not-overwrite for practical purposes).

(Reading that memory comes with its own challenges due to an NMI firing when an incorrectable read error is encountered, like I'd expect it to happen when power is cut mid-write -- but that's a different topic.)

MathiasKoch commented 2 years ago

It does allow one value (0x0000000000000000 -- yes, it has 64bit flash words) to be written no matter what was previously written

I think this goes for all STM32's? I am actively using this property in my bootloader at least (stm32l475).

ia0 commented 2 years ago

I think we should distinguish the following 2 properties and not merge them:

Those properties are different because the first doesn't tell anything about power-loss while the second addresses exactly that point.

Note that constructors usually don't try to specify what happens with power-loss or often just say "anything can happen". This is not really useful to build products on top of such hardware. So those properties may not come from constructors but just be assumptions/hypotheses instead. At least, I'm documenting such assumptions in my code.

chrysn commented 2 years ago

If after having "no rewrites unless it is with 0" (possibly "or identical") there is still a use case around for "no rewrites but identical doesn't count", sure we can have both. (I had hoped the former would suffice, but maybe not).

chrysn commented 2 years ago

I think this goes for all STM32's? I am actively using this property in my bootloader at least (stm32l475).

STM32L0 have smaller granularity and no ECCs, so it varies at least by the first digit of the chip number.

ia0 commented 2 years ago

If after having "no rewrites unless it is with 0" (possibly "or identical") there is still a use case around for "no rewrites but identical doesn't count", sure we can have both. (I had hoped the former would suffice, but maybe not).

Yes there is still a need, see the examples I gave in the first post (all nRF52 essentially permit multiple writes and flip 1s to 0s, the next write doesn't even need to be the final value, you can write a 1 on a 0 and the value will stay 0).

adri326 commented 4 months ago

I would like to drop here my remarks around having WRITE_SIZE, ERASE_SIZE and READ_SIZE as const fields on the trait: