tweedegolf / sequential-storage

A crate for storing data in flash memory with minimal need for erasing pages
Apache License 2.0
87 stars 8 forks source link

feat: support peek_many and pop_many #12

Closed lulf closed 7 months ago

lulf commented 7 months ago

Supporting peek_many and pop_many is needed for situations where you want to batch multiple events in a request to some other location, but don't want to clear them from the queue until you're sure the request got complete successfully.

I've started working on an implementation of this functionality, but I'd like to get your opinion first on what the API should look like. Obviously I'd need to add more tests and docs here, just want to get your opinion on this first.

I've basically got three versions, where the first version is in this PR:

pub fn peek_many<'d, S: NorFlash, const N: usize>(
    flash: &mut S,
    flash_range: Range<u32>,
    data_buffer: &'d mut [u8],
    output_buffer: &mut [MaybeUninit<&'d mut [u8]>; N],
) -> Result<usize, Error<S::Error>> {

vs.

pub fn peek_many<'d, S: NorFlash, const N: usize>(
    flash: &mut S,
    flash_range: Range<u32>,
    data_buffer: &'d mut [u8],
    output_buffer: &mut Vec<&'d mut [u8], N>) -> Result<(), Error<S::Error>>

vs.

pub fn peek_many<'d, S: NorFlash, const N: usize>(
    flash: &mut S,
    flash_range: Range<u32>,
    data_buffer: &'d mut [u8],
) -> Result<Vec<&'d mut [u8], N>, Error<S::Error>> {

Maybe there are other alternatives that you think could work.

The advantage of the first one is that we can avoid the heapless dependency, at no big cost API-wise, since you'd generally need to pre-alloc the size of the heapless vec anyway.

The API for pop_many would be extended in a similar way.

diondokter commented 7 months ago

Hi, thanks for working on this! Seems like a useful api to have :)

Hopefully the code is clear enough to be able to work through it.


First thing going through my mind is, why not return an iterator? At that point the user can decide to collect it into a vec or just iterate over it. This way we can avoid the heapless dependency and still give the user a nice API. The pop_many iterator would then be very simple, pretty much just calling pop multiple times.

The downside of this is that if the user wants to have access to multiple entries at once, he'll have to copy the slices because the iterator would return borrowed slices.

I don't know if that's a good tradeoff or not.


From your suggestions I like number 2 best.

Number 1 is weird. A maybeuninit mutable reference?

Number 3 would have to slice up the data_buffer to create its outputs. Possible, but might be surprising because the API contract with the other functions is that the data_buffer must be at least as big as the largest data in flash. That contract would then suddenly change. That could be acceptable though if properly documented.

Can we combine 2 and iterators to get the best of both worlds? Maybe, depends on whether the borrowchecker will let us.


So my proposal would be something more like:

/// Peek many items. The maximum number of items that will be iterated over is the number of data_buffers given.
/// The iterator will also end when there's no more data in flash.
pub fn peek_many<'d, S: NorFlash, const N: usize>(
    flash: &mut S,
    flash_range: Range<u32>,
    data_buffers: &mut [&'d mut [u8]]
) -> impl Iterator<Item = Result<&'d mut [u8], Error<S::Error>>>

What do you think?

lulf commented 7 months ago

Thanks for the quick feedback! The iterator idea is interesting, it gives the caller more freedom on where to store the emitted sub-slices and remove the heapless from the API which should ease maintenance. I'll try it out, thanks for the idea.

The &mut for alternative 1 was primarily to match the &mut returned slice in the peek() api (I was a bit surprised it wasn't just an immutable slice tbh).

I noticed you specified data_buffers: &mut [&'d mut [u8]] in your example, but I was hoping to avoid requiring the caller to split this up (passing a single slice feels like a nicer API especially if you have events of different sizes).

diondokter commented 7 months ago

Cool, let's try iterators then.

If you want to implement it with a single buffer, then that's fine too. But make sure to document what that means for the user so they don't get any surprises.

Dirbaio commented 7 months ago

maybe return an "iterator-like" thing that takes the buffer for each read? This way the user can choose whether to reuse the same buffer, or use different buffers, dynamically even.

pub fn peek_many<'d, S: NorFlash>(
    flash: &'d mut S,
    flash_range: Range<u32>,
) -> Peeker<'d, S>;

struct Peeker<'d, S> {...}

impl<'d, S> Peeker<'d, S> {
    pub fn next<'a>(&mut self, buf: &'a mut [u8]) -> Result<Option<&'a mut [u8]>, Error<S::Error>>;
}
diondokter commented 7 months ago

Oh that's cool too! Even more usable in various situations.

However, you would lose the iteration methods like map and such, that'd be the tradeoff. Although, a normal iterator with my proposed API could be built on top of this. Which would give that functionality back.

@lulf Would you be up for that?

Dirbaio commented 7 months ago

I don't think the iterator api reusing the same buffer works, you'd need a GAT based iterator (LendingIterator) for that.

diondokter commented 7 months ago

I don't think the iterator api reusing the same buffer works, you'd need a GAT based iterator (LendingIterator) for that.

Ah, dang... You're probably right. Then my first proposal wouldn't work either... Well, then yours is the best proposal so far!

lulf commented 7 months ago

Agreed on @Dirbaio's suggestion, I'll give that a try!

diondokter commented 7 months ago

Was curious, so I already read your commit.

Should've documented that unsafe copy of the data buffer 😅 I don't remember what the exact problem was.

I do remember that the borrow checker wasn't happy about the buffer because of the loop or something. With a colleague I figured that the code itself was fine, just that the borrow checker wasn't convinced.

Creating the copy made it work. Should be fine and MIRI didn't complain either.

lulf commented 7 months ago

@diondokter Got an alternative to the unsafe, by passing back the data_buffer when needed. Not sure if it's better, but perhaps easier to reason wrt. lifetimes.

Also I think this is ready for a proper review now.

Dirbaio commented 7 months ago

you can also get rid of the unsafe by returning usize length instead of the buffer.

(If you still want the public API to return a slice you can make an inner function that returns usize, and then a wrapper that reslices it. IMO APIs that return usize are nicer for the end user too, but that's another story)

lulf commented 7 months ago

Good suggestion @Dirbaio. I'm fine either way... What I like about returning the slice is that it's harder for the caller to make a mistake indexing into the buffer.

diondokter commented 7 months ago

I always go back and forth on this...

Let's keep it return a slice so keep backward compat. I'll review it in a minute 😄

diondokter commented 7 months ago

Very nice! Thanks for your work. I'll update the changelog and make a new release.