tweedegolf / sequential-storage

A crate for storing data in flash memory with minimal need for erasing pages
Apache License 2.0
87 stars 8 forks source link

Add function to return the space available on flash #56

Closed avsaase closed 1 week ago

avsaase commented 2 months ago

Would it possible to add a function that returns the available space for storing new items? I know that because of the overhead per item and "losses" when an item doesn't fit in the current page it's impossible to say how much usable space there actually is. It would nevertheless be useful to have some indication of how much space is left. An upper bound assuming the items are the max size that can be stored would suffice for use case.

diondokter commented 2 months ago

Hi, this is really hard to do because the answer can be wildly different if you're storing max size items vs items with only one byte...

For the queue there's this function: https://docs.rs/sequential-storage/latest/sequential_storage/queue/fn.find_max_fit.html Though it's only for the next item.

Or are you looking for map? For map it's even more difficult because the answer might be 'infinite space left' for a given key.

avsaase commented 2 months ago

I'm aware that's it's difficult to say that the available space is without knowing the item sizes but you could sidestep that problem by reporting the available space on the pages (page size - page state - used page space) and documentation that not all of if it is practically usable. This would already be useful information to have compared to what the library currently provides.

Another idea is for the function to take a item size hint argument to give a more accurate estimate of the usable space.

If you think this is out of scope for this library it would be nice to document how the user can calculate this themselves.

avsaase commented 2 months ago

For the queue there's this function: https://docs.rs/sequential-storage/latest/sequential_storage/queue/fn.find_max_fit.html Though it's only for the next item.

For me this always returns 4088 for a 4096 ERASE_SIZE, regardless of if values are written to flash or it was just erased.

Or are you looking for map? For map it's even more difficult because the answer might be 'infinite space left' for a given key.

Why would it be infinite? At the very least the key itself takes space in flash, right?

diondokter commented 2 months ago

A size hint could be given like you proposed. But how is that useful? What do you want it for? I'm not against it, but I don't know what it's useful for to have an inexact number. So if you could clarify that'd be great!

The find max fit only starts giving a lower number once the last free page has been written to and the storage is almost full. If you're regularly popping data so it never fills up, I would expect your result to always be the max possible.

With the infinite size I mean that that's one interpretation of 'how much can be stored'. Because semantically, when you write a certain key and it fits in a page, you can keep writing that key forever because old versions of that key will be deleted automatically. So for map the number is kinda weird. I could build a function that gives back the current free space, but that's not really useful because that number itself will be inaccurate and it also depends on which key you're gonna write. The map only truly gets full when it is filled up with item which all have a different key. (So no old key can be cleaned up)

So if I understand better what your goal is, I might be able to think of something that could help you. You've come with a proposed solution while I don't understand the problem it solves.

Sorry for being kinda direct, but I don't know how to put it in better words :) I'm happy to see you opening this issue!

avsaase commented 2 months ago

My use case is as follows: I'm making a paragliding variometer that doubles as a flight logger. My plan is to save flight data like position, altitude, speed, etc. to a flash chip at one seconds intervals (using postcard for serialization). Afterwards, the logs can be downloaded to a computer for analysis and sharing online.

The problem I'm running into now is that there is no way to know or show to the user how much space is left on the device until the flash fills up and pushing returns an error (I don't want to use lose logs when the flash fills up so I'm pushing with allow_overwrite_old_data: false). Having a rough indication of the available space before a flight would already be a big improvement over finding out afterwards that your flight recording stopped halfway through.

I see two ways of calculating the available space:

  1. Count the number of open pages and multiply with the page size minus the page overhead and add the space on the current page. Obviously this doesn't take into account the item overhead but if this is clearly documented then I don't think that's a problem. In my use case I would call this function once when the queue is empty and use this value to calculate a percentage of space available. The exact number of bytes is not important for that.
  2. Do the same as 1 but take a item size hint calculate the actually usable space for that given item size. With heterogeneous item size this won't give the correct number of bytes either but it would be closer to the true value.

For both cases calculating/estimating the available space requires scanning through all page headers so this could be slow. I think it would make sense to store and update the value in the cache but the user could also manage this themselves.

So far I've mostly considered queues because that fits my use-case but I think the same principles apply to maps. I hadn't thought about the case of updating a key which doesn't take additional space but I IMO nevertheless it's better to have an indication of available space for new/larger items than not.

Sorry for being kinda direct, but I don't know how to put it in better words :)

Not a problem at all!

FYI, the item memory layout in the item.rs module is not visible in the documentation, is that intentional?

diondokter commented 1 month ago

Thanks for clarifying!

I think it makes sense to add this to the queue. For map, I think the returned number is meaningless. Adding a fn total_available_space() and a fn overhead_per_item() would allow people to quickly see the current status, but also to calculate it more precisely.

I would merge a PR like this. So if you want to hack on it, then feel free! Not sure when I have time to implement something like this, but I do have time to review.

diondokter commented 1 month ago

Hey @avsaase, I found some time! Would #57 work for you?

avsaase commented 1 month ago

This will help my out a lot! I was gonna make a PR myself but got distracted with other parts of my project. Thanks for taking the time to implement this.