tweedegolf / sequential-storage

A crate for storing data in flash memory with minimal need for erasing pages
Apache License 2.0

Flash usage of sequential storage #34

Closed HaoboGu closed 4 months ago

HaoboGu commented 4 months ago

Hi there, I'm using sequential-storage for more things in my project, and I found that it takes up a lot of flash space (about 15KB):

image

I implemented StorageItem for only 4 structs. Is there any way to reduce the flash usage? Thanks!

diondokter commented 4 months ago

Hmmm I've got to admit that I haven't watched binary size very carefully lately. Most of these seem pretty reasonable, especially when (parts of) the flash driver is likely inlined too.

What I think is strange is those multiple store_item_inner rows with a closure. The strangest part is that that function doesn't have any closures in it...

Could you run this again with RUSTFLAGS="-C inline-threshold=0" and lto = false? This will stop the inlining and give a fairer view of the situation.

HaoboGu commented 4 months ago

yeah.

store_item_inners are still there.

image

my cargo profile config:

[profile.release]
codegen-units = 1       # better optimizations
debug = true            # no overhead for bare-metal
opt-level = "z"         # optimize for binary size
overflow-checks = false
lto = false

and rust flags in .cargo/config:

rustflags = [
  "-C", "inline-threshold=0"
]

diondokter commented 4 months ago

Thanks for checking! Strange how the numbers don't change much...

I'll try to look into it when I have time

HaoboGu commented 4 months ago

I tried unifying all stored structs into one enum, which reduced flash usage to about 11KB. That is still quite large.
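As a rough sketch of what "one enum" can look like: the `Storable` trait, the two structs, and the tag-byte encoding below are all hypothetical stand-ins, not the crate's real `StorageItem` API or the actual rmk types. The point is only that a generic function such as `round_trip` is then instantiated for a single item type instead of once per struct.

```rust
/// Hypothetical stand-in for a storage trait; NOT the crate's real API.
trait Storable: Sized {
    fn serialize_into(&self, buf: &mut [u8]) -> Option<usize>;
    fn deserialize_from(buf: &[u8]) -> Option<Self>;
}

// Two made-up payload types that would otherwise each implement the trait.
#[derive(Debug, PartialEq, Clone, Copy)]
struct LayoutConfig {
    default_layer: u8,
}

#[derive(Debug, PartialEq, Clone, Copy)]
struct KeymapKey {
    row: u8,
    col: u8,
    code: u16,
}

/// One enum wrapping every stored type, so only one item type exists.
#[derive(Debug, PartialEq, Clone, Copy)]
enum StorageData {
    Layout(LayoutConfig),
    Key(KeymapKey),
}

impl Storable for StorageData {
    fn serialize_into(&self, buf: &mut [u8]) -> Option<usize> {
        match self {
            StorageData::Layout(l) => {
                let out = buf.get_mut(..2)?; // None if the buffer is too small
                out[0] = 0; // tag byte for the Layout variant
                out[1] = l.default_layer;
                Some(2)
            }
            StorageData::Key(k) => {
                let out = buf.get_mut(..5)?;
                out[0] = 1; // tag byte for the Key variant
                out[1] = k.row;
                out[2] = k.col;
                out[3..5].copy_from_slice(&k.code.to_le_bytes());
                Some(5)
            }
        }
    }

    fn deserialize_from(buf: &[u8]) -> Option<Self> {
        match *buf.first()? {
            0 => Some(StorageData::Layout(LayoutConfig {
                default_layer: *buf.get(1)?,
            })),
            1 => {
                let b = buf.get(1..5)?;
                Some(StorageData::Key(KeymapKey {
                    row: b[0],
                    col: b[1],
                    code: u16::from_le_bytes([b[2], b[3]]),
                }))
            }
            _ => None, // unknown tag
        }
    }
}

/// Generic over the item type: each distinct `T` gets its own compiled copy.
/// With the enum above there is only one `T` in the whole program.
fn round_trip<T: Storable>(item: &T, buf: &mut [u8]) -> Option<T> {
    let len = item.serialize_into(buf)?;
    T::deserialize_from(&buf[..len])
}
```

The trade-off is a tag byte per stored item and one larger `match`, in exchange for a single copy of the generic storage code.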

I've attached the full cargo bloat --release output below; I hope it's helpful.

$ cargo bloat --release -n 100
    Finished release [optimized + debuginfo] target(s) in 0.05s
    Analyzing target/thumbv7em-none-eabihf/release/rmk-stm32h7

File  .text    Size                 Crate Name
0.5%  17.7% 11.1KiB      embassy_futures? <embassy_futures::select::Select4<A,B,C,D> as core::future::future::Future>::poll
0.4%  13.5%  8.4KiB      embassy_executor embassy_executor::raw::TaskStorage<F>::poll
0.2%   7.5%  4.7KiB         embassy_stm32 embassy_stm32::rcc::_version::init_pll
0.2%   6.2%  3.9KiB    sequential_storage sequential_storage::map::store_item_inner::{{closure}}
0.1%   4.5%  2.8KiB                   rmk <rmk::keycode::KeyCode as num_enum::FromPrimitive>::from_primitive
0.1%   3.3%  2.1KiB    sequential_storage sequential_storage::map::try_repair::{{closure}}
0.1%   2.7%  1.7KiB                   std core::fmt::Formatter::pad
0.1%   2.2%  1.4KiB        embassy_stm32? <embassy_stm32::usb_otg::usb::Bus<T> as embassy_usb_driver::Bus>::poll::{{closure}}
0.0%   1.6%   1012B    sequential_storage sequential_storage::map::fetch_item_with_location::{{closure}}
0.0%   1.3%    830B                   std compiler_builtins::mem::memmove
0.0%   1.2%    774B         embassy_stm32 embassy_stm32::_generated::<impl core::ops::arith::Div<stm32_metapac::rcc::vals::Plldiv> for embassy_stm32::time::Hertz>::div
0.0%   1.1%    692B             [Unknown] OTG_HS
0.0%   1.0%    620B           embassy_usb embassy_usb::class::hid::build
0.0%   0.9%    592B      embassy_executor embassy_executor::arch::thread::Executor::run
0.0%   0.9%    548B        embassy_stm32? <embassy_stm32::usb_otg::usb::Endpoint<T,embassy_stm32::usb_otg::usb::In> as embassy_usb_driver::EndpointIn>::write::{{closure}}
0.0%   0.8%    540B                   std <core::fmt::builders::PadAdapter as core::fmt::Write>::write_str
0.0%   0.8%    520B                   rmk rmk::keyboard::Keyboard<In,Out,_,_,_,_>::process_action_keycode
0.0%   0.8%    508B    sequential_storage sequential_storage::map::fetch_item::{{closure}}
0.0%   0.7%    460B    sequential_storage sequential_storage::item::ItemIter::next::{{closure}}
0.0%   0.6%    414B                   std core::fmt::Formatter::pad_integral
0.0%   0.6%    384B           embassy_usb embassy_usb::Inner<D>::handle_bus_event::{{closure}}
0.0%   0.6%    380B embassy_embedded_hal? <embassy_embedded_hal::adapter::blocking_async::BlockingAsync<T> as embedded_storage_async::nor_flash::NorFlash>::write::{{closure}}
0.0%   0.6%    370B                   std core::fmt::write
0.0%   0.6%    356B         embassy_stm32 embassy_stm32::dma::dma::on_irq_inner
0.0%   0.5%    340B    sequential_storage sequential_storage::item::ItemHeader::read_new::{{closure}}
0.0%   0.5%    336B    sequential_storage sequential_storage::map::store_item::{{closure}}
0.0%   0.5%    328B                   std compiler_builtins::mem::memcpy
0.0%   0.5%    328B                  rmk? <rmk::storage::StorageData<_,_,_> as sequential_storage::map::StorageItem>::deserialize_from
0.0%   0.5%    328B                   rmk <rmk::hid::UsbHidWriter<D,_> as rmk::hid::HidWriterWrapper>::write::{{closure}}
0.0%   0.5%    324B    sequential_storage sequential_storage::item::Item::write_raw::{{closure}}
0.0%   0.5%    300B        embassy_stm32? <embassy_stm32::usb_otg::usb::Endpoint<T,embassy_stm32::usb_otg::usb::Out> as embassy_usb_driver::EndpointOut>::read::{{closure}}
0.0%   0.5%    296B           embassy_usb <embassy_usb::class::hid::Control as embassy_usb::Handler>::control_in
0.0%   0.5%    296B    sequential_storage sequential_storage::item::ItemHeader::read_item::{{closure}}
0.0%   0.4%    270B                   rmk rmk::via::keycode_convert::from_via_keycode
0.0%   0.4%    256B    sequential_storage sequential_storage::partial_close_page::{{closure}}
0.0%   0.4%    252B    sequential_storage sequential_storage::get_page_state::{{closure}}
0.0%   0.4%    252B    sequential_storage sequential_storage::get_page_state::{{closure}}
0.0%   0.4%    252B embassy_embedded_hal? <embassy_embedded_hal::adapter::blocking_async::BlockingAsync<T> as embedded_storage_async::nor_flash::NorFlash>::erase::{{closure}}
0.0%   0.4%    244B                   std core::fmt::num::imp::<impl core::fmt::Display for u32>::fmt
0.0%   0.4%    232B           panic_probe <&T as core::fmt::Display>::fmt
0.0%   0.4%    232B        embassy_stm32? <embassy_stm32::usb_otg::usb::Bus<T> as embassy_usb_driver::Bus>::endpoint_set_enabled
0.0%   0.4%    232B                   std <&T as core::fmt::Debug>::fmt
0.0%   0.4%    232B         embassy_stm32 embassy_stm32::usb_otg::usb::Driver<T>::alloc_endpoint
0.0%   0.3%    224B                   rmk rmk::keyboard::Keyboard<In,Out,_,_,_,_>::serialize_and_send_composite_report::{{closure}}
0.0%   0.3%    220B                   std <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::try_fold::flatten::{{closure}}
0.0%   0.3%    218B                   std core::fmt::Formatter::debug_tuple_field1_finish
0.0%   0.3%    212B    sequential_storage sequential_storage::find_first_page::{{closure}}
0.0%   0.3%    212B    sequential_storage sequential_storage::find_first_page::{{closure}}
0.0%   0.3%    208B         embassy_stm32 _embassy_time_set_alarm
0.0%   0.3%    200B                   rmk rmk::via::keycode_convert::to_via_keycode
0.0%   0.3%    200B                   std core::fmt::num::<impl core::fmt::UpperHex for i32>::fmt
0.0%   0.3%    192B           embassy_usb embassy_usb::descriptor::BosWriter::capability
0.0%   0.3%    182B                   std core::panicking::assert_failed_inner
0.0%   0.3%    180B         embassy_stm32 embassy_stm32::dma::bdma::on_irq_inner
0.0%   0.3%    172B         embassy_stm32 embassy_stm32::usb_otg::usb::Driver<T>::alloc_endpoint
0.0%   0.3%    166B                   std compiler_builtins::arm::__aeabi_memcpy4
0.0%   0.3%    164B         embassy_stm32 embassy_stm32::flash::common::get_sector
0.0%   0.3%    162B           embassy_usb <embassy_usb::class::hid::Control as embassy_usb::Handler>::control_out
0.0%   0.2%    160B             defmt_rtt _defmt_write
0.0%   0.2%    156B                   std compiler_builtins::mem::memset
0.0%   0.2%    156B    sequential_storage sequential_storage::item::ItemHeader::write::{{closure}}
0.0%   0.2%    154B                  rmk? <rmk::action::KeyAction as core::cmp::PartialEq>::eq
0.0%   0.2%    148B                   rmk rmk::keyboard::Keyboard<In,Out,_,_,_,_>::process_key_action_normal
0.0%   0.2%    140B           embassy_usb embassy_usb::descriptor::DescriptorWriter::write
0.0%   0.2%    138B                 defmt core::fmt::Write::write_char
0.0%   0.2%    136B                   std compiler_builtins::arm::__aeabi_memset4
0.0%   0.2%    128B           panic_probe rust_begin_unwind
0.0%   0.2%    124B        embassy_stm32? <embassy_stm32::usb_otg::usb::ControlPipe<T> as embassy_usb_driver::ControlPipe>::reject::{{closure}}
0.0%   0.2%    116B         embassy_stm32 embassy_stm32::time_driver::RtcDriver::next_period
0.0%   0.2%    114B          embassy_sync embassy_sync::waitqueue::atomic_waker::AtomicWaker::register
0.0%   0.2%    112B         embassy_stm32 TIM2
0.0%   0.2%    112B      embassy_executor _embassy_time_schedule_wake
0.0%   0.2%    112B    sequential_storage sequential_storage::open_page::{{closure}}
0.0%   0.2%    108B           embassy_usb embassy_usb::descriptor::DescriptorWriter::endpoint
0.0%   0.2%    104B           embassy_usb <embassy_usb::descriptor_reader::DescriptorIter as core::iter::traits::iterator::Iterator>::next
0.0%   0.2%    104B         embassy_stm32 embassy_stm32::flash::family::blocking_wait_ready
0.0%   0.2%    104B                  rmk? <rmk::keycode::ModifierCombination as core::cmp::PartialEq>::eq
0.0%   0.2%    102B    sequential_storage sequential_storage::map::StorageItem::deserialize_key_only
0.0%   0.2%    100B    sequential_storage sequential_storage::item::adapted_crc32
0.0%   0.1%     94B      embassy_executor embassy_executor::raw::waker::wake
0.0%   0.1%     92B             defmt_rtt defmt_rtt::channel::Channel::write_impl
0.0%   0.1%     92B                   std <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::try_fold::flatten::{{closure}}
0.0%   0.1%     92B        embassy_stm32? <embassy_stm32::usb_otg::usb::ControlPipe<T> as embassy_usb_driver::ControlPipe>::accept::{{closure}}
0.0%   0.1%     88B         embassy_stm32 embassy_stm32::_generated::<impl core::ops::arith::Div<stm32_metapac::rcc::vals::Hpre> for embassy_stm32::time::Hertz>::div
0.0%   0.1%     88B             defmt_rtt _defmt_acquire
0.0%   0.1%     88B                   std core::ops::function::FnMut::call_mut
0.0%   0.1%     88B                 defmt defmt::export::fmt
0.0%   0.1%     84B           static_cell static_cell::StaticCell<T>::init
0.0%   0.1%     80B         embassy_stm32 embassy_stm32::rcc::_version::apb_div_tim
0.0%   0.1%     80B         embassy_stm32 embassy_stm32::usb_otg::usb::ep0_mpsiz
0.0%   0.1%     80B    sequential_storage sequential_storage::item::MaybeItem::unwrap
0.0%   0.1%     80B           static_cell static_cell::StaticCell<T>::init
0.0%   0.1%     80B           static_cell static_cell::StaticCell<T>::init
0.0%   0.1%     76B           embassy_usb embassy_usb::msos::MsOsDescriptorWriter::end_subset
0.0%   0.1%     76B             defmt_rtt _defmt_release
0.0%   0.1%     76B                   std core::result::unwrap_failed
0.0%   0.1%     76B                   rmk rmk::matrix::Matrix<In,Out,_,_>::get_key_state
0.0%   0.1%     76B embassy_embedded_hal? <embassy_embedded_hal::adapter::blocking_async::BlockingAsync<T> as embedded_storage_async::nor_flash::ReadNorFlash>::read::{{closure}}
0.0%   0.1%     74B                   std <core::fmt::builders::PadAdapter as core::fmt::Write>::write_char
0.0%   0.1%     72B           embassy_usb embassy_usb::Inner<D>::handle_control_in_delegated
0.2%   8.5%  5.3KiB                       And 259 smaller methods. Use -n N to show more.
2.7% 100.0% 62.5KiB                       .text section size, the file size is 2.2MiB

Crate statistics:

$ cargo bloat --release --crates
    Finished release [optimized + debuginfo] target(s) in 0.09s
    Analyzing target/thumbv7em-none-eabihf/release/rmk-stm32h7

File  .text    Size Crate
0.5%  17.8% 11.1KiB sequential_storage
0.5%  17.7% 11.1KiB embassy_futures
0.5%  17.2% 10.7KiB embassy_stm32
0.4%  14.8%  9.3KiB embassy_executor
0.4%  13.1%  8.2KiB std
0.2%   8.6%  5.3KiB rmk
0.1%   3.8%  2.4KiB embassy_usb
0.1%   2.5%  1.6KiB [Unknown]
0.0%   1.1%    708B embassy_embedded_hal
0.0%   0.8%    490B defmt_rtt
0.0%   0.6%    408B defmt
0.0%   0.6%    384B panic_probe
0.0%   0.4%    244B static_cell
0.0%   0.3%    174B embassy_sync
0.0%   0.3%    164B byteorder
0.0%   0.1%     84B embassy_time
0.0%   0.1%     68B cortex_m_rt
0.0%   0.1%     62B cortex_m
0.0%   0.1%     34B rand_core
0.0%   0.0%     20B rmk_stm32h7
0.0%   0.1%     54B And 4 more crates. Use -n N to show more.
2.7% 100.0% 62.5KiB .text section size, the file size is 2.2MiB

diondokter commented 4 months ago

I've made my own repo for checking out the size here: https://github.com/diondokter/s-s-size

diondokter commented 4 months ago

Ok, so after some investigation one thing that causes extra flash usage is generics and how I did the caching.

For ease of use I implemented the relevant cache traits for impl<T: Cache> Cache for &mut T as well. This is nice from a usability standpoint, but it makes it very easy to end up with multiple cache impls in the system, each of which monomorphises to its own set of functions and futures.

This means that it's possible to see the same function three times with:

If I force everyone to use &mut impl Cache, then that saves ~2-3kb or 25% of the total binary size of my testing repo. This enforcement is a breaking change though...
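A minimal sketch of the effect, assuming a made-up `Cache` trait and `NoCache` type (the crate's real cache traits look different). `type_name` is used only as a diagnostic to show which type each call was instantiated with; every distinct type means another compiled copy of the generic function.

```rust
use core::any::type_name;

/// Hypothetical cache trait, standing in for the crate's cache traits.
trait Cache {
    fn mark_dirty(&mut self);
    fn is_dirty(&self) -> bool;
}

/// Hypothetical concrete cache.
struct NoCache {
    dirty: bool,
}

impl Cache for NoCache {
    fn mark_dirty(&mut self) {
        self.dirty = true;
    }
    fn is_dirty(&self) -> bool {
        self.dirty
    }
}

/// The convenience blanket impl: `&mut T` is itself a `Cache`.
impl<T: Cache> Cache for &mut T {
    fn mark_dirty(&mut self) {
        (**self).mark_dirty();
    }
    fn is_dirty(&self) -> bool {
        (**self).is_dirty()
    }
}

/// Takes the cache by value, so `C` can be `NoCache`, `&mut NoCache`,
/// `&mut &mut NoCache`, ... and each one is a separate instantiation.
fn store_generic<C: Cache>(mut cache: C) -> &'static str {
    cache.mark_dirty();
    type_name::<C>()
}

/// Takes `&mut C` only, which nudges every caller toward the same `C`.
fn store_by_ref<C: Cache>(cache: &mut C) -> &'static str {
    cache.mark_dirty();
    type_name::<C>()
}
```

Calling `store_generic` with an owned `NoCache`, with `&mut cache`, and with a `&mut` to that reference reports three different type names, which is the kind of duplication described above. Calling `store_by_ref(&mut cache)` always resolves `C` to `NoCache`.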

If you use multiple types as the Item type, then the same thing happens. Maybe that can be fixed if I change the trait for it, but that's also a breaking change.
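One common way to keep per-item-type duplication small, shown here only as a sketch and not necessarily what the crate ends up doing: keep the generic part down to a thin serialization shim and move the real work into a non-generic function that takes bytes. The `Item` trait, the `Vec<u8>` standing in for flash, and the length-prefix format are all assumptions made for illustration.

```rust
/// Hypothetical item trait; NOT the crate's real API.
trait Item {
    fn serialize_into(&self, buf: &mut [u8]) -> Option<usize>;
}

impl Item for u8 {
    fn serialize_into(&self, buf: &mut [u8]) -> Option<usize> {
        *buf.first_mut()? = *self;
        Some(1)
    }
}

impl Item for u32 {
    fn serialize_into(&self, buf: &mut [u8]) -> Option<usize> {
        let out = buf.get_mut(..4)?;
        out.copy_from_slice(&self.to_le_bytes());
        Some(4)
    }
}

/// Non-generic: compiled exactly once, no matter how many item types exist.
/// A `Vec<u8>` stands in for the flash; each record is a length byte
/// followed by the payload.
fn store_bytes(storage: &mut Vec<u8>, payload: &[u8]) -> Option<()> {
    if payload.len() > u8::MAX as usize {
        return None; // payload does not fit the one-byte length prefix
    }
    storage.push(payload.len() as u8);
    storage.extend_from_slice(payload);
    Some(())
}

/// Generic shim: the only part that is duplicated per item type.
fn store_item<I: Item>(storage: &mut Vec<u8>, item: &I) -> Option<()> {
    let mut buf = [0u8; 16];
    let len = item.serialize_into(&mut buf)?;
    store_bytes(storage, &buf[..len])
}
```

With this split, adding another item type costs one more copy of `store_item` (a few instructions) rather than another copy of the whole storage routine.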

HaoboGu commented 4 months ago

I've tested the new branch, the flash size taken by sequential-storage was reduced from 11.4kb to 8.8kb, 23.4%. That's a great improvement. 👍 Is it possible to optimize flash using a little more?

diondokter commented 4 months ago

Using a little more what?

HaoboGu commented 4 months ago

Is it possible to optimize flash using a little more?

-> Is it possible to optimize flash usage a little more?

ah sorry for the typo

diondokter commented 4 months ago

Ah right.

Well... maybe? I'm looking into it. The multiple cache impls were the most obvious problem. I can try to make an optimization so that using multiple item types doesn't cost a lot.

Beyond that, it will be difficult. Debugging futures is hard. It's difficult to see the state machines that the compiler produces, and the generated assembly is even harder to read than normal non-async assembly.

HaoboGu commented 4 months ago

Yeah, I agree that debugging async rust is quite hard. And I'm looking forward to the further improvement, thanks for your great work ❤️

diondokter commented 4 months ago

@HaoboGu I think I've got all the size optimizations in the PR that I want to make. To do more I had to change the map API too, but the new API is probably closer to what people expect when they use the crate, so that's a good thing.

Could you try it out?

HaoboGu commented 4 months ago

Sure, will try soon!

HaoboGu commented 4 months ago

Just tried it out. The API is clearer to me than before, and the binary size of s-s was reduced by ~3KB, from ~11KB to ~8KB in my project. That's great!

Thank you @diondokter very much for your work, I really appreciate it.

diondokter commented 4 months ago

Thanks for the feedback!

There's one other issue I want to work on before I release the next version. In the meantime, you can use the master branch.