HaoboGu closed this 4 months ago
Hmmm I've got to admit that I haven't watched binary size very carefully lately. Most of these seem pretty reasonable, especially when (parts of) the flash driver is likely inlined too.
What I think is strange is those multiple store_item_inner rows with a closure. The strangest part is that that function doesn't have any closures in it...
Could you run this again with RUSTFLAGS="-C inline-threshold=0" and lto = false?
This will stop the inlining and give a fairer view of the situation.
Yeah, the store_item_inner rows are still there.
My cargo profile config:
[profile.release]
codegen-units = 1 # better optimizations
debug = true # no overhead for bare-metal
opt-level = "z" # optimize for binary size
overflow-checks = false
lto = false
and rust flags in .cargo/config:
rustflags = [
"-C", "inline-threshold=0"
]
Thanks for checking! Strange how the numbers don't change much...
I'll try to look into it when I have time
I tried to unify all stored structs into one enum, which reduced flash usage to about 11KB. That is still quite large.
I've attached the full cargo bloat --release output below; I hope it's helpful.
$ cargo bloat --release -n 100
Finished release [optimized + debuginfo] target(s) in 0.05s
Analyzing target/thumbv7em-none-eabihf/release/rmk-stm32h7
File .text Size Crate Name
0.5% 17.7% 11.1KiB embassy_futures? <embassy_futures::select::Select4<A,B,C,D> as core::future::future::Future>::poll
0.4% 13.5% 8.4KiB embassy_executor embassy_executor::raw::TaskStorage<F>::poll
0.2% 7.5% 4.7KiB embassy_stm32 embassy_stm32::rcc::_version::init_pll
0.2% 6.2% 3.9KiB sequential_storage sequential_storage::map::store_item_inner::{{closure}}
0.1% 4.5% 2.8KiB rmk <rmk::keycode::KeyCode as num_enum::FromPrimitive>::from_primitive
0.1% 3.3% 2.1KiB sequential_storage sequential_storage::map::try_repair::{{closure}}
0.1% 2.7% 1.7KiB std core::fmt::Formatter::pad
0.1% 2.2% 1.4KiB embassy_stm32? <embassy_stm32::usb_otg::usb::Bus<T> as embassy_usb_driver::Bus>::poll::{{closure}}
0.0% 1.6% 1012B sequential_storage sequential_storage::map::fetch_item_with_location::{{closure}}
0.0% 1.3% 830B std compiler_builtins::mem::memmove
0.0% 1.2% 774B embassy_stm32 embassy_stm32::_generated::<impl core::ops::arith::Div<stm32_metapac::rcc::vals::Plldiv> for embassy_stm32::time::Hertz>::div
0.0% 1.1% 692B [Unknown] OTG_HS
0.0% 1.0% 620B embassy_usb embassy_usb::class::hid::build
0.0% 0.9% 592B embassy_executor embassy_executor::arch::thread::Executor::run
0.0% 0.9% 548B embassy_stm32? <embassy_stm32::usb_otg::usb::Endpoint<T,embassy_stm32::usb_otg::usb::In> as embassy_usb_driver::EndpointIn>::write::{{closure}}
0.0% 0.8% 540B std <core::fmt::builders::PadAdapter as core::fmt::Write>::write_str
0.0% 0.8% 520B rmk rmk::keyboard::Keyboard<In,Out,_,_,_,_>::process_action_keycode
0.0% 0.8% 508B sequential_storage sequential_storage::map::fetch_item::{{closure}}
0.0% 0.7% 460B sequential_storage sequential_storage::item::ItemIter::next::{{closure}}
0.0% 0.6% 414B std core::fmt::Formatter::pad_integral
0.0% 0.6% 384B embassy_usb embassy_usb::Inner<D>::handle_bus_event::{{closure}}
0.0% 0.6% 380B embassy_embedded_hal? <embassy_embedded_hal::adapter::blocking_async::BlockingAsync<T> as embedded_storage_async::nor_flash::NorFlash>::write::{{closure}}
0.0% 0.6% 370B std core::fmt::write
0.0% 0.6% 356B embassy_stm32 embassy_stm32::dma::dma::on_irq_inner
0.0% 0.5% 340B sequential_storage sequential_storage::item::ItemHeader::read_new::{{closure}}
0.0% 0.5% 336B sequential_storage sequential_storage::map::store_item::{{closure}}
0.0% 0.5% 328B std compiler_builtins::mem::memcpy
0.0% 0.5% 328B rmk? <rmk::storage::StorageData<_,_,_> as sequential_storage::map::StorageItem>::deserialize_from
0.0% 0.5% 328B rmk <rmk::hid::UsbHidWriter<D,_> as rmk::hid::HidWriterWrapper>::write::{{closure}}
0.0% 0.5% 324B sequential_storage sequential_storage::item::Item::write_raw::{{closure}}
0.0% 0.5% 300B embassy_stm32? <embassy_stm32::usb_otg::usb::Endpoint<T,embassy_stm32::usb_otg::usb::Out> as embassy_usb_driver::EndpointOut>::read::{{closure}}
0.0% 0.5% 296B embassy_usb <embassy_usb::class::hid::Control as embassy_usb::Handler>::control_in
0.0% 0.5% 296B sequential_storage sequential_storage::item::ItemHeader::read_item::{{closure}}
0.0% 0.4% 270B rmk rmk::via::keycode_convert::from_via_keycode
0.0% 0.4% 256B sequential_storage sequential_storage::partial_close_page::{{closure}}
0.0% 0.4% 252B sequential_storage sequential_storage::get_page_state::{{closure}}
0.0% 0.4% 252B sequential_storage sequential_storage::get_page_state::{{closure}}
0.0% 0.4% 252B embassy_embedded_hal? <embassy_embedded_hal::adapter::blocking_async::BlockingAsync<T> as embedded_storage_async::nor_flash::NorFlash>::erase::{{closure}}
0.0% 0.4% 244B std core::fmt::num::imp::<impl core::fmt::Display for u32>::fmt
0.0% 0.4% 232B panic_probe <&T as core::fmt::Display>::fmt
0.0% 0.4% 232B embassy_stm32? <embassy_stm32::usb_otg::usb::Bus<T> as embassy_usb_driver::Bus>::endpoint_set_enabled
0.0% 0.4% 232B std <&T as core::fmt::Debug>::fmt
0.0% 0.4% 232B embassy_stm32 embassy_stm32::usb_otg::usb::Driver<T>::alloc_endpoint
0.0% 0.3% 224B rmk rmk::keyboard::Keyboard<In,Out,_,_,_,_>::serialize_and_send_composite_report::{{closure}}
0.0% 0.3% 220B std <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::try_fold::flatten::{{closure}}
0.0% 0.3% 218B std core::fmt::Formatter::debug_tuple_field1_finish
0.0% 0.3% 212B sequential_storage sequential_storage::find_first_page::{{closure}}
0.0% 0.3% 212B sequential_storage sequential_storage::find_first_page::{{closure}}
0.0% 0.3% 208B embassy_stm32 _embassy_time_set_alarm
0.0% 0.3% 200B rmk rmk::via::keycode_convert::to_via_keycode
0.0% 0.3% 200B std core::fmt::num::<impl core::fmt::UpperHex for i32>::fmt
0.0% 0.3% 192B embassy_usb embassy_usb::descriptor::BosWriter::capability
0.0% 0.3% 182B std core::panicking::assert_failed_inner
0.0% 0.3% 180B embassy_stm32 embassy_stm32::dma::bdma::on_irq_inner
0.0% 0.3% 172B embassy_stm32 embassy_stm32::usb_otg::usb::Driver<T>::alloc_endpoint
0.0% 0.3% 166B std compiler_builtins::arm::__aeabi_memcpy4
0.0% 0.3% 164B embassy_stm32 embassy_stm32::flash::common::get_sector
0.0% 0.3% 162B embassy_usb <embassy_usb::class::hid::Control as embassy_usb::Handler>::control_out
0.0% 0.2% 160B defmt_rtt _defmt_write
0.0% 0.2% 156B std compiler_builtins::mem::memset
0.0% 0.2% 156B sequential_storage sequential_storage::item::ItemHeader::write::{{closure}}
0.0% 0.2% 154B rmk? <rmk::action::KeyAction as core::cmp::PartialEq>::eq
0.0% 0.2% 148B rmk rmk::keyboard::Keyboard<In,Out,_,_,_,_>::process_key_action_normal
0.0% 0.2% 140B embassy_usb embassy_usb::descriptor::DescriptorWriter::write
0.0% 0.2% 138B defmt core::fmt::Write::write_char
0.0% 0.2% 136B std compiler_builtins::arm::__aeabi_memset4
0.0% 0.2% 128B panic_probe rust_begin_unwind
0.0% 0.2% 124B embassy_stm32? <embassy_stm32::usb_otg::usb::ControlPipe<T> as embassy_usb_driver::ControlPipe>::reject::{{closure}}
0.0% 0.2% 116B embassy_stm32 embassy_stm32::time_driver::RtcDriver::next_period
0.0% 0.2% 114B embassy_sync embassy_sync::waitqueue::atomic_waker::AtomicWaker::register
0.0% 0.2% 112B embassy_stm32 TIM2
0.0% 0.2% 112B embassy_executor _embassy_time_schedule_wake
0.0% 0.2% 112B sequential_storage sequential_storage::open_page::{{closure}}
0.0% 0.2% 108B embassy_usb embassy_usb::descriptor::DescriptorWriter::endpoint
0.0% 0.2% 104B embassy_usb <embassy_usb::descriptor_reader::DescriptorIter as core::iter::traits::iterator::Iterator>::next
0.0% 0.2% 104B embassy_stm32 embassy_stm32::flash::family::blocking_wait_ready
0.0% 0.2% 104B rmk? <rmk::keycode::ModifierCombination as core::cmp::PartialEq>::eq
0.0% 0.2% 102B sequential_storage sequential_storage::map::StorageItem::deserialize_key_only
0.0% 0.2% 100B sequential_storage sequential_storage::item::adapted_crc32
0.0% 0.1% 94B embassy_executor embassy_executor::raw::waker::wake
0.0% 0.1% 92B defmt_rtt defmt_rtt::channel::Channel::write_impl
0.0% 0.1% 92B std <core::iter::adapters::flatten::FlattenCompat<I,U> as core::iter::traits::iterator::Iterator>::try_fold::flatten::{{closure}}
0.0% 0.1% 92B embassy_stm32? <embassy_stm32::usb_otg::usb::ControlPipe<T> as embassy_usb_driver::ControlPipe>::accept::{{closure}}
0.0% 0.1% 88B embassy_stm32 embassy_stm32::_generated::<impl core::ops::arith::Div<stm32_metapac::rcc::vals::Hpre> for embassy_stm32::time::Hertz>::div
0.0% 0.1% 88B defmt_rtt _defmt_acquire
0.0% 0.1% 88B std core::ops::function::FnMut::call_mut
0.0% 0.1% 88B defmt defmt::export::fmt
0.0% 0.1% 84B static_cell static_cell::StaticCell<T>::init
0.0% 0.1% 80B embassy_stm32 embassy_stm32::rcc::_version::apb_div_tim
0.0% 0.1% 80B embassy_stm32 embassy_stm32::usb_otg::usb::ep0_mpsiz
0.0% 0.1% 80B sequential_storage sequential_storage::item::MaybeItem::unwrap
0.0% 0.1% 80B static_cell static_cell::StaticCell<T>::init
0.0% 0.1% 80B static_cell static_cell::StaticCell<T>::init
0.0% 0.1% 76B embassy_usb embassy_usb::msos::MsOsDescriptorWriter::end_subset
0.0% 0.1% 76B defmt_rtt _defmt_release
0.0% 0.1% 76B std core::result::unwrap_failed
0.0% 0.1% 76B rmk rmk::matrix::Matrix<In,Out,_,_>::get_key_state
0.0% 0.1% 76B embassy_embedded_hal? <embassy_embedded_hal::adapter::blocking_async::BlockingAsync<T> as embedded_storage_async::nor_flash::ReadNorFlash>::read::{{closure}}
0.0% 0.1% 74B std <core::fmt::builders::PadAdapter as core::fmt::Write>::write_char
0.0% 0.1% 72B embassy_usb embassy_usb::Inner<D>::handle_control_in_delegated
0.2% 8.5% 5.3KiB And 259 smaller methods. Use -n N to show more.
2.7% 100.0% 62.5KiB .text section size, the file size is 2.2MiB
Crate statistics:
$ cargo bloat --release --crates
Finished release [optimized + debuginfo] target(s) in 0.09s
Analyzing target/thumbv7em-none-eabihf/release/rmk-stm32h7
File .text Size Crate
0.5% 17.8% 11.1KiB sequential_storage
0.5% 17.7% 11.1KiB embassy_futures
0.5% 17.2% 10.7KiB embassy_stm32
0.4% 14.8% 9.3KiB embassy_executor
0.4% 13.1% 8.2KiB std
0.2% 8.6% 5.3KiB rmk
0.1% 3.8% 2.4KiB embassy_usb
0.1% 2.5% 1.6KiB [Unknown]
0.0% 1.1% 708B embassy_embedded_hal
0.0% 0.8% 490B defmt_rtt
0.0% 0.6% 408B defmt
0.0% 0.6% 384B panic_probe
0.0% 0.4% 244B static_cell
0.0% 0.3% 174B embassy_sync
0.0% 0.3% 164B byteorder
0.0% 0.1% 84B embassy_time
0.0% 0.1% 68B cortex_m_rt
0.0% 0.1% 62B cortex_m
0.0% 0.1% 34B rand_core
0.0% 0.0% 20B rmk_stm32h7
0.0% 0.1% 54B And 4 more crates. Use -n N to show more.
2.7% 100.0% 62.5KiB .text section size, the file size is 2.2MiB
I've made my own repo for checking out the size here: https://github.com/diondokter/s-s-size
Ok, so after some investigation one thing that causes extra flash usage is generics and how I did the caching.
For ease of use I also implemented the relevant cache traits for mutable references: impl<T: Cache> Cache for &mut T. This is nice from a usage point of view, but it makes it very easy to end up with multiple cache impls in the system, which all monomorphise to their own functions and futures.
This means that it's possible to see the same function three times with:
NoCache
&mut NoCache
&mut &mut NoCache
If I force everyone to use &mut impl Cache, then that saves ~2-3kb or 25% of the total binary size of my testing repo.
This enforcement is a breaking change though...
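To illustrate the effect, here is a minimal sketch rather than the crate's actual code. The Cache trait, its touch method, and the store_by_value / store_by_ref functions are hypothetical stand-ins. A function that is generic over the cache by value gets a separate instantiation for each level of &mut wrapping, while a function that always takes &mut C sees the same C no matter how the caller holds it.

```rust
use std::any::type_name;

// Hypothetical stand-in for the crate's cache trait (assumption).
trait Cache {
    fn touch(&mut self) -> u32;
}

struct NoCache;

impl Cache for NoCache {
    fn touch(&mut self) -> u32 {
        0
    }
}

// The convenience blanket impl: every extra `&mut` is a new type.
impl<T: Cache> Cache for &mut T {
    fn touch(&mut self) -> u32 {
        (**self).touch()
    }
}

// Generic over the cache by value: one compiled copy per distinct `C`.
fn store_by_value<C: Cache>(mut cache: C) -> &'static str {
    cache.touch();
    type_name::<C>()
}

// Always borrows the cache: `C` stays the same for every caller below.
fn store_by_ref<C: Cache>(cache: &mut C) -> &'static str {
    cache.touch();
    type_name::<C>()
}

// Returns the type each `store_by_value` call was instantiated with.
fn instantiation_names() -> [&'static str; 3] {
    let by_value = store_by_value(NoCache);

    let mut c1 = NoCache;
    let by_ref = store_by_value(&mut c1);

    let mut c2 = NoCache;
    let mut r = &mut c2;
    let by_ref_ref = store_by_value(&mut r);

    [by_value, by_ref, by_ref_ref]
}
```

The three names returned by instantiation_names differ, which is the same situation as the three copies of one function listed above. In an async API each instantiation also brings its own future type, so the cost is larger than in this synchronous sketch.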
If you use multiple types as the Item type, then the same thing happens. Maybe that can be fixed if I change the trait for it, but that's also a breaking change.
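One common way to limit this kind of duplication is to keep the generic part thin and push the bulk of the work into a non-generic function. This is only a sketch of that general technique and not necessarily what the crate ended up doing. The Item trait, the Brightness and Layer types, and the Vec<u8> standing in for flash are all made up for illustration.

```rust
// Hypothetical item trait; not sequential-storage's real `StorageItem`.
trait Item {
    // Writes the item into `buf` and returns the number of bytes used.
    fn serialize_into(&self, buf: &mut [u8]) -> usize;
}

struct Brightness(u8);
struct Layer(u16);

impl Item for Brightness {
    fn serialize_into(&self, buf: &mut [u8]) -> usize {
        buf[0] = self.0;
        1
    }
}

impl Item for Layer {
    fn serialize_into(&self, buf: &mut [u8]) -> usize {
        buf[..2].copy_from_slice(&self.0.to_le_bytes());
        2
    }
}

// Non-generic: compiled once, however many item types exist.
// A `Vec<u8>` stands in for the flash here (assumption).
fn store_bytes(flash: &mut Vec<u8>, bytes: &[u8]) {
    flash.push(bytes.len() as u8); // toy length header
    flash.extend_from_slice(bytes);
}

// Generic shim: only this small function is duplicated per item type.
fn store_item<I: Item>(flash: &mut Vec<u8>, item: &I) {
    let mut buf = [0u8; 8];
    let len = item.serialize_into(&mut buf);
    store_bytes(flash, &buf[..len]);
}
```

With this split, adding a fifth or sixth item type only adds another small store_item shim, while the expensive storage logic in store_bytes is shared.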
I've tested the new branch, the flash size taken by sequential-storage was reduced from 11.4kb to 8.8kb, 23.4%. That's a great improvement. 👍 Is it possible to optimize flash using a little more?
Using a little more what?
Is it possible to optimize flash using a little more?
-> Is it possible to optimize flash usage a little more?
ah sorry for the typo
Ah right.
Well... maybe? I'm looking into it. The multiple cache impls were the most obvious problem. I can try to make an optimization so that using multiple item types doesn't cost a lot.
Anything beyond that will be difficult. Debugging futures is hard. It's difficult to see the state machines that the compiler produces, and the generated assembly is even harder to read than normal non-async assembly.
Yeah, I agree that debugging async rust is quite hard. And I'm looking forward to the further improvement, thanks for your great work ❤️
@HaoboGu I think I've got all the size optimizations in the PR that I want to make. To do more I had to change the map API too, but the new API is probably closer to what people expect when they use the crate, so that's a good thing.
Could you try it out?
Sure, will try soon!
Just tried it out. The API is clearer to me than before, and the binary size of s-s was reduced by ~3KB, from ~11KB to ~8KB in my project. That's great!
Thank you @diondokter very much for your work, I really appreciate it.
Thanks for the feedback!
There's one other issue I want to work on before I release the next version. But in the meantime you can use the master branch.
Hi there, I'm using sequential-storage for more purposes in my project, and I found that sequential-storage uses a lot of flash space (about 15KB). I implemented StorageItem for only 4 structs. Is there any way to reduce flash usage? Thanks!