Translator traits take up a lot of space

afilini commented 1 month ago

Related to #585, I have been investigating what caused the recent code size increase and it looks like it is due to a few different changes, one of them being the switch to use from_ast in the translator. For example through bisecting I got it down to this specific commit as one of the offenders: https://github.com/rust-bitcoin/rust-miniscript/commit/fa2e8b41735a2433984d98a0caf23beda57d7ab5

For context, when building my firmware commenting out the content of Descriptor::at_derivation_index() and replacing it with a todo!() reduces the overall size by 160KiB! Doing the same on an older version of miniscript (9.0.2) decreases the size by 60KiB.

at_derivation_index() mainly just uses the translator to map keys, and it happens to be the only instance of the translators in use in my code. That's why commenting it out causes LTO to prune all that code and save a lot of space.

I tried reverting that from_ast() change specifically but unfortunately it didn't make much difference. I'll keep digging but at least we now have a rough idea of where to look...

afilini commented 1 month ago

Changed my bisect scripts slightly, now it also points to: be07a9c9c5283d171ed86b677d601881c7032e89 (maybe the extra generic for the error in translate_pk?)

apoelstra commented 1 month ago

Great find! I'll investigate specifically this API.

As I mentioned to you offline, this trait was originally designed as a "Swiss army knife" trait which I hoped to use to do every possible transformation of a Miniscript, from counting keys to lifting to semantic policies to annotating keys. But the result of this is that basically every single call that uses this trait instantiates a whole new copy of a bunch of code.

I think it would make more sense to split this into multiple traits or even to just use a collection of closures for translation.

apoelstra commented 1 month ago

Looking more closely, some observations:

The Translator trait itself is pretty tight. It has only two generics: a source keytype and a target keytype.
Its use in Descriptor::at_derivation_index is also pretty tight: it defines a newtype around a u32 and implements the Translator trait on it with concrete error types.
The at_derivation_index is also not defined on a generic Descriptor but on a specific Descriptor<DescriptorPublicKey>.

So I'm pretty surprised to see that this particular method is blowing up. Could it be that rustc is making multiple copies of the Derivator newtype and its Translator impl? What if you just move that code outside of the at_derivation_index function so that at_derivation_index itself is a one-liner?

apoelstra commented 1 month ago

Alternately, could the real blowup be that by using Translator you are pulling in crate::Error and all its bloat, whereas you otherwise wouldn't be?

apoelstra commented 1 month ago

Ok, looking even more closely, I think the code blowup comes from the Miniscript::translate_pk function which does a giant match and pulls in all the type-checking code and all its error paths and everything, so even though it's only instantiated once inside at_derivation_index this one time is really big.

I think this would be resolved by using a context object during construction, which would let us have a slab allocator, which would let us have "direct" translation where we just replaced the keys without touching the hashes or any of the structural properties of the script.

apoelstra commented 1 month ago

https://github.com/rust-bitcoin/rust-miniscript/discussions/718

afilini commented 1 month ago

Thank you so much for taking a look!

Alternately, could the real blowup be that by using Translator you are pulling in crate::Error and all its bloat, whereas you otherwise wouldn't be?

Yes! In my "synthetic" test that I used for bisecting (see attached scripts below) it looks like that's where a lot of the increase comes from. In my real firmware unfortunately making that change doesn't have a huge impact.. I'll have to investigate further, it may be that since I'm doing other miniscript operations I already had all the Error code in my binary.

This is the setup I use to bisect:

check.sh

```bash #!/usr/bin/env sh sed -i '/unused_imports/d' ./src/lib.rs V=$(RUSTFLAGS="-C opt-level=z" cargo bloat --example profiler --filter miniscript --full-fn | grep filtered | awk '{ print $3 }' | tr -d 'B' | numfmt --from auto) git reset --hard echo "Total miniscript size = $V" if [ "$V" -gt "40000" ]; then exit 1 fi ```

examples/profiler.rs

```rust fn main() { let desc: &miniscript::Descriptor:: = unsafe { std::mem::transmute(&0) }; // use std::str::FromStr; // let desc: miniscript::Descriptor:: = FromStr::from_str("sh(wsh(or_d(\ // c:pk_k(020e0338c96a8870479f2396c373cc7696ba124e8635d41b0ea581112b67817261),\ // c:pk_k(0250863ad64a87ae8a2fe83c1af1a8403cb53f53e486d8511dad8a04887e5b2352)\ // )))").unwrap(); let d = desc.at_derivation_index(0); unsafe { let _ = std::ptr::read_volatile(&d as *const _); } } ```

afilini commented 1 month ago

Ok I was able to pin down the issue in my firmware, it's the fact that almost all the new translate_pk methods now go through the various constructors (for example Pkh::new()) which internally run all the sanity checks.

Constructing those structs directly in translate_pk shaves off 40KiB and brings it back in line with the older version.

apoelstra commented 1 month ago

Yep, makes sense. But the existing constructor mechanism is there to prevent translations from converting good scripts to bad ones, so we can't fix the codegen issue this way. We need to do somthing smarter.

rust-bitcoin / rust-miniscript

Translator traits take up a lot of space #717