rustls / rustls

A modern TLS library in Rust
Other
5.87k stars 628 forks source link

Avoid code bloat due to unnecessary monomorphization #915

Open briansmith opened 2 years ago

briansmith commented 2 years ago

I noticed the following code:

impl<Data> ConnectionCommon<Data> {
}

The thing that strikes me about this code block is that it is currently 762 - 411 = 351 lines of code all parameterized on the concrete type of Data, so for each type Data that it is instantiated with, those lines of code will effectively be duplicated. Looking at the code, it seems like only small portions of the code actually require monomorphizing on Data. It seems like we could refactor the code to avoid the majority of the bloat. I don't have concrete measurements on how much this matters, but it seems like more and more PRs are adding more and more monomorphized code, so the trend seems to be to get worse in this respect. I think that would be a mistake.

Concretely, I suggest we try to avoid large amounts of code in type parameterized blocks like this, and instead prefer to optimize for using &dyn Trait to avoid the bloat.

For example, I would say this is generally suboptimal:

trait Foo {
     const SOMETHING: u8;
}

and instead we should do:

trait Foo {
     fn something(&self) -> u8;
}

The latter prevents &dyn Foo whereas the latter admits its use.

I think it would be good for us to get some consensus on this because otherwise it's going to be more difficult and more churn-inducing to undo the bloat later.

djc commented 2 years ago

Yes, I tend to write monomorphizing code by default, since object safety can be kind of pain (and often requires extra boxing) and routing simple constants through method calls also feels suboptimal. For most applications I use, code size is not really an issue, but it makes sense to keep this in mind.

(I think I tend to assume that an impl<Data> block would not duplicate the methods that don't touch any Data-related data, but that could easily be wrong.)

briansmith commented 2 years ago

(I think I tend to assume that an impl<Data> block would not duplicate the methods that don't touch any Data-related data, but that could easily be wrong.)

I think we should verify it one way or another. My belief is that the Rust compiler will monomorphize everything within the block parameterized by the types, regardless of whether it uses the types.

I think it would be good for us to use code patterns that ensure that the code size remains small, or at least that we're not wastefully bloating things. It would be good for us to have metrics for code bloat. Perhaps we should have a CI/CD job that builds a stripped optimized tlsclient/tlsserver and measures the size and allows us to see the object size over time. Though, for that, it would probably make more sense to have a combined "rustls" executable that can do the job of both tlsclient and tlsserver so we can measure when we're sharing and un-sharing code between the client and the server.