microsoft / windows-rs

Rust for Windows
https://kennykerr.ca/rust-getting-started/
Apache License 2.0
10.1k stars 473 forks source link

Allow `Error` and `Result<()>` to be the same size as `HRESULT` #3126

Closed sivadeilra closed 1 week ago

sivadeilra commented 1 week ago

Many codebases use HRESULT (whether they use COM or not) and yet never use IErrorInfo. For those codebases, the size of the Result<()> object can be a substantial cost. Currently on 64-bit platforms, size_of<Result<()>> is 16 bytes, when ideally it would be basically a smartly-typed version of HRESULT. For example, Direct3D, DirectWrite, etc. never use IErrorInfo (they don't set any error info objects on failures), so all of the code that uses these codebases will suffer in performance.

This PR allows you to statically disable the error info within Error, by enabling the slim-errors feature in the windows-result crate. (The windows and windows-core crates will also propagate the feature to windows-result, so that you don't need to add an extra crate dependency.)

The performance costs come from the larger code and larger stack area needed. There is also a significant penalty because IErrorInfo requires that the Error object have a Drop implementation. This requires the compiler to precisely track the lifetime of Error values and even of the Result<()> value. This adds a lot of extra code that almost never runs, to many function bodies.

This PR also adds a new NonZeroHRESULT, which is equivalent to HRESULT except that it contains a NonZeroI32, not an i32. This means that it cannot represent S_OK, which is fine. This gives the Rust compiler a "niche", which is a byte pattern that cannot be represented by one variant of an enum, which frees up that byte pattern for a different variant or for enum's "tag". In our case, the compiler will generate type layouts for Option<NonZeroHRESULT> and Result<(), NonZeroHRESULT> that are the same size and alignment as the underlying HRESULT. This is great because it allows them to be stored in a single register, and in many situations the compiler can totally avoid type layout conversions and can produce much better code.

The Error type now uses NonZeroHRESULT. When the slim-errors feature is enabled, size_of::<Error>() == 4 and size_of::<Result<()>>() == 4. Before this change, size_of::<Error>() == 16 and size_of::<Result<()>>() == 24.

This PR improves these types even when not using slim-errors. After this PR, with slim-errors disabled, size_of::<Result<()>>() == 16, 8 bytes less, because of the niche optimization relating to NonZeroHRESULT.

riverar commented 1 week ago

I'm not very familiar in this area. Is that a guaranteed and/or documented compiler niche optimization? Would be awkward if this changed and resulted in net zero or negative savings.

sivadeilra commented 1 week ago

I'm not very familiar in this area. Is that a guaranteed and/or documented compiler niche optimization? Would be awkward if this changed and resulted in net zero or negative savings.

It's a guaranteed optimization. It's the reason that the NonZeroI32 type (and similar) were added to the language.

There are static assertions in the PR that verify this. If anything changed, we would see a build break. But these optimizations are guaranteed, so these assertions are more about documentation than anything else.

riverar commented 1 week ago

Is there any benchmark data that demonstrates this is more than just a theoretical optimization? Not trying to be a blocker, just trying to prevent unnecessary downstream churn.

sivadeilra commented 1 week ago

Is there any benchmark data that demonstrates this is more than just a theoretical optimization? Not trying to be a blocker, just trying to prevent unnecessary downstream churn.

Yes. This change alone removes 1.2% of the entire binary size for DWriteCore. The code is smaller, the exception handling tables are smaller. Text layout benchmarks show meaningful increases in speed as well, which makes sense because these components make extensive use of COM, including purely inside DWriteCore, not just between DWriteCore and external apps.

Keep in mind that DWriteCore is not yet pure Rust; it's about 65% Rust. So that number will go even higher as I convert more of DWriteCore to Rust.

Or, to put it a different way, I see this much regression when I move from com-rs to windows-rs. 1.2% increase in binary size, for a feature that provides absolutely no benefit to DWriteCore or Office, is not something that the Office team is willing to accept. They have pretty strict size gates, especially because they ship on mobile platforms (where we also ship DWriteCore).

kennykerr commented 1 week ago

The size optimization looks neat. As someone who spent a great deal of effort optimizing code gen for call sites in C++, I can see how this would make a meaningful improvement. Mainly I'm concerned about the following:

riverar commented 1 week ago

Something seems wrong with the Error::code natvis, is this expected?

let error_from_win32 = windows_result::Error::from_win32();
0:000> dx -r1 error_from_win32
error_from_win32                 [Type: windows_result::error::Error]
    [+0x008] code             : 1398755147 [Type: core::num::nonzero::NonZero<i32>]
    [+0x000] info             [Type: windows_result::error::error_info::ErrorInfo]
sivadeilra commented 1 week ago

Something seems wrong with the Error::code, is this expected?

Hmmm. I may have to re-introduce the NonZeroHRESULT type, so that NatVis has something to activate it...

It's strange, though, because the NatVis tests pass locally.

riverar commented 1 week ago

I wouldn't spend too much time on it. I've spent a heavy amount of time with natvis and due to the split in expression engines (Visual Studio / windbg), it's very difficult to get right. Adding VS expression engine tests to start chipping away at this problem is on my TODO.