microsoft / win32metadata

Tooling to generate metadata for Win32 APIs in the Windows SDK.
Other
1.34k stars 116 forks source link

Cannot specify `cString` in `ScriptStringAnalyse` #1881

Closed wuweiran closed 5 months ago

wuweiran commented 7 months ago

Summary

windows::Win32::Globalization::ScriptStringAnalyse puts the third parameter of its corresponding ffi ScriptStringAnalyse as pidx.as_deref().map_or(0, |slice| slice.len().try_into().unwrap()). This prevents user from setting cString (length of the string to analyze) without specifying piDx which is optional. Please report this for me if it is caused by incorrect Win32 metadata.

Crate manifest

[dependencies.windows]
version = "0.54.0"

Crate code

ScriptStringAnalyse(
    udc,
    context.buffer.as_ptr() as _,
    (1.5 * length as f32 + 16f32) as i32,
    -1,
    SSA_LINK | SSA_FALLBACK | SSA_GLYPHS,
    -1,
    None,
    None,
    None, // cannot specify `cString` with this parameter being `None`
    None,
    null(),
    &mut context.ssa,
)?;
kennykerr commented 7 months ago

The NativeArrayInfo attribute binds the two parameters together. Perhaps the SAL annotation is incorrect, or perhaps its interpretation is incorrect.

I don't know anything about this function so I can't be sure, but I can transfer to the Win32 metadata repo for consideration.

tim-weis commented 7 months ago

This looks like a genuine code-gen bug, where the [Optional] attribute of a parameter is retroactively applied to a non-optional parameter referenced in SAL annotations.

This is the (abridged) C definition:

HRESULT WINAPI ScriptStringAnalyse(
    // ...
    int                               cString,    //In  Length in characters (Must be at least 1)
    // ...
    _In_reads_opt_(cString) const int *piDx,      //In  Requested logical dx array
    // ...
);

The corresponding metadata accurately captures the signature information:

HRESULT ScriptStringAnalyse(
    // ...
    [In] int cString,
    // ...
    [Optional][In][Const][NativeArrayInfo(CountParamIndex = 2)] int* piDx,
    // ...
);

The generator uses this data in preparation for the "ptr + length" -> "slice" transformation but fails to acknowledge that while the ptr is optional, the length is not.

The [NativeArrayInfo] attribute is initially evaluated here:

https://github.com/microsoft/windows-rs/blob/994dc7519fcb3ece2035f7fc4607db9ca6a8047c/crates/libs/bindgen/src/metadata.rs#L321-L328

While I understand that the current Rust signature for ScriptStringAnalyse() leaves the API partially unusable, I'm struggling to come up with a workable solution to address this issue. At the very least, the code generator needs to inhibit the "ptr + length" -> "slice" transformation if either parameter is [Optional].

This could be controlled from metadata with the [PreserveSig = true] attribute with effects on all downstream clients (including this repo). The alternative would be to deal with this in the code generator. I don't know how complex that would be, or even just which of the four [Optional] combinations are "transformation safe".

kennykerr commented 7 months ago

I think some of the confusion can be found here:

_Check_return_ HRESULT WINAPI ScriptStringAnalyse(
    HDC                                             hdc,        //In  Device context (required)
    const void                                      *pString,   //In  String in 8 or 16 bit characters
    int                                             cString,    //In  Length in characters (Must be at least 1)
    int                                             cGlyphs,    //In  Required glyph buffer size (default cString*1.5 + 16)
    int                                             iCharset,   //In  Charset if an ANSI string, -1 for a Unicode string
    DWORD                                           dwFlags,    //In  Analysis required
    int                                             iReqWidth,  //In  Required width for fit and/or clip
    _In_reads_opt_(1) SCRIPT_CONTROL               *psControl, //In  Analysis control (optional)
    _In_reads_opt_(1) SCRIPT_STATE                 *psState,   //In  Analysis initial state (optional)
    _In_reads_opt_(cString) const int              *piDx,      //In  Requested logical dx array
    _In_reads_opt_(1) SCRIPT_TABDEF                *pTabdef,   //In  Tab positions (optional)
    const BYTE                                      *pbInClass, //In  Legacy GetCharacterPlacement character classifications (deprecated)
    _Outptr_result_buffer_(1) SCRIPT_STRING_ANALYSIS    *pssa);     //Out Analysis of string

There appears to be a relationship between pString and cString in the comments, but SAL only indicates a relationship between cString and piDx, so that's the one that windows-bindgen (the Rust code generator) honors since it doesn't read comments...

There are other APIs where multiple parameters point to the same length parameter and in such cases windows-bindgen will reject the relationship and thus omit the transformation.

tim-weis commented 6 months ago

Thanks for the thorough analysis, Kenny. That makes sense.

The actual fix would then be to update the SAL annotations for ScriptStringAnalyse(), and windows-bindgen will do the right thing as the changes eventually trickle down.

Specifically, pString needs to be associated with cString. This is somewhat complicated as cString is measured in characters, but pString can be either a narrow or wide character string (depending on the value of iCharset). Plus, pString is typed as a void*, so _In_reads_(s) does nothing as there is no type information to scale s.

I played around with the annotations a bit longer and came up with this:

_In_reads_bytes_(((iCharset == -1) + 1) * cString) const void *pString
//               ^^^^^^^^^^^^^^^^^^^^^^ 2 if iCharset == -1 (Unicode)
//                                      1 otherwise (ANSI)

This reliably reports pre-condition violations during code analysis of a C program. While (presumably) correct, I don't know how well ClangSharp or the metadata artifacts can handle complex SAL annotations.

riverar commented 6 months ago

Metadata has no SAL support to speak of, so we'd be manually adding any attributes needed here.

mikebattista commented 5 months ago

Can you share specifically what attribute we should add to get the desired behavior? Or an example of another API that declares a similar relationship properly already that I should copy?

wuweiran commented 5 months ago

@mikebattista As user, I think just removing the relationship between piDx and cString should do. @riverar @kennykerr could you answer Mike's questions? Thanks.

riverar commented 5 months ago

piDx is tied to the number of logical characters (cString) passed in. The problem seems to lie, as @tim-weis originally pointed out, in the Rust code gen not accounting for this scenario (required length, optional input with NativeArrayInfo pointing back).

I think pString and its quirkiness is a distraction and not relevant to this issue.

Don't think there's any metadata work to be done here.

tim-weis commented 5 months ago

The problem seems to lie, as @tim-weis originally pointed out, in the Rust code gen not accounting for this scenario (required length, optional input with NativeArrayInfo pointing back).

@riverar I don't think this is correct. The core issue is ultimately down to the fact that the SDK provides partial annotations, failing to describe the relationship between pString and cString. The net result is that the metadata winds up with partial constraints that allow for a situation that cannot happen: A mandatory cString parameter whose sole purpose is to describe the size of the optional piDx parameter.

Had the metadata captured the full set of relations (specifically the fact that cString also describes the size of pString) the Rust code gen would already be prepared to handle this situation.

Don't think there's any metadata work to be done here.

I will object. As I understand the issue it can only be resolved from the metadata side of the fence. The metadata either must provide the full set of relationships between parameters, or none at all. Since the cString <-> pString relationship is too complex to encode, I recommend special-casing this API to drop the cString <-> piDx relationship (NativeArrayInfo) altogether.

kennykerr commented 5 months ago

Tim is right. All the relationships must be defined, or none of them. I prefer none in this case since multiple relationships don't work in practice. But the current metadata is broken.

riverar commented 5 months ago

I think it's important to remember that SAL annotations are completely optional. The API author has the discretion to decide which relationships to express and test.

I concur that the metadata ends up with a constraint permutation where a mandatory cString parameter is used to describe the size of an optional piDx parameter. However, I don’t agree that this is a problem with the metadata.

SAL annotations can be implemented to varying degrees of completion, and it feels unfair to hold metadata to a standard that requires SAL annotations to be all-encompassing. (Metadata can't even represent all SAL annotated relationships, so should always be assumed to be in a partial state.)

The cString parameter also describes the length of piDx, which is currently modeled in the metadata. This means there is a complete and valid relationship modeled: when piDx is provided, it must be of cString length.

In theory, an alternative language code generator could generate an API wrapper with asserts, instead of attempting to transform the signature, and make use of this metadata as-is with no problem.

That said, I’m more than happy to hold the dissenting opinion here and remove the metadata as requested.

kennykerr commented 5 months ago

I don't disagree that SAL is incomplete and optional. The trouble is that the metadata attempts to be more than SAL. If languages are to alter code generation, not merely check for SAL violations at call sites, then the metadata must be authoritative and complete to be useful.

tim-weis commented 5 months ago

SAL annotations can be implemented to varying degrees of completion

Correct. When used for what they were intended for: Static code analysis.

It feels unfair to hold metadata to a standard that requires SAL annotations to be all-encompassing.

They must only be all-encompassing when re-purposed to inform code generation. Which is what this repo's artifacts are. In this case, the annotations aren't sufficiently complete, and completing them wouldn't be representable (at this time).

The only reasonable choice thus is to drop them all.

riverar commented 5 months ago

[...] the annotations aren't sufficiently complete [...]

For Rust, and only Rust. 😛 (Not accurate, C# is also affected.)

I'll get a PR out in a moment to unblock Rust.