llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[WebAssembly+other platforms?] In need of a Clang extension for compile-time validating compatible casting of function pointers #57024

Open juj opened 2 years ago

juj commented 2 years ago

In the C/C++ standard, it is unspecified behavior to call a function pointer through a signature that does not match the signature of the function pointed to.

In the various x86/ARM ABIs, platform-specific and calling-convention-specific relaxations exist that make certain mismatched calls work in practice on that specific platform and calling convention. E.g. calling a function of type void (int) through a function signature void (int, int) may work, with the second passed argument safely ignored, and calling a function of type int (int) through a function signature void (int) may work, with the return value from the function safely ignored.

In WebAssembly, however, function pointers are treated more strictly than on x86/ARM because of its security requirements. Neither type of relaxation works for function pointer casts; instead, a Wasm VM will throw an exception at runtime.

That is, in Wasm, calling a function of type int (void) through a signature void (void) will not work, but will raise a function signature mismatch. Also, calling a function of type void (int) through a signature void (int, int) will not work, but will raise an exception.

However, in Wasm, calling a function void (char) through a signature void (int) does work, as does calling a function void (int *) through a signature void (struct foo *).

None of the existing casts (static_cast, dynamic_cast, reinterpret_cast and C cast) accurately capture the "what is safe and works for my target platform" aspects of the function pointer casting.

This raises an interesting portability and future compatibility opportunity. It seems that on all of these target platforms and calling conventions, the rules for which casts work and which do not could readily be codified as a static compile-time check, so maybe the Clang front-end could take advantage of this?

I.e. it would be interesting to have a Clang compiler specific extension, something like __fp_cast<target_sig>(myfunc), which would at compile time raise an error if the target compilation platform+calling convention cannot support the specific signature conversion.

This would have the following benefits:

  1. Developers would be able to statically annotate in their codebases where they are intentionally relying on platform-specific function pointer conversions, distinguishing them from C casts and reinterpret_casts that deal with data, and making such locations searchable.
  2. Developers can be more comfortable targeting their platform-specific functionality, knowing that future portability limitations will be caught at compile time, as opposed to running into silent/cryptic failures or undefined behavior at runtime when the code is compiled against a new target platform in the future.
  3. Developers can use the cast to discover/educate themselves about the exact rules that a target-specific function pointer cast allows or does not allow. I.e. "Does Wasm allow a mismatching return type in function pointer casts? What about mismatching pointer types? Well, let me quickly check with some test code that uses __fp_cast."
  4. When LLVM adopts new target platforms in the future, they can implement their own rules for supported __fp_casts, helping developers identify portability problems in existing code that uses such casts.
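
As a rough illustration of the proposed `__fp_cast<target_sig>(myfunc)` check, the library-level sketch below approximates it in C++17 with `static_assert`. It is not a Clang feature: `WasmType`, `lower`, `lowers_identically`, and `fp_cast` are hypothetical names, and the lowering rule (pointers and integers of up to 32 bits become i32, with sizes taken from the compiling target) is a simplifying assumption rather than the real Wasm ABI.

```cpp
#include <type_traits>

// Hypothetical model of the Wasm value types a C scalar may lower to.
enum class WasmType { I32, I64, F32, F64, Unsupported };

// Assumed wasm32-style lowering; anything not listed is rejected conservatively.
template <typename T>
constexpr WasmType lower() {
    using U = std::remove_cv_t<T>;
    if constexpr (std::is_pointer_v<U>) {
        return WasmType::I32;
    } else if constexpr (std::is_integral_v<U> || std::is_enum_v<U>) {
        return sizeof(U) <= 4 ? WasmType::I32
                              : (sizeof(U) == 8 ? WasmType::I64 : WasmType::Unsupported);
    } else if constexpr (std::is_same_v<U, float>) {
        return WasmType::F32;
    } else if constexpr (std::is_same_v<U, double>) {
        return WasmType::F64;
    } else {
        return WasmType::Unsupported;
    }
}

template <typename... Ts> struct types {};

// Parameter lists must have the same length and lower pairwise to the same type.
template <typename... A1, typename... A2>
constexpr bool params_match(types<A1...>, types<A2...>) {
    if constexpr (sizeof...(A1) != sizeof...(A2)) {
        return false;
    } else {
        return (... && (lower<A1>() != WasmType::Unsupported && lower<A1>() == lower<A2>()));
    }
}

// Return types must both be void, or both lower to the same value type.
template <typename R1, typename R2>
constexpr bool results_match() {
    if constexpr (std::is_void_v<R1> || std::is_void_v<R2>) {
        return std::is_void_v<R1> && std::is_void_v<R2>;
    } else {
        return lower<R1>() != WasmType::Unsupported && lower<R1>() == lower<R2>();
    }
}

// False unless both arguments are plain (non-variadic) function types.
template <typename From, typename To>
inline constexpr bool lowers_identically = false;

template <typename R1, typename... A1, typename R2, typename... A2>
inline constexpr bool lowers_identically<R1(A1...), R2(A2...)> =
    results_match<R1, R2>() && params_match(types<A1...>{}, types<A2...>{});

// Hypothetical stand-in for __fp_cast: rejects the cast at compile time
// when the two signatures would not lower to the same modeled Wasm type.
template <typename To, typename From>
To* fp_cast(From* fn) {
    static_assert(lowers_identically<From, To>,
                  "signatures do not lower to the same Wasm function type");
    return reinterpret_cast<To*>(fn);
}

struct foo;
void takes_char(char) {}

// Accepted under the assumed rules; fp_cast<void(int, int)>(&takes_char) would not compile.
void (*as_int)(int) = fp_cast<void(int)>(&takes_char);
```

A real implementation would live in the compiler, where the target's actual lowering rules and calling conventions are known; this sketch only shows that the check is expressible as a compile-time predicate over the two signatures.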

One of the most common runtime crashes when porting large application codebases to WebAssembly occurs with signature mismatches on function pointers (that were fine on x86). Debugging this in large codebases is painful, because one has to exercise all code paths on WebAssembly at runtime to validate that everything is safe - and typically the people doing the porting haven't written even 0.001% of the code.

It seems that this problem would be solvable by a compile-time extension to Clang (plus adopting a convention in the codebase), at least for all current and future target platforms that are Clang-based, as the compiler would be able to catch fp casts that won't be supported by the target platform.

When I read the documentation page at https://clang.llvm.org/docs/LanguageExtensions.html it seems that such a feature does not already exist? Would this kind of cast make sense to add?

If one existed, it would probably be possible to enforce a programming convention of always using a __fp_cast in our Unity3D codebase, to help identify portability problems on Unity's future platforms.

CC @sunfishcode @kripken @tlively @sbc100

sunfishcode commented 2 years ago

My read of the standards is that it's undefined behavior, not unspecified behavior. For example, C17 6.5.2.2: "If the expression that denotes the called function has a type that includes a prototype, the number of arguments shall agree with the number of parameters." and "If the function is defined with a type that is not compatible with the type (of the expression) pointed to by the expression that denotes the called function, the behavior is undefined".
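
To make the boundary concrete: converting a function pointer to another function pointer type and back yields the original pointer, so only the call through the mismatched type is the undefined part. The sketch below shows the well-defined pattern in C++; `AnyFn`, `erase`, and `call_as_int_int` are hypothetical names used for illustration.

```cpp
// Generic function pointer type, used only to hold an address.
using AnyFn = void (*)();

int add_one(int x) { return x + 1; }

// Storing the pointer under a different function pointer type is allowed.
AnyFn erase(int (*fn)(int)) {
    return reinterpret_cast<AnyFn>(fn);
}

// Convert back to the function's real type before calling.
// Calling fn() directly through AnyFn would be the undefined case.
int call_as_int_int(AnyFn fn, int arg) {
    auto typed = reinterpret_cast<int (*)(int)>(fn);
    return typed(arg);
}
```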

juj commented 2 years ago

Thanks - I have been led astray by previous conversations with people on this matter then.

Btw, do you know why that statement would use the expression is not compatible with, instead of straight up saying is not equal to? What constitutes a compatible type?

sunfishcode commented 2 years ago

Compatible types allow for a few narrow differences; for example, an array type of unknown length is compatible with an array type of known length if the element types are compatible.

6.7.6.3 defines compatibility for function types: "For two function types to be compatible, both shall specify compatible return types. Moreover, the parameter type lists, if both are present, shall agree in the number of parameters and in use of the ellipsis terminator; corresponding parameters shall have compatible types."

juj commented 2 years ago

Great, that makes total sense.

Back in the day when porting libraries/projects through Emscripten to run on the web, I recall that FreeType 2, SDL, Harfbuzz, Unity3D and Unreal Engine (3 and 4) were among the projects that had issues with function pointer signatures not matching. With SDL, if I recall correctly, it was purely an oversight/typo, but in other codebases I recall that it was done on purpose.

I am certainly happy to argue to our devs that it should not be done on any platform. In prior conversations with engineers, I was told something along the lines of "yeah it is undefined by the standard, but for X86&ARM platforms specifically it is supported for limited circumstances".

Does anyone know whether there is actual precedent in GCC and/or Clang for intentionally supporting certain types of function pointer mismatches on some X86 and ARM calling conventions?

One ancient "we did it by intent" usage off the top of my head: pthread function entry points getting defined with a signature void *pthread_main() {} instead of void *pthread_main(void*) {}. I recall one such example from the Open POSIX Test Suite.

Also, do any Clang Sanitizers catch these kinds of errors?

sunfishcode commented 2 years ago

I wrote an LLVM pass for the WebAssembly backend that fixes up such casts by inserting auto-generated wrapper functions, though it doesn't handle all cases. That's all I know.
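
For readers who want the same effect at the source level, the wrapper idea can be written by hand. The sketch below is not the LLVM pass, just an analogue of it: instead of casting a `void *pthread_main()`-style function to the expected entry point type, an adapter with the expected signature forwards to it. `legacy_thread_main`, `ignore_arg`, and `ThreadEntry` are hypothetical names.

```cpp
// Hypothetical legacy entry point defined without the void* parameter.
void* legacy_thread_main() {
    static int result = 7;
    return &result;
}

// Adapter with the signature the caller expects; it drops the unused
// argument and forwards to Fn, so every call matches its callee's type.
template <void* (*Fn)()>
void* ignore_arg(void* /*unused*/) {
    return Fn();
}

using ThreadEntry = void* (*)(void*);

// A well-typed pointer; no function pointer cast is involved.
ThreadEntry entry = &ignore_arg<&legacy_thread_main>;
```

Because the adapter is an ordinary function of the right type, the indirect call is valid on every target, at the cost of one extra call.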