llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[crash] llvm-cxxfilt assumes that the host's representation for floating point types matches the target's #92081

Open bd1976bris opened 6 months ago

bd1976bris commented 6 months ago

Preamble: Consider the following code:

namespace cxx20 {
  template<auto> struct A {};
  void f(A<1.0l>) {}
};

For the above, Clang mangles function f, i.e. cxx20::f(cxx20::A<0x8p-3L>), as _ZN5cxx201fENS_1AILe3fff8000000000000000EEE (note that GCC mangles it as _ZN5cxx201fENS_1AILe0000000000003fff8000000000000000EEE; the leading zeros are apparently a benign difference).

The 1AILe3fff8000000000000000E part (for A<1.0l>) is mangled as:

1A       I               L          e             3fff8000000000000000    E
¦        ¦               ¦          ¦             ¦                       ¦
name     template-args   literal    long double   hexadecimal string      literal-bookend

Here the hexadecimal string encodes the value's in-memory representation on the target. Quoting from the Itanium ABI:

Floating-point literals are encoded using a fixed-length lowercase hexadecimal string corresponding to the internal representation, high-order bytes first. For example: "Lf bf800000 E" is -1.0f on platforms conforming to IEEE 754.
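To illustrate the ABI rule quoted above, here is a minimal sketch that decodes the 8-digit binary32 payload bf800000. The helper names parseHex32 and decodeBinary32 are hypothetical, not part of the LLVM demangler. Because the digits are written high-order first, reading them into an integer yields the bit pattern directly, with no dependence on host byte order; the value is then rebuilt from its sign, exponent and fraction fields. It assumes a well-formed lowercase payload and only handles zero and normal numbers.

```cpp
#include <cmath>
#include <cstdint>
#include <string_view>

// Read lowercase hex digits, most significant first (the ABI's order), into an
// integer. The result is the bit pattern regardless of host byte order.
std::uint32_t parseHex32(std::string_view hex) {
  std::uint32_t bits = 0;
  for (char c : hex)
    bits = (bits << 4) |
           static_cast<std::uint32_t>(c <= '9' ? c - '0' : c - 'a' + 10);
  return bits;
}

// Rebuild an IEEE 754 binary32 value from its fields. Handles zero and normal
// numbers only; subnormals, infinities and NaNs are out of scope for the sketch.
double decodeBinary32(std::string_view hex) {
  const std::uint32_t bits = parseHex32(hex);
  const bool negative = (bits >> 31) != 0;
  const int exponent = static_cast<int>((bits >> 23) & 0xffu);  // bias 127
  const std::uint32_t fraction = bits & 0x7fffffu;             // 23 bits
  double magnitude = 0.0;
  if (exponent != 0)  // normal: restore the implicit leading 1 bit
    magnitude = std::ldexp(static_cast<double>(fraction | 0x800000u),
                           exponent - 127 - 23);
  return negative ? -magnitude : magnitude;
}
```

For bf800000 the sign bit is set, the biased exponent is 127 and the fraction is 0, giving -1.0 as the ABI text states.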

Clang uses 20 hex characters to encode a long double on most Itanium-ABI targets including PS5 (long double is implemented as 80-bit extended precision).
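Since the 80-bit format occupies 10 bytes, Clang's payload is 20 hex digits. GCC's 32-digit spelling quoted earlier appears to be the same 10 bytes zero-padded to 16 bytes (sizeof(long double) on x86-64). The following sketch is built on that assumption; normalizeX87Payload is a hypothetical name. It maps both spellings to the 20-digit form and rejects padding that is not all zeros.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, assuming an x87 80-bit long double target: accept
// Clang's 20-digit payload or GCC's zero-padded 32-digit payload and return
// the 20 digits that carry the value.
std::optional<std::string> normalizeX87Payload(std::string_view hex) {
  constexpr std::size_t kValueDigits = 20;  // 80 bits / 4 bits per hex digit
  if (hex.size() < kValueDigits)
    return std::nullopt;
  const std::string_view padding = hex.substr(0, hex.size() - kValueDigits);
  // Treat only all-zero leading padding as benign.
  if (padding.find_first_not_of('0') != std::string_view::npos)
    return std::nullopt;
  return std::string(hex.substr(hex.size() - kValueDigits));
}
```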

Problem: With host = Windows (where MSVC's long double has the same 64-bit representation as double) and target = PS5 (where long double is 80-bit extended precision), ASAN reports a stack-buffer-overflow when running llvm-cxxfilt.exe _ZN5cxx201fENS_1AILe3fff8000000000000000EEE.

This occurs because the demangler code assumes that the target's floating-point representation matches the host's. See: https://github.com/llvm/llvm-project/blob/023cdfcc1a5bdef7f12bb6da9328f93b477c38b8/llvm/include/llvm/Demangle/ItaniumDemangle.h#L2558 However, Visual Studio on the Windows host implements long double with the same 8-byte representation as double, while the 20-character payload encodes 10 bytes. There isn't enough space to unpack the value into, so the implementation writes past the end of the 8-byte long double and triggers the ASAN fault. Without ASAN, the number is decoded incorrectly. Similar problems will affect other cross-compilation demangling scenarios where the floating-point representation differs between target and host.
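The failure mode can be modelled with a simplified, hypothetical helper (unpackHexBytes is not the demangler's actual code): a fixed-length payload is unpacked two hex digits per byte into a buffer sized for the host type. The 20-digit PS5 payload needs 10 bytes, but sizeof(long double) is 8 with MSVC, so an unchecked loop would write 2 bytes out of bounds. This sketch checks the length first and reports failure instead.

```cpp
#include <cstddef>
#include <string_view>

// Convert one lowercase hex digit to its value (input assumed well-formed).
static unsigned hexDigit(char c) {
  return c <= '9' ? static_cast<unsigned>(c - '0')
                  : static_cast<unsigned>(c - 'a' + 10);
}

// Unpack `hex` (high-order bytes first) into `out`, refusing to write more
// than `outSize` bytes. Returns false instead of overflowing the buffer.
bool unpackHexBytes(std::string_view hex, unsigned char *out,
                    std::size_t outSize) {
  if (hex.size() % 2 != 0 || hex.size() / 2 > outSize)
    return false;  // e.g. 20 digits (10 bytes) into an 8-byte long double
  for (std::size_t i = 0; i < hex.size(); i += 2)
    out[i / 2] = static_cast<unsigned char>((hexDigit(hex[i]) << 4) |
                                            hexDigit(hex[i + 1]));
  return true;
}
```

A caller that gets false back still has the mangled digits and can fall back to printing them verbatim rather than decoding a garbage value.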

Ideas for fixes: We could simply print the hexadecimal string from the mangled name; this appears to be what GNU implements. GNU c++filt demangles _ZN5cxx201fENS_1AILe3fff8000000000000000EEE as cxx20::f(cxx20::A<(long double)[3fff8000000000000000]>). Printing the raw hexadecimal string would also remove the cosmetic differences between the Windows and Linux llvm-cxxfilt output for floating-point literals, which stem from platform differences in snprintf.
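The first idea could be sketched as follows. The helper name formatRawFloatLiteral is hypothetical. The long double output matches the GNU c++filt form quoted above; the spelling for the other floating-point type codes (f, d, g) is an assumption by analogy. No host floating-point type is involved, so the output would be identical on every host.

```cpp
#include <string>
#include <string_view>

// Given an Itanium float literal such as "Le3fff8000000000000000E"
// (L, builtin-type code, lowercase hex payload, E), produce the GNU-style
// "(long double)[3fff8000000000000000]". Returns "" if the input is malformed.
std::string formatRawFloatLiteral(std::string_view lit) {
  if (lit.size() < 4 || lit.front() != 'L' || lit.back() != 'E')
    return "";
  std::string_view type;
  switch (lit[1]) {  // Itanium <builtin-type> codes for floating-point types
  case 'f': type = "float"; break;
  case 'd': type = "double"; break;
  case 'e': type = "long double"; break;
  case 'g': type = "__float128"; break;
  default: return "";
  }
  const std::string_view hex = lit.substr(2, lit.size() - 3);
  for (char c : hex)  // the ABI requires lowercase hexadecimal digits
    if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
      return "";
  // Copy the payload verbatim instead of decoding it.
  std::string out = "(";
  out += type;
  out += ")[";
  out += hex;
  out += ']';
  return out;
}
```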

Alternatively, we could use a target- and host-agnostic floating-point decoder, e.g. llvm::APFloat (llvm/ADT/APFloat.h), which could make some reasonable assumptions such as an IEEE 754 representation. We might also provide a way of specifying the target for llvm-cxxfilt.
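For the second idea, llvm::APFloat would be the natural tool inside LLVM. To keep this example self-contained, it instead decodes the x87 layout by hand, assuming the 20-digit payload is 1 sign bit, a 15-bit exponent biased by 16383, and a 64-bit significand with an explicit integer bit. It prints the exact value as a hexadecimal floating-point string without touching the host's long double. formatX87Exact is a hypothetical name, and infinities and NaNs are only flagged, not decoded.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <string_view>

// Hypothetical sketch: exact, host-independent rendering of an x87 80-bit
// payload (20 lowercase hex digits, high-order first). Layout assumed:
// [sign:1][exponent:15, bias 16383][significand:64, explicit integer bit].
std::string formatX87Exact(std::string_view hex20) {
  if (hex20.size() != 20)
    return "";  // not an 80-bit payload
  auto digit = [](char c) -> std::uint64_t {
    return c <= '9' ? static_cast<std::uint64_t>(c - '0')
                    : static_cast<std::uint64_t>(c - 'a' + 10);
  };
  std::uint64_t top = 0, significand = 0;
  for (std::size_t i = 0; i < 4; ++i)  // sign + exponent (16 bits)
    top = (top << 4) | digit(hex20[i]);
  for (std::size_t i = 4; i < 20; ++i)  // 64-bit significand
    significand = (significand << 4) | digit(hex20[i]);

  const bool negative = (top & 0x8000) != 0;
  const int exponent = static_cast<int>(top & 0x7fff);
  std::ostringstream os;
  if (negative)
    os << '-';
  if (exponent == 0x7fff) {  // infinity or NaN: not decoded in this sketch
    os << "[inf or nan]";
    return os.str();
  }
  if (significand == 0) {
    os << "0x0p+0";
    return os.str();
  }
  // value = significand * 2^(exponent - 16383 - 63); denormals use exponent 1.
  int binaryExp = (exponent == 0 ? 1 : exponent) - 16383 - 63;
  // Strip trailing zero hex digits, adjusting the exponent to keep the value.
  while ((significand & 0xf) == 0) {
    significand >>= 4;
    binaryExp += 4;
  }
  os << "0x" << std::hex << significand << std::dec << 'p'
     << (binaryExp >= 0 ? "+" : "") << binaryExp;
  return os.str();
}
```

For 1.0l (payload 3fff8000000000000000) this yields 0x8p-3. Emitting hexadecimal rather than decimal keeps the output exact and avoids depending on the host's snprintf.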

llvmbot commented 6 months ago

@llvm/issue-subscribers-tools-llvm-cxxfilt

Author: bd1976bris (bd1976bris)

bd1976bris commented 4 months ago

Note that I filed https://github.com/llvm/llvm-project/issues/96653 for the difference between Clang's and GCC's mangling of long double literals.