llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

[ItaniumDemangle] Fails to demangle special substitution with template arguments (when parsing a conversion operator) #109130

Open ioannco opened 2 months ago

ioannco commented 2 months ago

Issue description

The LLVM demangler fails to decode the Itanium mangled name

_ZN3foocvNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEv, which encodes foo::operator std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>(), while other demanglers (e.g. c++filt) handle it.

Possible causes

It seems that at llvm/include/llvm/Demangle/ItaniumDemangle.h:3346 the parser is explicitly told not to try to parse template arguments while it parses the type of a conversion operator. I did not find anything in the Itanium C++ ABI (https://itanium-cxx-abi.github.io/cxx-abi/abi.html) that makes template arguments illegal there.

A possible patch is provided in this pull request: https://github.com/llvm/llvm-project/pull/109141

How to reproduce

example.cpp:

#include <iostream>

class foo {
public:
  operator std::string() { return {}; }
};

int main() {
  (std::string) foo();
}

$ clang++ example.cpp -o example
$ llvm-nm example | llvm-cxxfilt

output:

0000000000003dc0 d _DYNAMIC
0000000000004000 d _GLOBAL_OFFSET_TABLE_
00000000000010b0 t _GLOBAL__sub_I_example.cpp
0000000000002000 R _IO_stdin_used
                 w _ITM_deregisterTMCloneTable
                 w _ITM_registerTMCloneTable
00000000000011e0 W _ZN3foocvNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEv
                 U std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string()@GLIBCXX_3.4.21
                 U std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()@GLIBCXX_3.4.21
                 U std::ios_base::Init::Init()@GLIBCXX_3.4
                 U std::ios_base::Init::~Init()@GLIBCXX_3.4
0000000000004049 b std::__ioinit
0000000000002138 r __FRAME_END__
0000000000002004 r __GNU_EH_FRAME_HDR
0000000000004048 D __TMC_END__
000000000000037c r __abi_tag
0000000000004048 B __bss_start
                 U __cxa_atexit@GLIBC_2.2.5
                 w __cxa_finalize@GLIBC_2.2.5
0000000000001080 t __cxx_global_var_init
0000000000004038 D __data_start
0000000000001160 t __do_global_dtors_aux
0000000000003db8 d __do_global_dtors_aux_fini_array_entry
0000000000004040 D __dso_handle
0000000000003da8 d __frame_dummy_init_array_entry
                 w __gmon_start__
                 U __libc_start_main@GLIBC_2.34
0000000000004048 D _edata
0000000000004050 B _end
000000000000120c T _fini
0000000000001000 T _init
00000000000010c0 T _start
0000000000004048 b completed.0
0000000000004038 W data_start
00000000000010f0 t deregister_tm_clones
00000000000011a0 t frame_dummy
00000000000011b0 T main
0000000000001120 t register_tm_clones

The line 00000000000011e0 W _ZN3foocvNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEv is left mangled: demangling it has failed.
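For checking a single symbol without building the example, a minimal sketch along these lines may help. It calls `abi::__cxa_demangle` from `<cxxabi.h>`; the helper name `demangle_or_original` and the constant `kIssueSymbol` are made up for illustration. Note that this exercises whichever demangler the linked C++ runtime ships, which is not necessarily the one inside llvm-cxxfilt, so treat the result as a cross-check only.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Hypothetical helper: returns the demangled form of `mangled`, or `mangled`
// unchanged when the runtime's demangler rejects it.
std::string demangle_or_original(const char *mangled) {
    int status = 0;
    // __cxa_demangle returns a malloc'ed buffer that the caller must free.
    std::unique_ptr<char, decltype(&std::free)> result(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), &std::free);
    return (status == 0 && result) ? std::string(result.get())
                                   : std::string(mangled);
}

// The symbol from this report.
const char *const kIssueSymbol =
    "_ZN3foocvNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEv";
```

Calling `demangle_or_original(kIssueSymbol)` and comparing the result with `kIssueSymbol` shows whether that particular runtime accepts the name.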

llvmbot commented 2 months ago

@llvm/issue-subscribers-tools-llvm-cxxfilt

Author: Ivan Cheremisenov (ioannco)

llvmbot commented 2 months ago

@llvm/issue-subscribers-clang-frontend

Michael137 commented 2 months ago

This fails to demangle when we parse the Sa special substitution in parseType and encounter the template arguments IcE while TryToParseTemplateArgs == false, as you point out. I suspect this was an untested code path because with libc++ all types live in an inline namespace (most commonly std::__1), so Clang is never allowed to produce those special substitutions. With libstdc++, however, apparently only std::basic_string lives in the __cxx11 namespace, not std::allocator. Hence std::allocator gets compressed to Sa but std::__cxx11::basic_string does not. See godbolt.
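For readers less familiar with the encoding, here is a rough sketch of the special substitutions that the Itanium ABI defines for entities in namespace std. The function name `expand_special_substitution` is invented for this illustration and is not part of the LLVM demangler. In the failing name, `SaIcE` is `Sa` followed by the template arguments `IcE`, i.e. std::allocator<char>, while `St11char_traitsIcE` is `St` plus an ordinary source name.

```cpp
#include <string>

// Illustrative only: expands one of the ABI's two-letter std abbreviations.
// Returns an empty string for anything else (for example back-references
// such as S_ or S0_, which depend on what was parsed earlier).
std::string expand_special_substitution(const std::string &code) {
    if (code == "St") return "std";
    if (code == "Sa") return "std::allocator";
    if (code == "Sb") return "std::basic_string";
    if (code == "Ss")
        return "std::basic_string<char, std::char_traits<char>, "
               "std::allocator<char>>";
    if (code == "Si") return "std::basic_istream<char, std::char_traits<char>>";
    if (code == "So") return "std::basic_ostream<char, std::char_traits<char>>";
    if (code == "Sd") return "std::basic_iostream<char, std::char_traits<char>>";
    return "";
}
```

Only `Sa`, `Sb` and `St` can sensibly be followed by template arguments or a further name; the others already denote complete specializations.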

A slightly more self-contained reproducer would be:

namespace std {
template <typename T>
struct allocator {};
}

template<typename T>
struct Bar {};

struct Foo {
    operator Bar<std::allocator<char>>() { return {}; }
};

int main() {
    Foo f;
    (Bar<std::allocator<char>>)f;
    return 0;
}

AFAICT, the reason TryToParseTemplateArgs exists is to support conversion operators whose result type contains forward template references. So if you allow parsing template arguments as you propose, I suspect there are cases where we would get confused about which template the forward reference refers to. E.g. (taken from one of the test failures), for _ZN5OuterI4MarpEcvT_I4MerpEEv we would now demangle to:

Outer<Marp>::operator Marp<Merp>()

instead of:

Outer<Marp>::operator Merp<Merp>()

Nonetheless, this is a bug in the demangler.
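To make the forward-reference case concrete, here is a sketch of source that, as far as I can tell, should produce a symbol of the shape `_ZN5OuterI4MarpEcvT_I4MerpEEv`. The definitions of Outer, Marp and Merp are my guess at what the test name stands for, not taken from the test suite. The point is that `T_` after `cv` names the conversion operator's own template parameter, whose argument (`I4MerpE`) only appears later in the mangled name, so those arguments belong to the operator and not to `T_`.

```cpp
#include <type_traits>

struct Marp {};
struct Merp {};

template <typename U>
struct Outer {
    // Conversion function template: the target type is the operator's own
    // template parameter T, so the mangled name refers to T before the
    // template argument list that binds it has been seen.
    template <typename T>
    operator T() { return T{}; }
};

// Forces instantiation of Outer<Marp>::operator Merp<Merp>().
Merp convert_to_merp() {
    Outer<Marp> outer;
    Merp result = outer;  // copy-initialization picks the conversion operator
    return result;
}

static_assert(std::is_convertible<Outer<Marp>, Merp>::value,
              "Outer<Marp> converts to Merp via the operator template");
```

Compiling this and running llvm-nm on the object file should show the instantiated operator as a weak symbol, which can then be fed to the demangler under test.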