llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.23k stars 11.65k forks source link

multiple typeinfo name #10550

Open llvmbot opened 13 years ago

llvmbot commented 13 years ago
Bugzilla Link 10178
Resolution INVALID
Resolved on Jul 19, 2015 11:09
Version unspecified
OS FreeBSD
Attachments test case, header files for the test case
Reporter LLVM Bugzilla Contributor
CC @d0k,@DimitryAndric,@efriedma-quic,@rjmccall

Extended Description

pes delta-2006.08.03$ clang++ -D_REENTRANT -I./source/common -I./source/i18n -I./source/tools/toolutil -I./source/tools/ctestfw -D'U_TOPSRCDIR="../../"' -D'U_TOPBUILDDIR="/usr/ports/devel/icu/work/icu/source/"' -O2 -pipe -fno-strict-aliasing -W -Wall -ansi -pedantic -Wpointer-arith -Wwrite-strings -Wno-long-long -c -o uobjtest.o uobjtest.cpp && nm uobjtest.o | grep N6icu_4613DecimalFormatE
                 U _ZTIN6icu_4613DecimalFormatE
                 U _ZTSN6icu_4613DecimalFormatE

pes delta-2006.08.03$ g++ -D_REENTRANT -I./source/common -I./source/i18n -I./source/tools/toolutil -I./source/tools/ctestfw -D'U_TOPSRCDIR="../../"' -D'U_TOPBUILDDIR="/usr/ports/devel/icu/work/icu/source/"' -O2 -pipe -fno-strict-aliasing -W -Wall -ansi -pedantic -Wpointer-arith -Wwrite-strings -Wno-long-long -c -o uobjtest.o uobjtest.cpp && nm uobjtest.o | grep N6icu_4613DecimalFormatE
                 U _ZTIN6icu_4613DecimalFormatE

Note that with clang++ the typeinfo name is there twice. This causes problems for typeid as in this example (derived from icu testcase):

typeid(*nf) != typeid(DecimalFormat)

where each is one instance of the _ZTSN6icu_4613DecimalFormatE.

This cannot be reproduced with a flat C++ file. The attached directory structure must be preserved. To reproduce untar the source.tgz and invoke the command line as in above.

d0k commented 12 years ago

NOTOURBUG

d0k commented 12 years ago

Clang is right here, ICU links its shared libraries with -Bsymbolic which essentially breaks symbol uniquing. GCC is unaffected because it doesn't compare the addresses directly but takes an indirection through typeinfo data.

I'd say this is a bug in ICU, but I have honestly no idea how to fix it. Maybe they can just drop -Bsymbolic from their builds.

rjmccall commented 12 years ago

However, in case of the ICU issue, there is just one copy of the vtable (_ZTVN6icu_4813DecimalFormatE) in libicui18n.so.48.1, but there are multiple copies of the RTTI object (_ZTIN6icu_4813DecimalFormatE) and the RTTI name (_ZTSN6icu_4813DecimalFormatE).

This is not consistent with their documentation (that you yourself linked), which says

"For polymorphic classes (classes with virtual functions), the type_info object is written out along with the vtable so that dynamic_cast can determine the dynamic type of a class object at runtime."

It's also not the behavior I'm seeing from GCC.

As far as I can see, multiple objects can have copies of the RTTI objects and/or names, as long as they are COMDATs, and weakly linked.

Yes, this is how COMDATs work.

E.g, the RTTI objects are emitted multiple times, even when there is no key function definition present.

Can you attach preprocessed source for one of these files which ends up exporting the RTTI symbol? I cannot seem to convince GCC to emit RTTI for a class with an external key function.

DimitryAndric commented 12 years ago

Whoa, what? GCC always emits RTTI as weak, even when there's a key function?

Not only as weak, but also as a COMDAT. It seems to be partially explained in: http://gcc.gnu.org/onlinedocs/gcc/Vague-Linkage.html

However, in case of the ICU issue, there is just one copy of the vtable (_ZTVN6icu_4813DecimalFormatE) in libicui18n.so.48.1, but there are multiple copies of the RTTI object (_ZTIN6icu_4813DecimalFormatE) and the RTTI name (_ZTSN6icu_4813DecimalFormatE).

As far as I can see, multiple objects can have copies of the RTTI objects and/or names, as long as they are COMDATs, and weakly linked.

My copy of gcc's typeinfo header has:

[...]
#ifndef __GXX_MERGED_TYPEINFO_NAMES
  #if !__GXX_WEAK__
    // If weak symbols are not supported, typeinfo names are not merged.
    #define __GXX_MERGED_TYPEINFO_NAMES 0
  #else
    // On platforms that support weak symbols, typeinfo names are merged.
    #define __GXX_MERGED_TYPEINFO_NAMES 1
  #endif
#endif
[...]
  class type_info
  {
  public:
[...]
#if !__GXX_MERGED_TYPEINFO_NAMES
    bool before(const type_info& __arg) const;

    // In old abi, or when weak symbols are not supported, there can
    // be multiple instances of a type_info object for one
    // type. Uniqueness must use the _name value, not object address.
    bool operator==(const type_info& __arg) const;
#else
    /** Returns true if @​c *this precedes @​c __arg in the implementation's
     *  collation order.  */
    // In new abi we can rely on type_info's NTBS being unique,
    // and therefore address comparisons are sufficient.
    bool before(const type_info& __arg) const
    { return __name < __arg.__name; }

    bool operator==(const type_info& __arg) const
    { return __name == __arg.__name; }
#endif

E.g. the 'new abi' assumes the name objects will be merged by the linker.

If this is actually the problem, then the code is ill-formed: it violates the ODR by providing a non-inline definition of a function multiple times.

This is not the case in the simple "foo" sample, and as far as I can see, it's also not the case with ICU.

For example, the vtable (_ZTVN6icu_4813DecimalFormatE) is not found in any of the test program's object files, but the RTTI object and name (_ZTIN6icu_4813DecimalFormatE and _ZTSN6icu_4813DecimalFormatE) are:

$ grep -E '_ZTVN6icu_4813DecimalFormatE' *.o
$ grep -E '_ZT[IS]N6icu_4813DecimalFormatE' *.o
Binary file loctest.o matches
Binary file nmfmtrt.o matches
Binary file numfmtst.o matches
Binary file numrgts.o matches
Binary file pptest.o matches
Binary file uobjtest.o matches

(The uobjtest.o file is where the UObjectTest::TestCompilerRTTI() function that fails is actually defined.)

E.g, the RTTI objects are emitted multiple times, even when there is no key function definition present.

rjmccall commented 13 years ago

If this is actually the problem, then the code is ill-formed: it violates the ODR by providing a non-inline definition of a function multiple times.

rjmccall commented 13 years ago

Whoa, what? GCC always emits RTTI as weak, even when there's a key function?

llvmbot commented 13 years ago

Comment 6 rephrased:

pes ~$ cat pr10178.cpp
struct foo {
  virtual int m();
};

int foo::m()
{
  return 42;
}
pes ~$ g++ -c pr10178.cpp && readelf -aW pr10178.o | grep OBJECT
    12: 0000000000000000    24 OBJECT  WEAK   DEFAULT    4 _ZTV3foo
    13: 0000000000000000    16 OBJECT  WEAK   DEFAULT    7 _ZTI3foo
    14: 0000000000000000     5 OBJECT  WEAK   DEFAULT    6 _ZTS3foo
pes ~$ clang++ -c pr10178.cpp && readelf -aW pr10178.o | grep OBJECT
     9: 0000000000000020    16 OBJECT  GLOBAL DEFAULT    4 _ZTI3foo
    10: 0000000000000018     5 OBJECT  GLOBAL DEFAULT    4 _ZTS3foo
    11: 0000000000000000    24 OBJECT  GLOBAL DEFAULT    4 _ZTV3foo

The problem here being the WEAK vs GLOBAL and the fact that clang does not mark those type info symbols with "comdat".

efriedma-quic commented 13 years ago

Nobody has come up with a sane testcase... this sounds like a visibility issue, but it's hard to say for sure.

llvmbot commented 13 years ago

any progress on this?

DimitryAndric commented 13 years ago

It looks like this problem is due to the way clang outputs type_info into the object file. Apparently, in some (or all?) circumstances, clang does not mark the type info (and vtables, btw) as weak and as a comdat, which g++ does.

At runtime, this causes two type_info objects for the DecimalFormat class to exist, when one is in the libicui18n.so file, and the other in the intltest test program (which links to the so file). One of the typeid comparison tests then fails, while it normally succeeds.

So, when compiled by g++, the libicui18n.so file contains only weak symbols for the DecimalFormat type_info and vtable:

$ readelf -aW libicui18n.so.48.1.gcc | grep '_ZT[ISV]N6icu_4813DecimalFormatE'
    77: 00000000003df440   560 OBJECT  WEAK   DEFAULT   19 _ZTVN6icu_4813DecimalFormatE
  1550: 0000000000190b80    25 OBJECT  WEAK   DEFAULT   12 _ZTSN6icu_4813DecimalFormatE
  1646: 00000000003df670    24 OBJECT  WEAK   DEFAULT   19 _ZTIN6icu_4813DecimalFormatE
  3610: 00000000003df440   560 OBJECT  WEAK   DEFAULT   19 _ZTVN6icu_4813DecimalFormatE
  5083: 0000000000190b80    25 OBJECT  WEAK   DEFAULT   12 _ZTSN6icu_4813DecimalFormatE
  5179: 00000000003df670    24 OBJECT  WEAK   DEFAULT   19 _ZTIN6icu_4813DecimalFormatE

while the intltest program also contains weak symbols, and some relocations:

$ readelf -aW intltest.gcc | grep '_ZT[ISV]N6icu_4813DecimalFormatE'
0000000000a345e0  0000035200000005 R_X86_64_COPY          0000000000a345e0 _ZTSN6icu_4813DecimalFormatE + 0
0000000000a34630  0000039200000005 R_X86_64_COPY          0000000000a34630 _ZTIN6icu_4813DecimalFormatE + 0
   850: 0000000000a345e0    25 OBJECT  WEAK   DEFAULT   24 _ZTSN6icu_4813DecimalFormatE
   914: 0000000000a34630    24 OBJECT  WEAK   DEFAULT   24 _ZTIN6icu_4813DecimalFormatE
  3158: 0000000000a345e0    25 OBJECT  WEAK   DEFAULT   24 _ZTSN6icu_4813DecimalFormatE
  3298: 0000000000a34630    24 OBJECT  WEAK   DEFAULT   24 _ZTIN6icu_4813DecimalFormatE

When icu is compiled by clang++, however, the DecimalFormat type_info and vtable are globals:

$ readelf -aW libicui18n.so.48.1.clang | grep '_ZT[ISV]N6icu_4813DecimalFormatE'
    77: 00000000003c6950   560 OBJECT  GLOBAL DEFAULT   19 _ZTVN6icu_4813DecimalFormatE
  1520: 00000000001689f0    25 OBJECT  GLOBAL DEFAULT   12 _ZTSN6icu_4813DecimalFormatE
  1616: 00000000003c6b80    24 OBJECT  GLOBAL DEFAULT   19 _ZTIN6icu_4813DecimalFormatE
  5018: 00000000003c6950   560 OBJECT  GLOBAL DEFAULT   19 _ZTVN6icu_4813DecimalFormatE
  6461: 00000000001689f0    25 OBJECT  GLOBAL DEFAULT   12 _ZTSN6icu_4813DecimalFormatE
  6557: 00000000003c6b80    24 OBJECT  GLOBAL DEFAULT   19 _ZTIN6icu_4813DecimalFormatE

Same for the intltest program, the weak symbols are now globals:

$ readelf -aW intltest.clang | grep '_ZT[ISV]N6icu_4813DecimalFormatE'
0000000000a7c1c0  0000034f00000005 R_X86_64_COPY          0000000000a7c1c0 _ZTSN6icu_4813DecimalFormatE + 0
0000000000a7c220  0000038d00000005 R_X86_64_COPY          0000000000a7c220 _ZTIN6icu_4813DecimalFormatE + 0
   847: 0000000000a7c1c0    25 OBJECT  GLOBAL DEFAULT   24 _ZTSN6icu_4813DecimalFormatE
   909: 0000000000a7c220    24 OBJECT  GLOBAL DEFAULT   24 _ZTIN6icu_4813DecimalFormatE
  5051: 0000000000a7c1c0    25 OBJECT  GLOBAL DEFAULT   24 _ZTSN6icu_4813DecimalFormatE
  5190: 0000000000a7c220    24 OBJECT  GLOBAL DEFAULT   24 _ZTIN6icu_4813DecimalFormatE

Now, for a simpler testcase, consider this short C++ source file:

struct foo {
  virtual int m();
};

int foo::m()
{
  return 42;
}

Compile this with g++, and the type_info name (_ZTS3foo) and type_info object (_ZTI3foo) will become in (x86_64) assembly:

_ZTS3foo:
        .string "3foo"
        .weak   _ZTI3foo
        .section        .rodata._ZTI3foo,"aG",@progbits,_ZTI3foo,comdat
        .align 16
        .type   _ZTI3foo, @&#8203;object
        .size   _ZTI3foo, 16
_ZTI3foo:
        .quad   _ZTVN10__cxxabiv117__class_type_infoE+16
        .quad   _ZTS3foo
        .weak   _ZTV3foo
        .section        .rodata._ZTV3foo,"aG",@progbits,_ZTV3foo,comdat
        .align 16
        .type   _ZTV3foo, @&#8203;object
        .size   _ZTV3foo, 24

Dumping the object file with readelf shows the comdat sections:

...
COMDAT group section [    1] `.group' [_ZTS3foo] contains 1 sections:
   [Index]    Name
   [    7]   .rodata._ZTS3foo

COMDAT group section [    2] `.group' [_ZTI3foo] contains 1 sections:
   [Index]    Name
   [    8]   .rodata._ZTI3foo

and the weak symbols:

Symbol table '.symtab' contains 18 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
...
    14: 0000000000000000     5 OBJECT  WEAK   DEFAULT    7 _ZTS3foo
    15: 0000000000000000    16 OBJECT  WEAK   DEFAULT    8 _ZTI3foo
    16: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZTVN10__cxxabiv117__class_type_infoE
    17: 0000000000000000    24 OBJECT  WEAK   DEFAULT   10 _ZTV3foo

Contrast this with clang++ (I used trunk r137714), where the assembly for the type_info becomes:

_ZTS3foo:
        .asciz   "3foo"
        .size   _ZTS3foo, 5

        .type   _ZTI3foo,@object        # @&#8203;_ZTI3foo
        .globl  _ZTI3foo
        .align  8
_ZTI3foo:
        .quad   _ZTVN10__cxxabiv117__class_type_infoE+16
        .quad   _ZTS3foo
        .size   _ZTI3foo, 16

E.g, no .weak directives, and no comdat section! Accordingly, the object file has no comdat sections referring to either _ZTS3foo or _ZTI3foo, and the symbols are now just global:

Symbol table .symtab contains 13 entries:

   Num:    Value          Size Type    Bind   Vis      Ndx Name
...
     9: 0000000000000020    16 OBJECT  GLOBAL DEFAULT    4 _ZTI3foo
    10: 0000000000000018     5 OBJECT  GLOBAL DEFAULT    4 _ZTS3foo
    11: 0000000000000000    24 OBJECT  GLOBAL DEFAULT    4 _ZTV3foo
    12: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _ZTVN10__cxxabiv117__class_type_infoE
llvmbot commented 13 years ago

ok, my fault :(

I modified the testing source a little and this is what I am seeing:

pes delta-2006.08.03$ clang++ -D_REENTRANT -I/usr/ports/devel/icu/work/icu/source/common -I/usr/ports/devel/icu/work/icu/source/i18n -I/usr/ports/devel/icu/work/icu/source/tools/toolutil -I/usr/ports/devel/icu/work/icu/source/tools/ctestfw -D'U_TOPSRCDIR="../../"' -D'U_TOPBUILDDIR="/usr/ports/devel/icu/work/icu/source/"' -O2 -pipe -fno-strict-aliasing -W -Wall -ansi -pedantic -Wpointer-arith -Wwrite-strings -Wno-long-long uobjtest.cpp -L/usr/ports/devel/icu/work/icu/source/tools/ctestfw/ -L/usr/ports/devel/icu/work/icu/source/lib/ -licui18n && ./a.out
FAIL

pes delta-2006.08.03$ g++ -D_REENTRANT -I/usr/ports/devel/icu/work/icu/source/common -I/usr/ports/devel/icu/work/icu/source/i18n -I/usr/ports/devel/icu/work/icu/source/tools/toolutil -I/usr/ports/devel/icu/work/icu/source/tools/ctestfw -D'U_TOPSRCDIR="../../"' -D'U_TOPBUILDDIR="/usr/ports/devel/icu/work/icu/source/"' -O2 -pipe -fno-strict-aliasing -W -Wall -ansi -pedantic -Wpointer-arith -Wwrite-strings -Wno-long-long uobjtest.cpp -L/usr/ports/devel/icu/work/icu/source/tools/ctestfw/ -L/usr/ports/devel/icu/work/icu/source/lib/ -licui18n && ./a.out
SUCCESS

pes delta-2006.08.03$ cat uobjtest.cpp
#include "nfsubs.h"
void errln(const char *);
 void testIDs() {
     UErrorCode errorCode = U_ZERO_ERROR;
     NumberFormat *nf = NumberFormat::createInstance("de", errorCode);
     if (dynamic_cast<DecimalFormat *>(nf) == NULL) {
     }
     if (typeid(*nf) != typeid(DecimalFormat)) {
         printf("FAIL\n");
     } else 
       printf("SUCCESS\n");
 }

int main() {
  testIDs();
}

It does not reproduce with preprocessed file (ie. prints SUCCESS). use the same source.tgz and you'll have to have "icu" installed.

rjmccall commented 13 years ago

Roman, those are different symbols; the ZTI symbol is the type info, and the ZTS symbol is the type name string, which it seems that GCC isn't exposing here. That is not what's causing the misbehavior here.

efriedma-quic commented 13 years ago

This cannot be reproduced with a flat C++ file. The attached directory structure must be preserved. To reproduce untar the source.tgz and invoke the command line as in above.

Err, you can't attach a .ii (preprocessed) file?

apinski-cavium commented 2 years ago

This is a bug in clang as it is not following the C++ ABI correctly. From https://itanium-cxx-abi.github.io/cxx-abi/abi.html#vague-vta : They are emitted in a COMDAT group, with the virtual table mangled name as the identifying symbol.

similarly for rtti: The RTTI std::type_info structures for complete class types and basic types are emitted in COMDAT groups identified by their mangled names.

thesamesam commented 7 months ago

Reopening based on @pinskia's comment. We seem to be hitting this with KDE Plasma, see https://bugs.gentoo.org/923292.

pinskia commented 7 months ago

Note I see the ABI has changed which seems to allow for clang behavior (maybe not user friendly though): Note: previous versions of this ABI required virtual tables to always be emitted with vague linkage, even when emitted in a unique object. This is unnecessary and should not detectable in a valid program; however, it is also harmless, and implementations may opt to continue to do it for compatibility with programs that are not strictly valid.