plast-lab / cclyzer

A tool for analyzing LLVM bitcode using Datalog.
MIT License

Understanding `template_type` #10

Closed karljs closed 5 years ago

karljs commented 5 years ago

I'm trying to debug some unexpected behavior (I'm actually using the Souffle port, but the question applies to both) and am confused about what the `template_type` rule is doing here:

https://github.com/plast-lab/cclyzer/blob/59cee09e2c9a6ebd8702af2cb821bef3c54db770/src/logic/points-to/class-type.logic#L187

It doesn't seem to refer to templates in the C++ sense I have in mind, as LLVM does not append `.base` to the declarations for template classes. Instead, that suffix seems to be related to padding as it pertains to inheritance.

I'm missing some class hierarchy data because `primary_superclass` relies on `_typeinfo_class_type`, and that rule says that both `template_type` and `template_typeinfo` must hold (or both not hold). However, those two rules seem to be expressing quite different, unrelated things. One is looking for `<` `>` symbols, indicating a template, while the other is looking at padding information.

I'm probably just misunderstanding. Could anyone please provide some intuition for what's going on with this portion of the logic code?

gbalats commented 5 years ago

IIRC, it does refer to C++ template instantiations. The various instantiations of a template class `Foo`, when compiled to LLVM bitcode, would produce classes with names like `Foo.1`, `Foo.2`, etc. So that was basically a hack to tell, from the compiler-generated name, which classes were template instantiations. However, the hack is far from perfect. The same suffixes can arise in LLVM in more ways than just template instantiations (for example, when the same type is defined in multiple compilation units).

The `<>` part only appears in the DWARF debug info, not in the name of the LLVM type. The `.base` suffix indeed has nothing to do with C++ templates (it is about inheritance, padding, etc., as you say), hence the check `DotSuffix != "base"`.
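To make the suffix heuristic concrete, here is a minimal Python sketch of the idea. It is not the actual Datalog rule from `class-type.logic`; the function names and the assumption that type names start with `class.`, `struct.`, or `union.` are mine, chosen only for illustration.

```python
# Hypothetical sketch of the suffix heuristic, not cclyzer's actual rule.
# Assumption: LLVM struct type names look like "class.Foo", "class.Foo.1",
# or "class.Foo.base", optionally with a leading '%'.

KIND_PREFIXES = ("class.", "struct.", "union.")


def dot_suffix(type_name):
    """Return the text after the last '.', ignoring the kind prefix, or None."""
    name = type_name.lstrip("%")
    for prefix in KIND_PREFIXES:
        if name.startswith(prefix):
            name = name[len(prefix):]
            break
    _, sep, suffix = name.rpartition(".")
    return suffix if sep else None


def looks_like_template_instantiation(type_name):
    """Heuristic: the name has a dot suffix, and that suffix is not 'base'."""
    suffix = dot_suffix(type_name)
    return suffix is not None and suffix != "base"
```

With this sketch, `class.Foo.1` is flagged, while `class.Foo` and `class.Foo.base` are not. As noted above, a non-template type that is merely defined in several compilation units would also be flagged, which is exactly where the heuristic breaks down.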

As for `_typeinfo_class_type`, it is supposed to hold a mapping from a class to its typeinfo object. It seems to have two rules, one that works for template types and one for non-template ones. These rules are complicated in that they're trying to reverse engineer the Itanium ABI rules. I think the easiest way to understand them is to look at the results side by side with the disassembled bitcode. Also: https://itanium-cxx-abi.github.io/cxx-abi/abi.html
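A rough Python sketch of why the "both hold or both don't hold" condition can drop classes is shown here. It is only an illustration under assumptions: the helper names are hypothetical, the typeinfo side is modelled as a demangled name such as `typeinfo for Foo<int>`, and the real rules are considerably more involved.

```python
# Hypothetical sketch of the agreement condition, not cclyzer's actual rules.
# Assumption: the typeinfo side is available as a demangled name string.

def has_template_like_suffix(type_name):
    """True if the type name has a dot suffix other than 'base'."""
    if type_name.startswith(("class.", "struct.", "union.")):
        type_name = type_name.split(".", 1)[1]  # drop the kind prefix
    _, sep, suffix = type_name.rpartition(".")
    return bool(sep) and suffix != "base"


def typeinfo_mentions_template(demangled_typeinfo):
    """True if the demangled typeinfo name contains template brackets."""
    return "<" in demangled_typeinfo and ">" in demangled_typeinfo


def may_pair(type_name, demangled_typeinfo):
    """A class and a typeinfo can only be paired if both flags agree."""
    return has_template_like_suffix(type_name) == typeinfo_mentions_template(
        demangled_typeinfo
    )
```

Under these assumptions, `may_pair("class.Foo.1", "typeinfo for Foo")` is false: a non-template `Foo` that picked up a numeric suffix for some other reason never gets matched to its typeinfo, which would explain missing class hierarchy data downstream.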

karljs commented 5 years ago

Thank you for the explanation. It's very helpful and will get me back on track debugging my particular set of results.