nneonneo / Il2CppVersions

Build scripts & historical header files for every available minor version of Unity's Il2Cpp project

Set of improvements #10

Closed djkaty closed 4 years ago

djkaty commented 4 years ago
  1. Add the hidden .vscode folder to .gitignore, for when VS Code is the editor
  2. Include Unity 2018.x, 2019.x and 2020.1.x latest versions
  3. Include object-internals.h (see below)
  4. Add a script which copies and renames the header files required for Il2CppInspector's repository automatically
  5. Update the current set of headers and diffs

The key change is item 3. We remove the fixed definitions for Il2CppObject etc. and replace them with C-compatible versions from either il2cpp-object-internals.h or object-internals.h as necessary. A slew of regex processing is done to try to ensure C compatibility. I have tested the output against 9 production applications from 2016-2020 in IDA, Ghidra and compiling with C++ scaffolding in Visual Studio, so although it's probably not perfect, it seems like a reasonable first attempt.

The reason for this change is illustrated in https://github.com/djkaty/Il2CppInspector/issues/79 where the user needs access to Il2CppException. Rather than adding these classes one by one piecemeal, we just import the entire header file for the wanted version of Unity.

djkaty commented 4 years ago

Just circling back on this, so I think the only outstanding item is the struct layout issue since the offset tests worked?

I'm not very fluent at all in all the nuances of how compilers organize things or how to use directives to pack stuff correctly, could you help me solve this issue? I assume we need to use some #ifdefs and produce two different definitions, one for GCC and one for MSVC?

nneonneo commented 4 years ago

Whoops I thought I sent the comment, but it didn't show up. Sorry about that.

The way I see it, we have a few options:

  1. Do nothing. It will not be possible to sizeof any affected structures or use their fields on GCC. This may be a fine short-term solution but will probably require a TODO on top. We could, for instance, issue a pertinent #warning for GCC.
  2. Check which structs are actually affected. I would expect the primary source of problems to be structs containing both ints and pointers in 64-bit mode on GCC, since those will be subject to alignment differences. Actually fixing the problem will require walking fields and inserting padding or dummy fields, or constructing new artificial structures (which we already do for Il2CppClass).
  3. Use C++ instead of C, splitting the headers so that we generate a C header for class-internals (where everything is C-compatible already) and a C++ header for object-internals. IDA can only make use of class-internals, while projects that call into an Il2Cpp library are free to use C++.

My preference is for 3: it means we can drop a lot of the hacks around "de-C++"'ing the object-internals header, and it makes the structure alignment work. It'll probably also be a nicer long-term solution.


Addendum: I did some light testing with various packing options but didn't find anything satisfactory yet. Here's my test program; compile with gcc test.cpp -o test -Wno-invalid-offsetof and run ./test:

#include <stddef.h>
#include <stdio.h>

// Variant 1: real C++ inheritance, two levels deep
struct A { int a; char b; };
struct B : A { char c; int d; char e; };
struct C : B { char f; };

// Variant 2: C-compatible rewrite, base struct embedded as the first field
struct Ac { int a; char b; };
struct Bc { struct Ac base; char c; int d; char e; };
struct Cc { struct Bc base; char f; };

// Variant 3: like variant 2, but with the embedded base marked packed
struct Ad { int a; char b; };
struct Bd { struct Ad base; char c; int d; char e; };
struct Cd { struct Bd base __attribute__((packed)); char f; };

int main() {
    // sizeof and offsetof yield size_t, so print with %zu rather than %ld
    printf("%zu %zu %zu\n", sizeof(struct C), sizeof(struct Cc), sizeof(struct Cd));
    printf("%zu %zu %zu\n", offsetof(struct C, a), offsetof(struct Cc, base.base.a), offsetof(struct Cd, base.base.a));
    printf("%zu %zu %zu\n", offsetof(struct C, b), offsetof(struct Cc, base.base.b), offsetof(struct Cd, base.base.b));
    printf("%zu %zu %zu\n", offsetof(struct C, c), offsetof(struct Cc, base.c), offsetof(struct Cd, base.c));
    printf("%zu %zu %zu\n", offsetof(struct C, d), offsetof(struct Cc, base.d), offsetof(struct Cd, base.d));
    printf("%zu %zu %zu\n", offsetof(struct C, e), offsetof(struct Cc, base.e), offsetof(struct Cd, base.e));
    printf("%zu %zu %zu\n", offsetof(struct C, f), offsetof(struct Cc, f), offsetof(struct Cd, f));
}

What is needed is to have a structure with no padding, but where the structure elements are themselves still aligned. GCC doesn't seem to have a way to specify this (notably, a two-element array of such a structure would have a misaligned second member).
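For reference, here is a rough sketch of what option 2 (walking the fields and inserting dummy padding) could look like for the toy hierarchy in the test program. CFlat, _pad0 and the two helper functions are made-up names for illustration, and the sketch assumes a 4-byte int with 4-byte alignment. The offsets it pins down are what I would expect GCC to report for struct C on x86-64, but that is an expectation, not something this snippet proves.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical flattened, C-compatible stand-in for `struct C` from the test
// program. Assumes a 4-byte int with 4-byte alignment.
struct CFlat {
    int  a;         // from A
    char b;         // from A
    char _pad0[3];  // dummy field: keeps A's footprint at 8 bytes, so c is
                    // not pulled forward to offset 5
    char c;         // from B
    int  d;         // from B; the compiler still aligns this to 4 on its own
    char e;         // from B
    char f;         // from C; sits directly after e, in the bytes that would
                    // otherwise be tail padding of the B part
};

// True when every field landed at the offset the dummy field was meant to give.
inline bool cflat_layout_ok() {
    return offsetof(CFlat, a) == 0  && offsetof(CFlat, b) == 4  &&
           offsetof(CFlat, c) == 8  && offsetof(CFlat, d) == 12 &&
           offsetof(CFlat, e) == 16 && offsetof(CFlat, f) == 17 &&
           sizeof(CFlat) == 20;
}

// Convenience wrapper for callers that prefer a hard failure.
inline void verify_cflat_layout() {
    assert(cflat_layout_ok());
}
```

Only one dummy field is needed here because every other gap is padding the compiler inserts anyway; the generator would have to work this out per struct, which is the "walking fields" cost mentioned in option 2.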

djkaty commented 4 years ago

Okay, good info there. The problem with option 3 is that people doing analysis in IDA and Ghidra could benefit from having all of those extra structs available (when they assign type pointers to registers/parameters etc.)

Currently calling into Il2Cpp applications is only supported for PE binaries in Il2CppInspector so MSVC is likely the only target right now. I do plan to expand this to Android at some undetermined point in the future.

Re-considering option 1, only the following classes use inheritance:

// 5.3.3+
Il2CppArray
Il2CppArraySize

// 5.3.4 only
Il2CppRCW

// 5.3.5+
Il2CppComObject
Il2CppISequentialStream*
Il2CppIStream*
Il2CppIMarshal*
Il2CppIManagedObject*

// 5.3.6+
Il2CppIInspectable*
Il2CppIActivationFactory*

// 5.5.0+
Il2CppIManagedObjectHolder*

// 5.6.0+
Il2CppException
Il2CppIRestrictedErrorInfo*
Il2CppILanguageExceptionErrorInfo*
Il2CppIAgileObject*

In 2018.1.1, all of the above were changed to not use C++ inheritance anymore.

The starred (*) items contain only the object they derive from and a single static const Il2CppGuid IID. In 2018.1.1 they were removed from the headers that we use altogether.

This leaves only 5 items - Il2CppArray, Il2CppArraySize, Il2CppComObject, Il2CppException and Il2CppRCW. All but Il2CppArraySize derive from Il2CppObject, and Il2CppArraySize derives from Il2CppArray. When inheritance was removed, the developers substituted in the inherited class as the first field. In the case of Il2CppArraySize, they substituted all of the fields from Il2CppArray.
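To make the two substitutions concrete, here is a small sketch. Obj, Arr, ArrSize and all of their fields are placeholders I made up to mirror the shape of the change; they are not the real Unity definitions, and the check assumes pointers and size_t have the same size and alignment.

```cpp
#include <cassert>
#include <cstddef>

// Placeholder types only: names and fields are illustrative, not Unity's.
struct Obj { void* klass; void* monitor; };

// Before (C++ only):  struct Arr : Obj { void* bounds; size_t max_length; };
// After: the former base class becomes the first field.
struct Arr {
    Obj         obj;
    void*       bounds;
    std::size_t max_length;
};

// Before (C++ only):  struct ArrSize : Arr { void* vector; };
// After: the fields of Arr are repeated inline instead of embedding Arr.
struct ArrSize {
    Obj         obj;
    void*       bounds;
    std::size_t max_length;
    void*       vector;
};

// True when ArrSize starts with exactly the layout of Arr and its extra field
// begins where Arr ends.
inline bool arr_prefix_matches() {
    return offsetof(ArrSize, obj)        == offsetof(Arr, obj)        &&
           offsetof(ArrSize, bounds)     == offsetof(Arr, bounds)     &&
           offsetof(ArrSize, max_length) == offsetof(Arr, max_length) &&
           offsetof(ArrSize, vector)     == sizeof(Arr);
}
```

Because these are plain structs, offsetof is well defined on them, so a check like this could run under both MSVC and GCC.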

I'm not entirely sure but does this not lead to a situation where the resultant layouts are the same for both MSVC and GCC? If not, perhaps we could just re-write the five problem classes?

nneonneo commented 4 years ago

OK, this is good information to have. With a single level of inheritance struct A; struct B : A;, GCC and MSVC both behave as if you did struct B { struct A base; ...}. So, in such a case, it's safe to perform this transformation.
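A quick way to sanity-check the single-level claim is to compare member offsets directly. This is only a sketch: the structs are the toy ones from my earlier test program, offset_in and single_level_matches are helper names invented here, and offsets are measured at run time so that offsetof is never applied to the non-standard-layout type B.

```cpp
#include <cassert>
#include <cstddef>

struct A  { int a; char b; };
struct B : A { char c; int d; char e; };       // one level of inheritance
struct Bc { A base; char c; int d; char e; };  // same fields, base as first member

// Byte offset of `member` inside `obj`, computed from addresses at run time.
template <typename T, typename M>
std::ptrdiff_t offset_in(const T& obj, const M& member) {
    return reinterpret_cast<const char*>(&member) -
           reinterpret_cast<const char*>(&obj);
}

// True when the inherited and the composed struct agree on size and on the
// offset of every field.
inline bool single_level_matches() {
    B  b  = B();
    Bc bc = Bc();
    return sizeof(B) == sizeof(Bc) &&
           offset_in(b, b.a) == offset_in(bc, bc.base.a) &&
           offset_in(b, b.b) == offset_in(bc, bc.base.b) &&
           offset_in(b, b.c) == offset_in(bc, bc.c) &&
           offset_in(b, b.d) == offset_in(bc, bc.d) &&
           offset_in(b, b.e) == offset_in(bc, bc.e);
}
```

The base here is a plain struct with no base classes of its own, which is the situation the transformation relies on; the second-level case is where the layouts can start to diverge.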

For the single second-level case, I think the extra fields of Il2CppArraySize are actually aligned (IIRC) even in 64-bit builds, so I think it's OK as well. And, since future versions will presumably be C-compatible, I guess we don't have to worry about future-proofing it.

This would be good to write up in a comment or something near the code which performs the transformation, then I can go ahead and merge it. Thanks again for your work!

djkaty commented 4 years ago

Comment added!

In a show of poor PR etiquette I also added the latest Unity versions while I was at it. Thanks for explaining stuff to me!