sgidevnet / sgug-rse

Silicon Graphics User Group RPM Software Environment
GNU General Public License v3.0

binutils: multi-GOT building/linking issues #40

Open danielhams opened 4 years ago

danielhams commented 4 years ago

A place to add notes and reproduce cases for our toolchain bug related to multi-GOT.

This Python script (https://esp.iki.fi/generate3.py) was mentioned as generating things that exhibit the behaviour. I've personally not had the time to run it yet and compare against MIPSpro / older toolchains.

@onre if there's anything you feel like adding, please do!

onre commented 4 years ago

Some background:

binutils supports many different formats and architectures. Some of these require separate tables for addresses and procedures, aka GOT and PLT. On IRIX, the GOT and PLT are combined into one section. This section is normally accessed using indexed memory operations, which limits its size to 16k entries.

A reliable recipe for creating a library that can't be relocated successfully: make sure it has more than 16k GOT and PLT entries combined, and that it has both types. Then use nm to find out where in memory the library would load if not relocated. Create a test executable that uses the library, and use linker flags to force it to overlap the library's "natural" address space. The library should then fail to relocate correctly. This shows up as soon as the library is loaded: its init code can't call __do_global_ctors_aux() because of an incorrect $gp value, which probably points to the first 16k part of the GOT.

Learned from reading the binutils code: the dynamic symbol table and the GOT are in sync. The first relocatable symbol in dynsym is the first GOT entry, and from then on they have identical entries in identical order.
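A sketch of that dynsym-to-GOT correspondence, assuming the usual MIPS ABI layout in which the `DT_MIPS_GOTSYM` and `DT_MIPS_LOCAL_GOTNO` dynamic tags say where the global part of the GOT starts. The function name and the numbers in the usage lines are made up for illustration, and this only models a single (primary) GOT, not whatever binutils does for additional ones.

```python
def got_index_for_symbol(sym_index, gotsym, local_gotno):
    """Map a .dynsym index to its GOT index (primary GOT only).

    gotsym      -- DT_MIPS_GOTSYM: first dynsym entry that has a GOT slot
    local_gotno -- DT_MIPS_LOCAL_GOTNO: local GOT entries that precede the globals
    """
    if sym_index < gotsym:
        return None  # this symbol has no global GOT entry
    return local_gotno + (sym_index - gotsym)


# Hypothetical values: 12 local entries, global GOT part starts at dynsym[40].
print(got_index_for_symbol(40, gotsym=40, local_gotno=12))  # 12
print(got_index_for_symbol(45, gotsym=40, local_gotno=12))  # 17
print(got_index_for_symbol(39, gotsym=40, local_gotno=12))  # None
```

If the two tables really are in lockstep, walking dynsym from `gotsym` upwards and the GOT from `local_gotno` upwards should visit the same symbols in the same order.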

Noted in practice, may or may not be related: a binutils-generated multi-GOT may have a lot of trailing "garbage" not recognized by IRIX elfdump -Dg. It looks like this:

        [     16377]: 0x00000000 32756(gp), 0x61ff5c14 [sqlite3_bind_null]
        [     16378]: 0x61955f38 32760(gp), 0x61ff5c18 [JS_DHashTableSizeOfIncludingThis]
        [     16379]: 0x00000000 32764(gp), 0x61ff5c1c [<section 6 st_name 0x5f474c invalid>]
        [     16380]: 0x80000000 32768(gp), 0x61ff5c20 [<section 6 st_name 0x5441424c invalid>]

In this case it looks like the second GOT part also has the two fixed first-part entries (addresses 0x00000000 and 0x80000000). This happens only sometimes, not always.
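The offsets in that dump are consistent with $gp sitting 0x7ff0 bytes past the start of the GOT, with 4-byte entries (16377 * 4 - 0x7ff0 = 32756). The sketch below checks that relation against the dump lines and shows where a signed 16-bit offset runs out. Both constants are assumptions inferred from the dump itself, not taken from the binutils source, and the helper names are made up.

```python
import re

GP_BIAS = 0x7FF0   # assumption: $gp = GOT start + 0x7ff0
ENTRY_SIZE = 4     # assumption: 32-bit GOT entries

# Picks the entry index and the "(gp)" offset out of an elfdump -Dg style line.
LINE_RE = re.compile(r"\[\s*(\d+)\]:\s+0x[0-9a-f]+\s+(-?\d+)\(gp\)")


def gp_offset(index):
    """Offset from $gp at which GOT entry `index` would live."""
    return index * ENTRY_SIZE - GP_BIAS


def reachable(index):
    """True if the entry fits in a signed 16-bit displacement from $gp."""
    return -0x8000 <= gp_offset(index) <= 0x7FFF


def check_dump(text):
    """Return (index, dumped offset, matches formula, reachable) per dump line."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            idx, off = int(m.group(1)), int(m.group(2))
            rows.append((idx, off, gp_offset(idx) == off, reachable(idx)))
    return rows


SAMPLE = """\
        [     16377]: 0x00000000 32756(gp), 0x61ff5c14 [sqlite3_bind_null]
        [     16378]: 0x61955f38 32760(gp), 0x61ff5c18 [JS_DHashTableSizeOfIncludingThis]
        [     16379]: 0x00000000 32764(gp), 0x61ff5c1c [<section 6 st_name 0x5f474c invalid>]
        [     16380]: 0x80000000 32768(gp), 0x61ff5c20 [<section 6 st_name 0x5441424c invalid>]
"""

for row in check_dump(SAMPLE):
    print(row)
```

Under these assumptions index 16379 is the last entry a single $gp can address and index 16380 is the first one out of range, so the two "invalid" entries sit right at that edge.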

Pondering:

How IRIX does this:

An example of this is easy to produce: just run the Python script and compile the result with the MIPSpro toolchain.

onre commented 4 years ago

So, two possible avenues:

  1. The mega-GOT. With the current binutils, you can create an addresses-only mega-GOT that IRIX elfdump can read past the first 16k entries and that seems to work. This suggests that if we manage to make binutils create a mixed GOT-PLT mega-GOT, it might work as well.

  2. Do what MIPSpro toolchain does. Do not create a mega-GOT but instead some smaller GOTs and the corresponding .dynamic sections for them.

Both might mean fixing some internal bfd bugs too. Some part of the GOT combining-reassigning logic might be doing something inappropriate here.

danielhams commented 4 years ago

Of the options you list - my initial reaction is to "do what MIPSpro toolchain does" - as we might avoid other issues that strange (to IRIX) extended GOTs could cause.

RE: the python script, some thoughts

Do we know at what point the behaviour starts to diverge (using this script)?

e.g.

  1. At N symbols, everything is OK
  2. At N+1 symbols, binutils does X and this causes Y
  3. At N+2 symbols, additional breakage is caused (or the problem remains the same)

Certainly isolating that "N/N+1" case will allow us (with some instrumentation such as liberal logging) to:

  1. Back out any existing unwanted behaviour in binutils
  2. Start from a clean slate on implementing the needed behaviour to "work like MIPSpro/native linker"

I think my first step would be to identify that N, and have a script that generates the N, N+1 and N+2 scenarios and uses both toolchains on them. That way we are part of the way to having a "test suite" for this kind of change, too.
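Finding that N could be done by bisection rather than by trying every count. A minimal sketch, assuming the breakage is monotone (once a count fails, every larger count fails too). The `works` predicate is a placeholder: the real one would generate a test case with n symbols, build it with the toolchain under test and run it. The threshold in `fake_works` is arbitrary and only there so the sketch runs anywhere.

```python
def find_threshold(works, lo, hi):
    """Binary-search the largest N for which works(N) is true.

    Assumes works(lo) is True, works(hi) is False, and that the result
    never flips back to True once it has become False.
    """
    if not works(lo):
        raise ValueError("lower bound already fails")
    if works(hi):
        raise ValueError("upper bound still works")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(mid):
            lo = mid
        else:
            hi = mid
    return lo


def scenarios(n):
    """The three symbol counts worth keeping as regression cases."""
    return [n, n + 1, n + 2]


def fake_works(n):
    # Stand-in for "generate, build and run with n symbols".
    return n <= 12345


n = find_threshold(fake_works, 1, 20000)
print(n, scenarios(n))  # 12345 [12345, 12346, 12347]
```

Running the same search once per toolchain would give one N for each, and the `scenarios(n)` counts are the ones to keep around as test cases.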

Having N, N+1 scenarios means a cross linker built and running on Linux can be stepped through directly in gdb or another visual debugger, too (I like to use Eclipse for this, yeah, I know).

danielhams commented 4 years ago

And another unhelpful comment, probably:

During my travels to get a more recent binutils version working on IRIX, I noticed that apart from the linker, all the other tools in binutils 2.32 were happily passing their tests.

I wonder if there is value in focusing this change + tweak on a "fixed" binutils 2.32 (but honestly, I've no idea how tricky it would be to properly resolve the non-multi-GOT issues in that version).

onre commented 4 years ago

The hard N is "when there is a PLT, and GOT + PLT combined size reaches 16382 entries", possibly plus or minus a couple. The breakage is uniform from there on. The aforementioned weird case is with Firefox libxul.so and as said, may or may not be related. The stuff generated by the Python script is more predictable and given that we know all the function and variable names, it's easy to verify whether or not they ended up in the dynsym table.
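Since the generated names are known in advance, the dynsym check can be scripted. A rough sketch that takes the text output of `nm -D` on the library and reports which expected functions are missing. It assumes names of the form funcNNNNNN (six digits, counting from zero) and the usual three-column nm output for defined symbols; the sample text is made up.

```python
def defined_dynamic_symbols(nm_output):
    """Collect names of defined symbols from `nm -D` style output."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Defined symbols have three fields: address, type letter, name.
        # Undefined ones ("U name") have only two and are skipped.
        if len(fields) == 3 and fields[1] != "U":
            names.add(fields[2])
    return names


def missing_symbols(nm_output, count):
    """Return the expected funcNNNNNN names that are not defined in the output."""
    expected = {f"func{i:06d}" for i in range(count)}
    return sorted(expected - defined_dynamic_symbols(nm_output))


# Made-up sample: func000001 is deliberately absent.
SAMPLE = """\
0001a2c0 T func000000
0001a2e0 T func000002
         U printf
"""

print(missing_symbols(SAMPLE, 3))  # ['func000001']
```

An empty list means every generated function made it into the dynamic symbol table.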

onre commented 4 years ago

https://github.com/bminor/binutils-gdb/blob/binutils-2_23-branch/bfd/elfxx-mips.c#L8928

When we hit that condition, we're in the "may or may not work" territory. I am not sure where the GOT + PLT joining happens.

onre commented 4 years ago

http://mirror.rqsall.com/misc/multi-got-mips.txt

Here's the GOT article from dmz-portal.mips.com. Thanks to @larb0b for getting this!

onre commented 4 years ago

I'd like someone to tell me whether I'm wrong about this one: what if this is limited to one particular case, namely the stuff in crtend.o not being accessible from crtbegin.o when the crtend.o bits end up beyond the 16382-entry barrier? That would nicely cause things to break in the way described, and turn all the above speculation into an army of red herrings.

onre commented 4 years ago

So, right. Better to document this all while I'm at it, memory can't be trusted here.

I used https://esp.iki.fi/generate2.py to generate a library with 20000 functions and variables, and an executable which calls all those functions and verifies the return values to correspond to the function names - a simple mechanism to ensure that correct functions are being called.

I checked the library with readelf -S to find out its "native address base" to be 0x5ffe0000, and to force a relocation I built the test executable with -Tdata 0x5ffe0000. Running readelf -S on the test executable confirms that it did what it claims to do:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
...
  [17] .rodata           PROGBITS        5ffe0000 280000 075320 00   A  0   0 16
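The overlap check can be automated instead of eyeballed. A sketch that parses 32-bit `readelf -S` output in the layout shown above and compares address ranges. The library range in the usage lines is hypothetical (only its base address comes from this thread), and the function names are made up.

```python
import re

# Matches e.g. "  [17] .rodata  PROGBITS  5ffe0000 280000 075320 00   A  0   0 16"
# Captures: section name, 8-digit address, size (all hex, as readelf prints them).
SECTION_RE = re.compile(
    r"^\s*\[\s*\d+\]\s+(\S+)\s+\S+\s+([0-9a-f]{8})\s+[0-9a-f]+\s+([0-9a-f]+)\s"
)


def address_range(readelf_text):
    """Return (lowest start, highest end) over sections that have an address."""
    spans = []
    for line in readelf_text.splitlines():
        m = SECTION_RE.match(line)
        if not m:
            continue
        addr, size = int(m.group(2), 16), int(m.group(3), 16)
        if addr and size:
            spans.append((addr, addr + size))
    if not spans:
        raise ValueError("no sections with an address found")
    return min(s for s, _ in spans), max(e for _, e in spans)


def overlaps(a, b):
    """True if half-open ranges a and b share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


EXE = """\
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [17] .rodata           PROGBITS        5ffe0000 280000 075320 00   A  0   0 16
"""

exe_range = address_range(EXE)
lib_range = (0x5FFE0000, 0x5FFE0000 + 0x100000)  # hypothetical library extent
print([hex(x) for x in exe_range], overlaps(exe_range, lib_range))
```

In practice both ranges would come from `address_range()` applied to the `readelf -S` output of the executable and of the library.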

I checked the GOT of the executable with IRIX elfdump -Dg and sure enough, there was this:

        [     20014]: 0x00000000 47304(gp), 0x60068c78 [func003593]
        [     20015]: 0x00000000 47308(gp), 0x60068c7c [func018413]
        [     20016]: 0x00000000 47312(gp), 0x60068c80 [<section 7 st_name 0x5f5f72 invalid>]
        [     20017]: 0x80000000 47316(gp), 0x60068c84 [<section 7 st_name 0x5f656e76 invalid>]

Then I ran it with a couple of vars set to see what rld is doing. Log: https://gist.github.com/onre/3a3c4df7675b8ef13649ef2abef7d198

Summary: the library got relocated and everything worked. Looks like the elfdump output thing is a red herring. However, the GOT only contained PLT entries this time, so I need to alter the test file generator a bit to get a mixed GOT-PLT thing going on and see whether that works as well. At that point we can change the generator to output C++ instead of C and see whether the static initialization sequence is to blame, amirite?

UPDATE: this worked because ld was used instead of gcc, so the resulting library does not run the crtbegin & crtend stuff. However, the GOT/PLT itself works just fine after the library is loaded, so does this prove the bug is only in the init code?

onre commented 4 years ago

https://gist.github.com/onre/c728cf2b22c206195bfa8afb4d93a71e

Here's a sample of what the generate2.py script generates, here with only one function. Basically:

  1. create n functions with name funcn, which return a value we can easily check.
  2. make a library of those functions.
  3. create functions which call those funcn functions and make sure they return correct values.
  4. create a main() which calls the functions created in step 3.
  5. make an executable consisting of stuff generated in steps 3 and 4.
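
The steps above can be sketched as a small generator. This is not the actual generate2.py, only an illustration of the same idea. The checker names, the six-digit numbering and the choice of "function returns its own index" as the easily checked value are assumptions, and the compile steps (2 and 5) are left out.

```python
def library_source(n):
    """Step 1: n functions func000000..., each returning its own index."""
    return "".join(
        f"int func{i:06d}(void) {{ return {i}; }}\n" for i in range(n)
    )


def executable_source(n):
    """Steps 3 and 4: one checker per library function, plus a main() calling them."""
    out = ["#include <stdio.h>\n\n"]
    for i in range(n):
        out.append(f"extern int func{i:06d}(void);\n")
    for i in range(n):
        out.append(
            f"static int check{i:06d}(void) {{ return func{i:06d}() == {i}; }}\n"
        )
    out.append("\nint main(void)\n{\n    int failures = 0;\n")
    for i in range(n):
        out.append(
            f'    if (!check{i:06d}()) {{ puts("func{i:06d} failed"); failures++; }}\n'
        )
    # Exit status is nonzero if any function returned the wrong value.
    out.append("    return failures != 0;\n}\n")
    return "".join(out)


print(library_source(2), end="")
print(executable_source(2), end="")
```

Because every function returns a distinct value, a call that lands on the wrong GOT/PLT entry shows up as a failed check rather than passing by accident.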

onre commented 4 years ago

Alright, I'm not going to work on this anymore. Here's a tarball and a shell script which might be useful for the next brave soul. Works with gcc 4.7.4, does not work with gcc 9.2.0. Instructions:

  1. unpack
  2. cd testcase
  3. ./buildme.sh
  4. use readelf -S to make sure a.out and libptest.so overlap; if not, adjust the address on the last line of buildme.sh and re-run it. Possibly use rld.debug with -v to confirm the library gets moved.
  5. ./a.out

https://esp.iki.fi/irix-megagot-testcase.tgz