riscvarchive / riscv-binutils-gdb

RISC-V backports for binutils-gdb. Development is done upstream at the FSF.
GNU General Public License v2.0
148 stars 233 forks source link

relocation truncated to fit: R_RISCV_RVC_LUI #144

Closed ldoolitt closed 6 years ago

ldoolitt commented 6 years ago

This error was observed in real life, a typical final error signature looks like timer.c:(.text+0xa): relocation truncated to fit: R_RISCV_RVC_LUI against symbol 'timerStartValue' defined in .sbss section in ../../firmware/src/timer.o /opt/riscv32imc/lib/gcc/riscv32-unknown-elf/7.2.0/../../../../riscv32-unknown-elf/bin/ld: final link failed: Symbol needs debug section which does not exist collect2: error: ld returned 1 exit status Previously observed in a gcc issue, where lz-sc provided a patch. I confirmed the problem still exists as of commit 1b80cbe of riscv-gnu-toolchain-rv32i, and that a re-based version of lz-sc's patch still fixes it. My gihub-fu failed me, so it's not attached here; you can get it from my server. We've started reducing the test case, but it's still kind of a mess and I'm not comfortable posting it here yet, sorry.

jim-wilson commented 6 years ago

lz-sc's patch is flawed. Because we are using lui/addi to load 32-bit constants, and addi is signed, the value used for lui has to be rounded to the nearest multiple of 0x1000. So for instance to load 0x1ffff, the lui loads 0x2000 and the addi subtracts 1. This is why the current code uses RISCV_CONST_HIGH_PART to adjust the constant value before checking if it is valid. lz-sc's patch does not do this correction, and hence will reject some valid values for c.lui.

However, there does appear to be an unfixed problem here, as the bug reported in https://github.com/riscv/riscv-gcc/issues/97 is for a c.lui with a 0 immediate, but the fix linked to it is for handling an x0 destination. So it appears the original bug may not have been fixed. But since lz-sc's patch is flawed, that does not look like the correct fix.

Unfortunately, it isn't clear what the actual bug is here. I'd need a testcase that I can use to reproduce it. The gcc issue doesn't have a complete testcase, and you haven't provided one yet either. There could be something funny happening during relaxation that the current code does not account for.

You might try using the FSF binutils sources. The riscv-binutils-gdb sources are out-of-date, and a number of problems have already been fixed in the upstream FSF binutils tree. Also, the FSF binutils tree will make this problem a little easier to debug, as you can add -Wl,--noinhibit-exec to the gcc link command, and then use objdump on the resulting executable to see what the symbol values are and what the final instruction is.

ldoolitt commented 6 years ago

I got a test case reduced to the point where it doesn't have too much distracting cruft. Attached.

linker_error_test4.tar.gz

jim-wilson commented 6 years ago

I see the problem. Your linker script has a single memory section, so text+data ends up in a single segment in the executable. Normally, text and data are separate segments, and when we do relaxation on text, that does not affect addresses in the data section. But with text and data in the same segment, relaxation causes addresses in the data section to decrease and that breaks assumptions in the riscv port. In this particular case, the original value for the symbol is 0x800 which gives a valid c.lui of 0x1. Then relaxation decreases text size, and the final value for the symbol is 0x694 which gives an invalid c.lui of 0.0.

Unfortunately, the code size decrease from relaxation is potentially unbounded, so there is no easy way to calculate any useful value here. The only good solutions are to require that text and data are in separate segments, or to disable relaxation.

The code already disables relaxation for text symbols so that this problem can not occur for them. At the top of _bfd_riscv_relax_lui is / Mergeable symbols and code might later move out of range. / if (sym_sec->flags & (SEC_MERGE | SEC_CODE)) return TRUE; If we wanted to do the same for data symbols, then this would effectively be if (1) return TRUE; and we wouldn't have any lui relaxation anymore which would be undesirable.

jim-wilson commented 6 years ago

FYI you can turn off linker relaxation from gcc with -Wl,--no-relax.

ldoolitt commented 6 years ago

OK, that makes sense. Thanks for looking into it. We clearly need to think through our section ordering more carefully.

Here's an archival copy of the (useless?) patch I posted above, in case my server ever goes dark: elfnn-riscv-7615ca1.txt