rui314 / mold

Mold: A Modern Linker 🦠
MIT License

A different resolution than ld.bfd #827

Open marxin opened 1 year ago

marxin commented 1 year ago

Reduced from binutils test-suite:

a.c:

#include <stdio.h>

extern void __attribute__ ((weak)) foo (void);

char x, y, z;

long
lowest_align (void *a, void *b, void *c)
{
  unsigned long bits = (long) a | (long) b | (long) c;
  return bits & -bits;
}

int
main (void)
{
  printf ("library %sloaded\n", &foo ? "" : "not ");
  printf ("alignment %ld\n", lowest_align (&x, &y, &z));
  return 0;
}

b.c:

long long x, y, z;

void foo (void) {}

$ gcc b.c -shared -o x.so -fPIC -fcommon
$ gcc a.c -c -O2 -fcommon -o a.o
$ gcc a.o -Wl,--as-needed x.so && ./a.out
library loaded
alignment 8
$ gcc-12 a.o -Wl,--as-needed x.so && ./a.out
$ ./a.out
library not loaded
alignment 1

The latter result is from ld.bfd.

ishitatsuyuki commented 1 year ago

Hm, looks like our symbol resolution is going wild. Checking.

ishitatsuyuki commented 1 year ago

The use of common symbols seems to be the culprit. Right now mold resolves COMMON symbols as if they were undefined, which triggers the --as-needed heuristic and marks x.so as needed.

I'm not entirely sure this is worth tackling, since the behavior of COMMON is rather underspecified and this is unlikely to cause problems in practice, although I do agree that pulling in the SO doesn't make much sense here.

There's also an unrelated problem: copyrel symbols have proper values in .dynsym but not in .symtab, because of when their values are calculated. This is what confused me at first, but it can be fixed separately.

ishitatsuyuki commented 1 year ago

It looks like the origin here is:

When a symbol is COMMON and ld sees an archive, ld checks whether the archive index provides a STB_GLOBAL definition of the symbol. If yes, ld extracts the archive as well. This is in contrary to the usual rule that only an undefined symbol leads to archive member extraction.

https://maskray.me/blog/2022-02-06-all-about-common-symbols#linker-behavior

And mold by design treats lazy archives and as-needed SOs the same, which explains the behavior you're seeing. So the rules here are really an accumulation of legacy behavior, and since modern toolchains no longer emit COMMON symbols unless asked to, the only reason to care about this is a legacy application relying on it. Otherwise, I'm inclined to say that this is just implementation-defined behavior.