Closed: milend closed this 4 years ago.
So this is to make it possible to link on CI machines and debug on local machines? Are relative N_OSO paths already handled by lldb?
@rmaz:
So this is to make it possible to link on CI machines and debug on local machines?
Correct.
Are relative N_OSO paths already handled by lldb?
Yup. We already support that in Buck; it's just a lot more expensive because the output has to be post-processed. The cxx.cache_links setting controls whether the post-processing actually happens (or rather, if link outputs are cached, post-processing must happen to make the artifacts cacheable).
Do you propose a similar solution to clang, i.e. -debug_prefix_map /some/path=. ?
@rmaz:
Do you propose a similar solution to clang...
Yup, that's exactly what the "Proposed Solution" section outlines (though it wasn't very clear; I amended it).
What about the existing -oso_prefix flag?
@michaeleisel:
What about the existing -oso_prefix flag?
I actually wasn't aware of that flag until now. I did some quick digging and couldn't find any references in the source code (ld64 or zld).
-oso_prefix multiple times, like with -fdebug-prefix-map?
It's contained in Options.cpp and mentioned in man ld (although I hadn't seen this myself either until today). I haven't tried it; I just saw it.
I used GitHub's search function but that didn't find it. Looking through the code, I can see it now! I think it will be sufficient for what we need: let me do some proper tests and I'll report back.
It seems it was added in the latest version of ld64 (ld64-512.4), which I presume is the one in the Xcode 11 toolchain? The option was not present in 450.3.
Makes sense. They've made a number of changes in the more recent versions for better determinism.
They've made a number of changes in the more recent versions for better determinism
Oh, interesting. Do you have a list of specific changes re: deterministic builds?
I don't, but grepping for terms like "repro" and "determ" should help. You can also diff 409 against 450, and 450 against 512.
I've played around with the new option and it works as intended.
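For reference, a minimal sketch of how the option might be wired into a link step, written in Python and assuming the toolchain's ld64 supports -oso_prefix (per the thread, it is present in ld64-512.4 but not in 450.3). link_command is a hypothetical helper; clang's -Wl, flag forwards the comma-separated arguments that follow it to the linker. The trailing slash on the prefix is an assumption: if ld64 strips the prefix as a plain string, the slash keeps the remaining path relative (build/out/X.o rather than /build/out/X.o).

```python
import shlex


def link_command(objects: list[str], output: str, repo_root: str) -> list[str]:
    """Build a clang link invocation that asks ld64 to strip repo_root from N_OSO paths.

    Hypothetical helper. Assumes ld64 removes the prefix as a plain string,
    so a single trailing slash is appended to keep the remainder relative.
    """
    prefix = repo_root.rstrip("/") + "/"
    # -Wl,-oso_prefix,<prefix> reaches ld64 as two arguments: -oso_prefix <prefix>
    return ["clang", *objects, "-o", output, f"-Wl,-oso_prefix,{prefix}"]


cmd = link_command(["build/out/X.o"], "build/out/app", "/Users/ci/b/repo")
print(shlex.join(cmd))
```

Each machine passes its own checkout root, so the flag value differs between machines while the N_OSO strings it produces should not.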
The Problem
When linking object files compiled with debug support (-g), the symbol table (LC_SYMTAB) will contain N_OSO entries that point to the object files which contain the debug information.
The N_OSO entries use absolute paths, which means that the final linker output (e.g., binaries and dylibs) depends on the absolute repository path.
Example
Assume two different checkout paths:
/Users/user/a/repo
/Users/ci/b/repo
Assuming there's an object file at build/out/X.o relative to the repo root, any debug output binaries will contain different N_OSO paths depending on the machine they were built on. For example:
/Users/user/a/repo/build/out/X.o
/Users/ci/b/repo/build/out/X.o
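One way to see which paths end up in a given binary is to read its N_OSO entries back out. The Python sketch below assumes a thin (non-universal), little-endian 64-bit Mach-O image and uses the standard mach_header_64, symtab_command and nlist_64 layouts; list_oso_paths is a hypothetical helper, not part of any existing tool. For everyday use, nm -a should print the same OSO stab entries.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF  # magic of a 64-bit Mach-O header, read little-endian
LC_SYMTAB = 0x2           # load command holding the symbol/string table offsets
N_OSO = 0x66              # stab type: path of an object file holding debug info
NLIST_64_SIZE = 16        # sizeof(struct nlist_64)


def list_oso_paths(data: bytes) -> list[str]:
    """Return the path of every N_OSO stab in a thin little-endian 64-bit Mach-O image."""
    magic, _cpu, _sub, _ftype, ncmds, _sizeofcmds, _flags, _res = struct.unpack_from("<8I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O image")
    offset = 32  # load commands start right after mach_header_64
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<2I", data, offset)
        if cmd == LC_SYMTAB:
            symoff, nsyms, stroff, _strsize = struct.unpack_from("<4I", data, offset + 8)
            paths = []
            for i in range(nsyms):
                # nlist_64: n_strx, n_type, n_sect, n_desc, n_value
                n_strx, n_type, _sect, _desc, _value = struct.unpack_from(
                    "<IBBHQ", data, symoff + i * NLIST_64_SIZE)
                if n_type == N_OSO:
                    start = stroff + n_strx
                    end = data.index(b"\0", start)  # strings are NUL-terminated
                    paths.append(data[start:end].decode())
            return paths
        offset += cmdsize
    return []
```

Calling list_oso_paths(open(path, "rb").read()) on the two builds from the example would return /Users/user/a/repo/build/out/X.o and /Users/ci/b/repo/build/out/X.o respectively.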
Desired Output
To make the output independent of the repository checkout path, N_OSO entries should be relative to the repo root (or another desired root path).
The consequence for debugging is that lldb needs to have the correct working directory to be able to find symbols, as it cannot rely on absolute paths. This is usually done with an appropriate lldbinit script.
Proposed Solution
This is a common issue faced by similar tools (e.g., compilers). Both Clang and GCC support the -fdebug-prefix-map option, which allows the remapping of paths.
We can adopt the same approach here to make builds independent of checkout path.
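To make the proposed semantics concrete, here is a small Python sketch of OLD=NEW prefix remapping in the style of -fdebug-prefix-map, using the /some/path=. form from the discussion. The helper names are hypothetical, and two details are assumptions of this sketch rather than documented behaviour of clang, GCC or ld64: the first matching mapping wins, and matching is a plain string-prefix comparison (so /repo would also match /repo2).

```python
def parse_prefix_map(spec: str) -> tuple[str, str]:
    """Split an OLD=NEW spec at the first '='."""
    old, sep, new = spec.partition("=")
    if not sep or not old:
        raise ValueError(f"expected OLD=NEW, got {spec!r}")
    return old, new


def remap_path(path: str, specs: list[str]) -> str:
    """Replace the leading OLD of path with NEW for the first matching spec.

    Hypothetical helper: first match wins and matching is a plain string
    prefix check; both are assumptions of this sketch.
    """
    for spec in specs:
        old, new = parse_prefix_map(spec)
        if path.startswith(old):
            return new + path[len(old):]
    return path  # no mapping applies; keep the path unchanged


# Each machine maps its own checkout root to "." and gets the same N_OSO string.
for root in ("/Users/user/a/repo", "/Users/ci/b/repo"):
    print(remap_path(f"{root}/build/out/X.o", [f"{root}=."]))
```

Both iterations print ./build/out/X.o, which is the checkout-independent output described under Desired Output.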
Alternatives
There are some alternative approaches to the problem with different tradeoffs in complexity, cost and maintenance.
Post Processing
The symbol and string tables can be rewritten after a binary/dylib is linked. This is the approach taken by Buck, which post-processes the output from the linker if it needs to be deterministic.
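A rough Python sketch of what such a rewrite involves for the string table alone, assuming the strings have already been extracted in table order. rewrite_strtab is hypothetical, and real tooling also has to update LC_SYMTAB and anything laid out after the table. Because offsets shift as soon as one string changes length, the n_strx of every following symbol has to be recomputed in this sketch, which is why the whole table gets rewritten rather than just the N_OSO strings.

```python
def rewrite_strtab(strings: list[bytes], old_prefix: bytes,
                   new_prefix: bytes) -> tuple[bytes, list[int]]:
    """Rebuild a string table from scratch, remapping paths that start with old_prefix.

    Returns the new table and the new offset (n_strx) of each input string;
    every symbol referencing the table must be updated with these offsets.
    """
    table = bytearray(b"\0")  # in this sketch, offset 0 holds the empty string
    offsets: list[int] = []
    for s in strings:
        if s.startswith(old_prefix):
            s = new_prefix + s[len(old_prefix):]
        offsets.append(len(table))  # new n_strx for this string
        table += s + b"\0"
    return bytes(table), offsets
```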
The major downside is performance. For binaries of significant size (e.g., 500 MB+), the symbol and string tables can account for ~50% of the size. Processing and rewriting the binaries becomes an expensive operation, even though only a small number of N_OSO entries need to be adjusted.
In-Place Patching
While the string table can be patched in place, the result will only be deterministic if the repository paths are of equal length. Since that's not practical to ensure at scale, it's not a viable optimisation.
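The length problem can be illustrated with a minimal Python sketch, assuming the patcher may not move any other string and therefore NUL-pads a shorter replacement (patch_oso_in_place is hypothetical). Both checkouts end up with the same visible path, but the padding, and with it the bytes of the table, still reflects the original path length.

```python
def patch_oso_in_place(strtab: bytearray, n_strx: int,
                       old_prefix: bytes, new_prefix: bytes) -> None:
    """Rewrite the NUL-terminated string at n_strx without moving any other string.

    The replacement may not grow, so it is NUL-padded up to the original
    length; the table keeps its size, so the result still depends on how
    long the original checkout path was.
    """
    end = strtab.index(b"\0", n_strx)
    original = bytes(strtab[n_strx:end])
    if not original.startswith(old_prefix):
        return
    replacement = new_prefix + original[len(old_prefix):]
    if len(replacement) > len(original):
        raise ValueError("replacement is longer than the original; cannot patch in place")
    strtab[n_strx:end] = replacement.ljust(len(original), b"\0")


a = bytearray(b"\0/Users/user/a/repo/build/out/X.o\0")
b = bytearray(b"\0/Users/ci/b/repo/build/out/X.o\0")
patch_oso_in_place(a, 1, b"/Users/user/a/repo", b".")
patch_oso_in_place(b, 1, b"/Users/ci/b/repo", b".")
print(a == b)
```

The final comparison prints False: both tables now hold ./build/out/X.o, but a carries two more bytes of padding than b.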
Mount Points
Another potential option would be to mount the repository in a predictable location (e.g., using bindfs). Unfortunately, that's not practical at large scale across multiple local and remote development configurations.