Open kubkon opened 3 years ago
I'm interested in giving this a look-in! Where would be a good place to start in understanding the problem?
If I were tackling this problem, I would most likely create a fresh repo with the sole purpose of building a drop-in replacement for llvm-ar, much like zld is for lld. You can then build it out-of-tree, which means you don't have to worry about passing Zig tests at this stage and you significantly cut down on build times. I would also focus on just one file format in the beginning, say ELF or Mach-O. The idea here is that if you were to pick any large C/C++ codebase (or whatnot) in the wild, you could pass the Zig archiver as a replacement for the default system one or llvm-ar; that is, you'd tweak the CMake/Make invocation like so:
AR=zig-ar CC=... CXX=... cmake ../
or the same but with make. If the build process succeeds, then success!
Afterwards, you might wanna consider dropping the archiver in as a direct replacement for llvm-ar in the Zig upstream, either by calling directly into a precompiled binary or by putting the sources in-tree (the latter is the end goal, actually). The relevant source where this should/could happen is in src/link.zig#L668:
pub fn linkAsArchive(base: *File, comp: *Compilation) !void {
    //...
    const llvm = @import("codegen/llvm/bindings.zig");
    const os_type = @import("target.zig").osToLLVM(base.options.target.os.tag);
    const bad = llvm.WriteArchive(full_out_path_z, object_files.items.ptr, object_files.items.len, os_type);
    if (bad) return error.UnableToWriteArchive;
    //...
}
Will give that a crack!
JFYI, https://github.com/TinyCC/tinycc/blob/mob/tcctools.c has an extremely hacky impl for Elf support on Windows.
Yeah - I've made a little bit of progress with this. Just still getting comfortable with zig so it's going to be a slightly idiosyncratic start. But it feels like a doable project.
https://github.com/moosichu/zar/
I just have a little program that can parse a very simple archive file and then prints out all the "filenames" of the files it contains.
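For anyone following along, listing the names in a very simple archive can be sketched as below. This is not the zar code (which is written in Zig); it is a minimal Python illustration that only assumes the common ar layout: an 8-byte `!<arch>\n` magic, then 60-byte member headers, with member data padded to an even length. The function name `list_members` is made up for this example, and long-name handling is deliberately left out.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60  # name 16, mtime 12, uid 6, gid 6, mode 8, size 10, end marker 2


def list_members(data: bytes) -> list[str]:
    """Return the raw name field of every member in an ar archive.

    Hypothetical helper for illustration: it does not resolve GNU or BSD
    long names, it only reports what is stored in each header.
    """
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    names = []
    pos = len(AR_MAGIC)
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {pos}")
        # The name field is space-padded ASCII; the size field is decimal ASCII.
        names.append(header[0:16].decode("ascii").rstrip(" "))
        size = int(header[48:58].decode("ascii"))
        # Member data is padded with one byte when its size is odd.
        pos += HEADER_SIZE + size + (size & 1)
    return names
```

Calling `list_members(open("libfoo.a", "rb").read())` on an archive produced by llvm-ar should print entries such as `foo.o/`, plus the special symbol-table and string-table members if present.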
I'm just building things up slowly step-by-step, with a focus on reading archives generated by llvm-ar to begin with.
The goal will be to make it a drop-in replacement, but will figure-out the order in which I do things as I go for now (still going to be very experimental early on).
I think what will probably end up happening is that I will experiment with parsing increasingly interesting archive files. And then I will loop back around and implement the command-line interface for the program. And then just incrementally work on each piece of functionality testing against the results of llvm-ar.
Will then probably make some kind of framework for testing those as well I think.
You might want to look at https://github.com/SuperAuguste/zarc . It can parse ars, but it cannot create them. So maybe just using that and then adding features to create tars would be good?
Yeah, the ultimate goal of zar should be generating static archives. Adding parsing logic is a good first step to figuring out how it works though. One thing to pay particular attention to is the differences in generated ar structure between linux and macos - I believe there is a difference in at least the header format but maybe more. Also, I've been reached out to by multiple people expressing interest in helping out so @moosichu are you fine taking charge on this one and potentially collaborating with others? If so, I'll send them your way (to your fresh repo, etc.).
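To make the linux/macos difference concrete, here is a rough Python sketch of how member names are stored in the two common variants. It assumes the usual conventions: GNU-style archives end short names with `/` and keep long names in a `//` string table referenced as `/<offset>`, while BSD-style archives (as used on macOS) write `#1/<length>` in the header and put the real name at the start of the member data. The function `resolve_name` and its parameters are hypothetical names chosen for this example, not taken from any of the repos mentioned here.

```python
def resolve_name(raw_name: str, data: bytes, gnu_strtab: bytes = b"") -> tuple[str, bytes]:
    """Return (real_name, payload) for one archive member.

    raw_name   -- header name field with trailing spaces stripped
    data       -- the member's data as stored in the archive
    gnu_strtab -- contents of the GNU "//" member, if one was seen earlier
    """
    if raw_name.startswith("#1/"):
        # BSD long name: the first N bytes of the data hold the name,
        # possibly NUL-padded; the real payload follows.
        n = int(raw_name[3:])
        return data[:n].rstrip(b"\0").decode(), data[n:]
    if raw_name.startswith("/") and raw_name[1:].isdigit():
        # GNU long name: decimal offset into the string table, where each
        # entry is terminated by "/\n".
        start = int(raw_name[1:])
        end = gnu_strtab.index(b"/\n", start)
        return gnu_strtab[start:end].decode(), data
    if raw_name.endswith("/") and raw_name not in ("/", "//"):
        # GNU short name: strip the trailing slash.
        return raw_name[:-1], data
    # BSD short name, or a special member such as "/" or "//".
    return raw_name, data
```

The symbol table differs too (a member named `/` in GNU-style archives versus `__.SYMDEF` in BSD-style ones), which this sketch does not attempt to parse.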
Yes - very happy to collaborate and I’ve started reading up and putting sources together on the differences in the formats for different platforms! I’ve linked to some of them in the repo - but will flesh that out properly tomorrow as well for others hoping to contribute as well.
I have also been doing my own independent attempt at this issue, here https://github.com/iddev5/zig-ar
My ar can create basic files so far, and it is compatible with llvm-ar and ranlib too.
If it works out, I can try merge it with zar as discussed above...
Great progress! Since you have two repos it might make sense to split the focus a little. For example, @iddev5 could focus on linux and @moosichu on macos, etc., and afterwards merge both together as zar or otherwise. How's that for a plan?
This sounds great! Just to let you know that I am okay with either plan. On my repo, I have already got reading and writing common-style archives done (without symbol table and string table, ofc).
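As a rough picture of what writing such an archive involves (no symbol table, no string table), here is a small Python sketch. It is not the zig-ar implementation; `write_archive` is a hypothetical name, and the zeroed timestamp/uid/gid and the `644` mode are assumptions chosen to keep the output deterministic.

```python
def write_archive(members: list[tuple[str, bytes]]) -> bytes:
    """Serialize (name, payload) pairs into a minimal ar archive.

    Illustrative only: names must fit the 16-byte header field, because
    long-name support would need a string table or BSD "#1/" names.
    """
    out = bytearray(b"!<arch>\n")
    for name, payload in members:
        encoded = name.encode("ascii")
        if len(encoded) > 16 or b" " in encoded:
            raise ValueError(f"{name!r} needs long-name support")
        # Fields: name, mtime, uid, gid, mode (octal text), size, end marker.
        out += b"%-16s%-12d%-6d%-6d%-8s%-10d`\n" % (
            encoded, 0, 0, 0, b"644", len(payload))
        out += payload
        if len(payload) % 2:
            out += b"\n"  # pad member data to an even length
    return bytes(out)
```

For example, `write_archive([("a.o", open("a.o", "rb").read())])` yields bytes that tools such as `ar t` should be able to list, though a linker will generally want a symbol table added (for instance by ranlib) before using it.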
Hey, I've been in contact with @moosichu and @kubkon and wanted to join in. I'm on Linux, but happy to help out where I can!
Sounds good - I think my repo needs to be fleshed a bit more before people can start making meaningful contributions (without stepping on each other's toes). I have made some good progress on argument processing though - and will keep chipping away at the problem today.
https://github.com/moosichu/zar/
But with the amount of interest expressed - I will try and fast-track to that point ASAP (hopefully this evening) including having issues that can be tackled by individuals (which I will be open for accepting PRs for). Especially as I will be working during the week so won't have anywhere near as much time to work on this then.
Thanks for taking charge at organising this @moosichu, it's very much appreciated! If you need any assistance from me, please do let me know!
The work of @iddev5 has been merged into the https://github.com/moosichu/zar/ repo. Thank you, @iddev5!
I need to properly read through the changes (and some cleaning-up needs to be done to make our two pieces of work consistent with each other). But it's a good step of progress for sure.
In terms of what I have done - I have the "print" and "display contents" ("p" and "t") operations working for both BSD & GNU-style files (although without support for symbol tables at this point).
Having looked at the problem - due to the slightly subtle ways the parsing of the different kinds of archives can overlap in functionality, it seems slightly better to structure the code around that & then slowly expand the functionality of which operations can be done on those files vs. completing everything for one kind of file and then adding another.
There's a couple of issues on the repo - mainly around cleaning up the merge & working on testing (something I haven't looked into at all). I've jotted my thoughts on how the latter could work if anyone is interested in that.
Progress has been fairly good so far overall I think! Lots still to do - and my time is going to be a bit more limited for the coming couple of weeks. But I will make sure to at least check the status of things every morning even if I can't work directly on the problem.
I did consider opening up the repo to others with commit access - but I think I might hold off on that as we can each probably work more quickly in our own repos (problems should be orthogonal enough), and it might be better if the project stabilises a bit first and things are a bit more coherent, so that we are all on the same page before that happens. So we can see how things go with a PR-based model for now, I think? If that doesn't work well I'm very open to reconsidering though.
My next focus for tomorrow morning (unless @iddev5 gets there first!) will be to look through the code that has been merged-in and to unify it into the rest of the code base a bit more concretely. But I won't be able to get on that until then, so if stuff is done on that in the meantime I will make sure to take that into consideration. Hopefully my comments (both in TODOs in the code and my write-up on the issue there) help.
Otherwise @iddev5 feel free to just focus on expanding the functionality of what you already have (and if you create any PRs I will happily merge them). I can then sort out the unification side of things in the short term until things have settled.
It seems there hasn't been much progress on the zig archiver from the looks of the repo and the last comment made on this thread.
I'd like to take over this task with some possible mentorship as I've never written an archiver.
Is that feasible for @andrewrk / @kubkon or any other individual knowledgeable in archivers?
I am also interested in this task. I'd love to collaborate with some mentorship.
I've been actively working on it locally. Don't worry it's still going! Just slowly as I've not had a huge amount of free time recently.
However! If you are keen/interested in joining the effort that would be more than welcome :) do let me know if you are interested and I will spend a couple of weeks getting it back into a contributor-friendly shape.
Since we are putting a lot of effort into implementing our linker for all supported targets (https://github.com/ziglang/zig/issues/8726), we should also put some effort into adding our own implementation of a static archiver to replace llvm-ar. While in general llvm-ar is working well on the host platform when targeting the host platform, in cross-compilation settings things can get wonky when the archiver will produce native static archive headers for foreign file formats possibly tripping the linker upon trying to use it.
This is a great issue for any new contributor as it allows you to create an archiver as a completely standalone program (in its own repo like zld for instance) and then upstream it into Zig once it's ready.
Also, with this issue closed, we will be able to offer zig ar as a subcommand that does not rely on llvm-ar in any way.
This issue does not block 1.0.