Open dralley opened 1 year ago
A real concern: Do we want to support rpms with cpio-like archives larger than 4GB? It feels like we pull in a lot of pain for supporting an antipattern? Are there use-cases that are idiomatic that require rpms larger than 4GB?
@drahnr The example that typically comes up is games, which often include many large assets, or ML models, or their training data. In practice those are rarely distributed as system packages but it is possible and has been done.
My question: Are we anticipating this crate being used for games, using rpm-rs rather than rpmbuild? Resources are limited, and this doesn't strike me as a good return on investment of those.
It's not just a matter of writing but also reading. I'm not sure I want to assume that nobody will ever want to use this crate to process the contents of existing such RPMs.
I don't know that it's such a drain on resources. cpio is pretty simple: the code for both reading and writing the archives is only about 400 lines excluding tests and is pretty stable.
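For context on where the 4GB ceiling comes from, here is a minimal sketch of the "newc" cpio header layout. It assumes the standard layout of a 6-byte magic followed by 13 fields of 8 ASCII hex digits each; the constant and function names are made up for illustration and are not taken from cpio-rs or this crate.

```rust
/// Magic for the "new ASCII" (newc) cpio format.
const NEWC_MAGIC: &[u8; 6] = b"070701";
/// Number of 8-hex-digit fields that follow the magic (ino, mode, ..., check).
const NEWC_FIELDS: usize = 13;
/// Total header length: 6 + 13 * 8 = 110 bytes.
const NEWC_HEADER_LEN: usize = NEWC_MAGIC.len() + NEWC_FIELDS * 8;
/// Largest value a single 8-hex-digit field can hold (4 GiB - 1).
const NEWC_MAX_FILE_SIZE: u64 = 0xFFFF_FFFF;

/// Hypothetical helper: encode a file size as a newc file-size field.
/// Returns `None` when the size does not fit into 8 hex digits.
fn encode_filesize(size: u64) -> Option<[u8; 8]> {
    if size > NEWC_MAX_FILE_SIZE {
        return None;
    }
    let text = format!("{:08x}", size);
    let mut field = [0u8; 8];
    field.copy_from_slice(text.as_bytes());
    Some(field)
}

fn main() {
    println!("newc header length: {} bytes", NEWC_HEADER_LEN);
    // A 4 KiB file fits without trouble.
    println!("{:?}", encode_filesize(4096));
    // A file of exactly 4 GiB cannot be represented.
    println!("{:?}", encode_filesize(0x1_0000_0000));
}
```

So the limit is a property of the field width rather than of the code size, which is why supporting larger files needs a different payload scheme and not just a patch.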
Tbh, I'd prefer we create a separate rpm-cpio in the org, rather than moving it into the codebase, and just replace the dependency. Does that sound fair? We can then go forward and rebase on any upstream changes as needed rather than having to backport code manually.
I also think that a separate crate would be a better approach. Maybe you should create a repository for it?
I'm a bit lukewarm on having a separate crate, because I can't think of anything apart from an RPM parser which would want to parse RPM payloads. So it would be a separate crate that we would be the only users of, probably ever.
I am mostly thinking operationally: applying upstream changes would be as easy as a git rebase or merge. I couldn't care less if we stay the only user, as long as it simplifies the maintenance.
I don't think there will be any maintenance, the library is "finished" and hasn't seen any commits in a year. CPIO is very simple so there are unlikely to be any bugs.
We haven't reached a conclusion here; my preference is still on forking to rpm-rs/cpio-rpm and using that.
I still have the opposite preference, tbh :man_shrugging:. It's very difficult for me to imagine the supposed maintenance benefit repaying itself against having a separate crate which nobody but this particular library will ever use.
Since the new payload format removes nearly all of the metadata from the archive (because it's duplicated in the RPM header), you can do very little with the payload without also reading the RPM header. So the obvious thing to do is for us to just provide an API for that directly from this crate, since it would be pretty much the only useful way to use that code.
There is another development since we last had the discussion: RPMv6 plans to use only the "new" payload scheme, so it won't be relegated to just packages with files larger than 4GB anymore. It will eventually be all packages.
That is mentioned under the "Payload" section here: https://github.com/rpm-software-management/rpm/discussions/2374
See the "Payload" section of the website: https://rpm-software-management.github.io/rpm/manual/format.html
So, we should fork cpio-rs (providing the appropriate credits of course), strip it down to the subset we need, and change the magic bytes constant. Luckily the CPIO format is pretty simple and the library is only a few hundred lines, so it's not a big deal.
Subsequently we need to change the PAYLOADFORMAT tag, but upstream RPM still uses cpio as the name, so we'll have to wait until they pick something.
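To make the "change the magic bytes constant" step concrete, here is a rough sketch of reading one entry header in the stripped scheme. It assumes, based on my reading of upstream rpm, that each entry starts with the magic `07070X` followed by 8 hex digits giving the file's index into the RPM header; that layout should be checked against upstream before relying on it. All names below are hypothetical and not part of cpio-rs or this crate.

```rust
/// Assumed magic for stripped entries (verify against upstream rpm).
const STRIPPED_MAGIC: &[u8; 6] = b"07070X";
/// Assumed header length: 6-byte magic plus 8 hex digits of file index.
const STRIPPED_HEADER_LEN: usize = 6 + 8;

#[derive(Debug, PartialEq)]
enum EntryError {
    Truncated,
    BadMagic,
    BadIndex,
}

/// Hypothetical helper: parse a stripped entry header and return the
/// index of the file in the RPM header's file list.
fn parse_stripped_header(buf: &[u8]) -> Result<u32, EntryError> {
    if buf.len() < STRIPPED_HEADER_LEN {
        return Err(EntryError::Truncated);
    }
    if buf[..6] != STRIPPED_MAGIC[..] {
        return Err(EntryError::BadMagic);
    }
    let digits = &buf[6..STRIPPED_HEADER_LEN];
    // Reject anything that is not a plain hex digit (e.g. a leading '+').
    if !digits.iter().all(|b| b.is_ascii_hexdigit()) {
        return Err(EntryError::BadIndex);
    }
    let text = std::str::from_utf8(digits).map_err(|_| EntryError::BadIndex)?;
    u32::from_str_radix(text, 16).map_err(|_| EntryError::BadIndex)
}

fn main() {
    // Stand-in for per-file sizes that would come from the RPM header.
    let sizes_from_rpm_header: [u64; 3] = [12, 5_000_000_000, 4096];

    let index = parse_stripped_header(b"07070X00000001").unwrap() as usize;
    // The archive entry carries no size, so it has to be looked up by index.
    println!("entry {} has size {}", index, sizes_from_rpm_header[index]);
}
```

The lookup in `main` is the point made earlier in the thread: the entry itself carries no size, name, or mode, so the payload is only usable together with the RPM header.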