icculus / fatelf

Universal binaries for Linux.
https://icculus.org/fatelf/

Time to breathe new life into the FatELF proposal? #1

Open volkertb opened 1 year ago

volkertb commented 1 year ago

Hello @icculus (and anybody else reading this),

It was disappointing how your initial (well thought-out) FatELF proposal didn't gain any traction at the time and kind of withered on the vine.

The frustrating thing is that Apple has already proven this approach to be the best one for smooth transitions between architectures from the perspective of the customers who just want to continue to run their software. Apple applied this concept three times, in the form of the "Fat Binary" format when they migrated from m68k to PowerPC, as well as what they later called "Universal Binaries" when they moved from PowerPC to x86, and then again during their currently still ongoing transition to Apple Silicon (ARM64). (It's even four times, when you also count their transition from 32-bit x86 to x86_64!) This is a well-proven concept now, and it's kind of crazy that Microsoft hasn't developed something similar in their PE executable format.

Going back to the lack of Linux adoption at the time: perhaps the time just wasn't right for it yet. Back in 2009, the x86_64 architecture reigned supreme, and it provided native backwards compatibility with 32-bit x86 (IA-32) code, negating the need for something like FatELF. The iPhone had been on the market for just two years, and Android had been introduced only a year earlier. Mobile gaming existed, but it was a much smaller slice of the market.

Obviously, a lot has changed over the last 14 years. With the rapid emergence of both the ARM64/AArch64 and RISC-V architectures lately, perhaps the time is now right to dust off this idea and resubmit FatELF for consideration.

I believe that the backing of at least one prominent organization in the Linux space would be enough for it to be widely adopted by the Linux kernel developers as well as the major distros. Two notable companies that I believe would be interested in championing FatELF are the Raspberry Pi Foundation and Valve.

The Raspberry Pi Foundation has been very careful to maintain backward compatibility across Raspberry Pi releases, going all the way back to the original Raspberry Pi and its very outdated ARMv6 CPU. They now have to maintain separate 32-bit and 64-bit versions of Raspberry Pi OS. Having the bulk of the .deb files built in a FatELF binary format with support for both 32-bit and 64-bit ARM would, I reckon, alleviate things for them considerably. Also, given that they are a Strategic Member of the RISC-V Foundation, they must already be thinking about how to manage a possible transition from ARM to RISC-V at some point in the future, or at least how to maintain Raspberry Pi OS for two completely different and mutually incompatible instruction set architectures.

Then there is Valve, which has had a lot of recent success on the hardware front with the Steam Deck. Granted, that portable game console is built around an AMD64/x86_64 SoC for out-of-the-box compatibility with the vast existing library of x86 games, but given the rising popularity of mobile phones as a gaming platform, Apple moving to ARM, and continuing consumer demand for ever-longer battery life and the best possible price/performance ratio, I cannot imagine Valve not wanting to hedge its bets between processor architectures in the long term. The x86 architecture will not be dominant forever.

And perhaps Google would also like to give FatELF another look for use in Android phones and Chromebooks, especially since they recently started adding RISC-V support to the Android code base. With FatELF, the number of compatible apps would likely grow a lot more quickly.

Once development toolchains get mature support for FatELF, publishers of games and other software will be able to provide support for multiple CPU architectures practically for free. Basically, all they would have to do is add runners with different CPU architectures to their CI/CD pipelines. QA engineers (presumably) already spin up the test builds on a wide range of different hardware, so they might as well include some systems with diverse architectures, just to double-check that everything is functional.
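To make that concrete: below is a minimal sketch of the per-architecture build-and-glue step. It assumes the fatelf-glue utility from this repository and generic cross-toolchain names; the exact commands are illustrative, not a tested recipe.

```c
/* hello.c -- a trivial program built once per target architecture.
 *
 * A hypothetical CI step (commands illustrative, assuming cross
 * toolchains plus the fatelf-glue utility from this repository):
 *
 *   x86_64-linux-gnu-gcc  -o hello-x86_64  hello.c
 *   aarch64-linux-gnu-gcc -o hello-aarch64 hello.c
 *   fatelf-glue hello hello-x86_64 hello-aarch64
 *
 * On a FatELF-aware kernel, the glued "hello" would then run natively
 * on either architecture.
 */
#include <stdio.h>

int main(void)
{
    printf("Hello from whichever architecture loaded me!\n");
    return 0;
}
```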

I've also read that the AppImage developers would very much like to see FatELF embraced and adopted by the Linux ecosystem, so that multi-arch AppImages would become feasible.

Icculus, I know you may personally have been burned out by the earlier rejection of your FatELF proposal, especially after you had put so much effort into it. But as I said, a lot has changed over the last decade, and the circumstances today are quite different and, I believe, much more amenable to an idea like this.

I believe it might be a good time to resurrect the FatELF idea. At the very least, why not ask around the community, raise some buzz over it and see what happens? I'd be happy to help out with that. :slightly_smiling_face:

ell1e commented 12 months ago

For what it's worth, a while ago I tried to make the Fedora Linux project aware of it: https://bugzilla.redhat.com/show_bug.cgi?id=1925969 But it didn't really go anywhere. I like FatELF simply for the usability side, not even for the situation where a platform changes architecture. Pointing a user to just "the Linux version" of some app on a website, and having it work everywhere, sounds amazing to me.

icculus commented 12 months ago

The people that make decisions hated this idea so much. Sometimes I reread the thread on the kernel mailing list and just wince at how badly it went over.

The glibc maintainers at the time said they would never support it either, which means that even if you got the kernel to load FatELF binaries, you still couldn't load FatELF shared libraries, whether linked to the binary or opened with dlopen(). It really needs to land in both at the same time.
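To make the dlopen() half of that concrete, here is a minimal C sketch of the code path that would also have to understand FatELF (the library name and entry point are hypothetical):

```c
/* dlopen() is resolved entirely by glibc's dynamic loader, not by the
 * kernel, so kernel-side FatELF support alone wouldn't help here.
 * Build with: gcc plugin_demo.c -ldl */
#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical plugin; if this file were a FatELF container, a
     * stock ld.so would reject it with an "invalid ELF header" style
     * error, because it only understands plain ELF images. */
    void *handle = dlopen("./libplugin.so", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    /* Look up and call a hypothetical entry point. */
    void (*entry)(void) = (void (*)(void))dlsym(handle, "plugin_entry");
    if (entry)
        entry();

    dlclose(handle);
    return 0;
}
```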

That being said, a significant event has happened since then: Apple's patents on universal binaries expired. That was a major roadblock at the time.

And hey, glibc is under new management.

But I'm 100% not interested in going through this again. I spent a ton of time and effort on an idea that all my heroes lined up to shit on, and dozens of tech sites wrote articles about it, with comment sections full to the brim about how dumb I am. I put my finger on a hot stove and the experience taught me not to do that in the future.

If there were some indication somewhere that there was a willingness to include it in the kernel or glibc, I would update the patches, but I don't foresee that ever happening.

ell1e commented 12 months ago

That's a shame :disappointed: sorry that you had to go through that, that sounds pretty awful. :people_hugging:

ell1e commented 12 months ago

For what it's worth, over the last few days I read a lot of the counter-arguments, and for anyone who picks this up in the future and finds it useful, here is why I think FatELF is good, actually:tm: (no warranty for accuracy, of course). (Important: I really don't mean to pressure Ryan here; rather, in case someone else wants to pick up this project one day, maybe some of the following will be useful for discussing the merits at that point.)

First, why commonly suggested FatELF replacements don't quite cut it, in my opinion:

  1. Distribution package managers don't replace FatELF fully. Distribution packaging is a bad fit for smaller, more niche apps that are innovative, because these update a lot while getting little attention from distribution packagers, since they're not popular enough. This means that if such an app is found in any distribution at all, it won't be found in all of them, and it will often be too outdated to be useful if it's a fast-moving project. So even where a distribution-packaged version exists, users might have a horrible experience: unable to open shared files, unable to use shared plugins, or unable to use whatever else is part of the app's ecosystem, because their version is too old. (I'm not arguing against all distribution package management; clearly it works well enough for many apps and uses, just not for every use case.)

  2. A self-extracting shell script containing all architecture binaries inside is a bad FatELF replacement for everything but installers. I saw self-extracting shell scripts proposed a lot, especially since they work in userspace only, and while that might sound like a great idea at first (especially if you assume installers), in practice it can be horrible in many situations, particularly outside of installers:

    1. First of all, it breaks all inspection utilities. No more nm, no more readelf, no more file, no more direct linking if it's a library. This may seem minor, but if this approach ever became more popular (and with the rise of ARM64 and RISC-V it might!), it could quickly grow into a huge nuisance and an actual problem. A FatELF container, by contrast, stays inspectable at the file level; see the sketch after this list.

    2. You may be unable to run the program when the disk is full, since self-extraction may fail. This may also seem minor at first, but it could be an unacceptable death blow for more crucial low-level system utilities. Even if you say "well, don't use it for those then", it's easy to imagine, say, a more involved graphical disk-usage browser that is very convenient but not packaged, which the user now can't run to check where the disk space went. You just get a ton of weird corner cases where you needlessly can't run helpful third-party tools anymore, in situations that aren't very transparent or understandable to the user.

    3. It will take longer to launch bigger programs, since self-extraction typically means copying the entire binary at least once, on top of any decompression. For, e.g., an AppImage of some bigger program meant to be launched directly (imagine a nightly build of some bigger open-source game, if you don't want to accept commercial apps like Photoshop as worth discussing), this will be both confusing and unexpected to the user, as well as an entirely avoidable nuisance, since FatELF wouldn't have this problem.

    4. The longer launch might add up for smaller utilities if they need to be launched a lot. This also might seem minor at first, but imagine if something like less or cat had to write its own entire binary to disk every time you launched it. Depending on the tool, this can add up quickly and become a bigger issue. And while you could again say "then don't do that for these tools", FatELF just wouldn't have this problem.

    5. Last but not least, I find the idea that "self-extract is like an installer, and non-trivial programs need installers anyway, so what's the problem?" kind of archaic. Flatpak basically just downloads the equivalent of an AppImage, scaled up a little, and launches the app directly; to my knowledge there is no real app-specific install step. The same goes if you distribute actual AppImages. The notion that non-trivial programs naturally need an installer seems outdated nowadays.

  3. Building from source is a bad replacement for FatELF. This one should be more obvious: most regular users don't want to build from source these days. It's a great option, but not a great requirement, for running an app on a platform it wasn't distribution-packaged for.

  4. Flatpak is sometimes a not-that-optimal replacement for FatELF. Flatpak comes the closest to making FatELF kind of unnecessary, but in a few situations it can still be a considerably worse solution, in my opinion:

    1. First, Flatpak is easy and convenient for the developer, and rather failsafe given how much it bundles to ensure a working environment, but it errs heavily on the side of overengineering, reaching for power tools when you sometimes just need a hammer. Especially for simpler command-line programs, where producing a binary that runs on the vast majority of Linux systems without hiccups is easier, Flatpak is a pretty giant and complicated solution for a sometimes(!) relatively simple problem.
    2. Flatpak may waste tons of disk space. Even though the container environments can be shared, I can confirm from experience as an end user who uses Flatpak a lot that most apps in practice want a slightly different runtime version, so tons of things get downloaded separately anyway, eating up space. An AppImage can be way smaller, for example, if created by someone who knows what they're doing and it's well tested, but then again an AppImage won't run on your local computer's architecture "automagically" unless, well, you combined it with FatELF.
    3. Flatpak wastes a ton of time installing and updating. This goes back to the disk-space point: because the giant container environments that need to be downloaded usually contain far more than the app actually needs to run, a lot of extra bandwidth and time is spent keeping these environments up to date. With FatELF you could make slimmer AppImages, or otherwise portable binaries, that might often be considerably smaller to download even with multiple architectures included.
    4. Flatpak is kind of unsuitable for small command-line utilities too, because even if you disable sandboxing, the utility doesn't just land in /usr/bin or the like, and it can be very confusing even for a more knowledgeable Linux person to figure out how to launch it in a terminal. FatELF wouldn't have this issue.
    5. Flatpak makes changing special config files or applying other filesystem-level tweaks to apps considerably more difficult, because the app is now sealed away in a separate inner filesystem which you first have to locate before you can do anything. You could say this is worth it, but that stops being the case once the app developer already offers an AppImage build that, with FatELF, could be run as easily as clicking "Install" in some Flatpak frontend. It's just an entirely avoidable inconvenience with something like FatELF.
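Coming back to point 2.1 above: one thing a FatELF container keeps, unlike a self-extracting blob, is file-level inspectability. As a rough illustration, here is a sketch of how a tool could detect a FatELF file and enumerate its embedded ELF records. The magic value and record layout are paraphrased from memory of this repository's fatelf.h and may be off; treat the real header as authoritative.

```c
/* fatls.c -- sketch of a FatELF-aware inspection tool.
 * Header layout is assumed (see fatelf.h in this repo for the real one):
 *   4-byte magic, 2-byte version, 1-byte record count, 1 reserved byte,
 *   then per-record: 2-byte machine, 6 bytes of ABI/word-size/byte-order
 *   flags, 8-byte offset, 8-byte size -- all little-endian. */
#include <stdint.h>
#include <stdio.h>

#define FATELF_MAGIC 0x1F0E70FAu  /* assumed value */

/* Decode n little-endian bytes into an integer. */
static uint64_t le(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    uint8_t hdr[8];
    if (fread(hdr, 1, sizeof hdr, f) != sizeof hdr ||
        le(hdr, 4) != FATELF_MAGIC) {
        printf("%s: not a FatELF container\n", argv[1]);
        fclose(f);
        return 0;
    }

    int records = hdr[6];  /* record count byte, after the version field */
    printf("%s: FatELF container, %d record(s)\n", argv[1], records);

    /* Each record describes one embedded ELF image. */
    for (int i = 0; i < records; i++) {
        uint8_t rec[24];
        if (fread(rec, 1, sizeof rec, f) != sizeof rec)
            break;
        printf("  record %d: e_machine=%u offset=%llu size=%llu\n", i,
               (unsigned)le(rec, 2),
               (unsigned long long)le(rec + 8, 8),
               (unsigned long long)le(rec + 16, 8));
    }

    fclose(f);
    return 0;
}
```

A real file(1) or readelf extension would of course use the canonical structures, but the point stands: the records are right there in the header, so existing tooling could learn to look through them.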

Now on to things FatELF was blamed not to solve, and why I personally think that's not too relevant:

Here is what FatELF does seem to solve nicely:

My apologies if I got some things wrong (I probably did), and sorry if you don't agree (whoever is reading); this is just how things look from my perspective as some random app developer. But summed up: I think FatELF wouldn't solve everything, but it would be a neat addition. Maybe that's just me.

volkertb commented 11 months ago

Like @ell1e said, it's really unfair that you were dragged through the mud for such a cool idea. I remember reading about your FatELF proposal back in the day, being enthusiastic about it from the start, and being disappointed when it failed to gain any traction, even without being aware of how toxic the responses on the mailing lists were.

Daring to touch the architectural foundations of the Linux ecosystem was bound to be controversial, but you didn't deserve to get crapped on for your efforts. Boldness is an important part of innovation.

I remember the main developer of glibc at the time referring to the ARM architecture as "embedded crap" and refusing to accept fixes for it. It got so bad that glibc was temporarily forked, both for the sake of better compatibility with other CPU architectures and to foster a friendlier developer culture. This happened in the same year you proposed FatELF, so that obviously didn't help. Thankfully, glibc is "under new management", as you say, which has resulted in a much friendlier and more positive development culture. A culture that would perhaps at least be willing to seriously consider a concept like FatELF, purely on its merits.

And with the various announcements in the news just last week alone (Qualcomm unveiling its Snapdragon X Elite, Apple's new M3 chips, and both NVIDIA and AMD planning to develop ARM-based desktop CPUs), now could be the perfect time to give FatELF another fresh look.

One person or a handful of people pushing for a major change like this wouldn't work, of course. But again, having at least one major industry player throw its weight behind the idea might help FatELF succeed. I mentioned Valve and Raspberry Pi, but other companies could help out here as well; Raptor Computing Systems and Ampere Computing also come to mind, for instance.

Time to start lobbying and inquiring among the various developer communities and companies.