rumpkernel / rumprun-packages

Ready-made packages of software for running on the Rumprun unikernel

Flawed Architecture: The Code that the Patches are Applied to, is Expected to be Available at a Single URL #144

Open martinvahi opened 7 years ago

martinvahi commented 7 years ago

The Flaw/Bug

Currently the patching Makefiles contain a mechanism for downloading the source from a single URL. For example, at least one of the 2017_08_18 versions of the libevent package contains the following lines:

UPSTREAM=https://github.com/libevent/libevent/releases/download/release-2.0.22-stable/libevent-2.0.22-stable.tar.gz
TARBALL=$(notdir $(UPSTREAM))

# ... some Makefile code omitted to make this citation shorter

dl/$(TARBALL):
    mkdir -p dl
    ../scripts/fetch.sh ${UPSTREAM} dl/$(TARBALL)

If, for whatever reason, the single URL is not available, the collection of rumprun-packages, which is meant to remain usable over a long time period, becomes broken. If there are dependencies between the rumprun-packages, then the lack of a package near the root of the dependency tree makes the whole tree broken and unavailable.

Proposed System

A smarter solution is to describe the patchable code by a SHA256 or other secure hash of a tar file that contains the code. That way it does not matter where the code gets downloaded from, and different users can use "warez-like" file sharing networks for downloading the packages by using BitTorrent-like solutions. The packaging system only needs to record the size and secure hash of the tar file that contains the patchable code.
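The idea above can be sketched as a small POSIX sh function. This is a minimal illustration, not part of rumprun-packages: it assumes `sha256sum` is available (as on most Linux systems; BSDs may need `sha256 -q` instead), and the hash would be pinned in each package's Makefile next to, or instead of, the `UPSTREAM` URL.

```shell
#!/bin/sh
# Sketch: verify a downloaded tarball against a pinned SHA256, so it
# no longer matters which mirror, volunteer server, or P2P network
# the file came from.
set -u

verify_tarball() {
    expected="$1"   # hash pinned in the package's Makefile
    tarball="$2"    # file fetched from any source whatsoever
    actual=$(sha256sum "$tarball" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}
```

A Makefile rule could then call such a check right after `fetch.sh` and delete the downloaded file on a mismatch, so a corrupted or malicious mirror cannot poison the build.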

A Simple Bootstrapping System

The central repository (at this day and age, the very Git repository that this bug report is attached to) should contain a plain text file with at most one URL per line. Those URLs refer to other text files all around the world, on the servers of different volunteers, who serve files over plain HTTP/HTTPS. The text files on the volunteers' servers (hereafter: local_list_of_files) contain RELATIVE FILE PATHS of the tar files that contain the patchable code.

Demo

My Declaration of Interests/Biases

I'm very biased in writing this bug report, because I have my own small project called Silktorrent, which ended up being my personal self-education endeavor and where I tried to develop base technology for totally censorship-proof web applications, including software package distribution. Part of the Silktorrent use case is that the Internet is totally offline and people can exchange files only with USB sticks, possibly with "mail-pigeon drones" that transport the USB sticks. The concept of defining files through their location, a URL, does not make sense in that scenario, and as it turns out, P2P file sharing systems define files not by their location, a URL, but through the secure hash of the file.

The end result is a terribly inefficient Bash-and-Ruby script (archival copy; if no command line arguments are given, the script prints usage instructions) that, despite all the "bloat", does essentially only one thing: it creates a tar file and renames the tar file according to its secure hash.

(Actually, the script can also "unpack" the tar files and verify the file name format without reading the file. It also salts the tar file at creation, which makes it possible to "package" the same "payload" into tar files that have different hashes. That forces the censors to download at least parts of the tar files to see whether a tar file contains censored material; at some point the downloading and checking should overwhelm the censoring system.)
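A much simpler stand-in for that mechanism can be sketched in a few lines of shell. This is an assumption-laden illustration, not the Silktorrent script itself: it assumes a tar that accepts multiple `-C` options (GNU and BSD tar both do) and `sha256sum` on the PATH. It packs a directory together with a random salt file and renames the resulting tar by its own SHA256, so packaging the same payload twice yields two differently named, differently hashed files.

```shell
#!/bin/sh
# Sketch: create a salted tar file and name it after its own hash.
set -eu

pack_salted() {
    payload_dir="$1"   # directory to package
    out_dir="$2"       # where the renamed tar file ends up
    # 32 random bytes of salt, stored as an extra archive member
    salt_file=$(mktemp)
    head -c 32 /dev/urandom > "$salt_file"
    tmp_tar=$(mktemp)
    tar -cf "$tmp_tar" -C "$payload_dir" . \
        -C "$(dirname "$salt_file")" "$(basename "$salt_file")"
    rm -f "$salt_file"
    # the file's name IS its content address
    hash=$(sha256sum "$tmp_tar" | awk '{print $1}')
    mv "$tmp_tar" "$out_dir/$hash.tar"
    echo "$out_dir/$hash.tar"
}
```

Anyone who receives such a file, by any transport at all, can verify it by rehashing it and comparing against the file name.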

There's no need to use the Silktorrent script for rumprun-packages, because a simpler tar file creation and renaming implementation will probably work just fine, but I use my script for this demo.

The Core of the Demo

At some "central" location, quotes because it doesn't need to be, should not be, central, there is a list of URLs to text files that list relative file paths. An example (archival copy) of one such URL:

https://demodrome.softf1.com/rumpkernel_org/distribution_demo_001/list_of_relative_file_paths.txt

By combining the URL's directory part with the relative file paths listed in "list_of_relative_file_paths.txt", the URLs of the tar files can be derived.
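The derivation step can be sketched as follows. The list URL below is the demo URL from this issue; the relative paths fed in would come from whatever the volunteer's list file actually contains, and are shown here only as hypothetical examples.

```shell
#!/bin/sh
# Sketch: turn a list-file URL plus relative paths (read from stdin)
# into full tarball URLs.
set -eu

derive_urls() {
    list_url="$1"
    # strip the list file's name, keeping the directory part of the URL
    base_url=${list_url%/*}
    while IFS= read -r rel_path; do
        if [ -n "$rel_path" ]; then
            echo "$base_url/$rel_path"
        fi
    done
}
```

A downloader would fetch the list file, pipe its contents through such a function, and then try each derived URL (or any other mirror) until a file with the expected hash is obtained.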

Thank You for reading my comment.

anttikantee commented 7 years ago

It is impossible to have a flawed architecture when there is no architecture at all.

An architect's position is available. Run your scheme by the mailing list and, in all likelihood, start pushing. The only requirement is that there may not be usability regressions (especially for non-developers).

martinvahi commented 7 years ago

Thank You for the answer and for the encouragement.

I'll need to write the downloader first, regardless of whether the proposal gets accepted or rejected later. Regarding usability regressions: could You please tell/write what operating systems the build scripts must run on?

So far all of my code has run only on Linux and BSD, and it has occasionally been tested on Cygwin, but Cygwin is not something that many people have available, and even on Linux not all people have the newest Ruby installed.

The other issue with my proposed scheme is that it uses the nice, fast, secure hash console tools that are available on Linux and BSD, but which might not be available on a Cygwin installation. Dependency tree completeness wise, the most reliable solution that I'm aware of is to create a VirtualBox appliance, because then everything can be pre-tested and installed, but the problem with VirtualBox appliances is that they are huge, in my case about 22GiB minimum. In practice I ran into trouble with one of my clients, because the ~100GiB virtual appliance was difficult to download.

Another, in my view even more serious, issue with VirtualBox appliances is that they assume x86 CPU based hardware, but x86 CPUs boot non-Windows operating systems only by the mercy of Microsoft, and the AMD and Intel CPUs also have microcode update capability, which sounds like a huge security hole to me. Add to that the issue that both AMD and Intel have been making quite an effort to keep the market free of other x86 manufacturers (Via must have been an accident of the AMD/Intel lawyers), and the various non-x86 CPUs are pretty much the future of the hardware that requires security and can be used for privacy-respecting applications. x86-specific VirtualBox appliances will not run on non-x86 hardware, unless the hardware has some x86 mode like the former Crusoe CPUs and the Elbrus had.

Build systems tend to be special-purpose application software that, like application software, has its dependencies. The reason why my Silktorrent script is such a slow monster is that I first wanted to make it as portable and "free of dependencies" as possible, and I thought that if I wrote it in the very old Bash, then it should be pretty "foolproof" in terms of lacking dependencies. After all, the core of it just creates a tar file and renames it.

When I got to the checks and string processing code, I thought that I would use some very old, "standard" command line tools like Awk and gawk, but that was a mistake, because it turned out that BSD and Linux installations have different "Awks", and due to the various Awk and command line related quirks, the Ruby code ended up being simpler and more portable than the Awk code. So the fast, quick-to-boot Awk calls were replaced with slow-to-boot Ruby interpreter launches, and the slowness of the script comes from the re-initialization of the Ruby interpreter, which allocates about 40MiB of RAM at every start-up. The end result, the 2017_03 Silktorrent script, rigorously adheres to a speed optimization ANTI-pattern.

My conclusion from that experience, in the context of Rumpkernel.org "usability", is that the use of Bash or other "simple-and-light" tools leads to a heavy and unoptimized solution, and it is smarter to build any build/test/make scripts using something more capable, "heavy", from the very start. In my case the preferred language is Ruby, for which I have my own libraries (under the BSD license).

Regarding development methodology, I believe in (or at least prefer) a setup where no person is required to modify another person's code, and within one's own code people use whatever they want, as long as the resource usage of their code fits the project-specific limits. Code reviews are OK, depending on how they are carried out, including how much freedom people are given, id est I can adhere to other people's style requirements, depending on what they are, but I certainly find it very stupid to require others to follow my style preferences. However, generally I do not believe in manual code reviews.