Ruby packaging functionality

ronaldtse commented 2 years ago

This corresponds to items 2 and 3 here https://github.com/tamatebako/tebako/issues/1#issuecomment-902687931.

This is the first of potentially multiple "packagers"; it extracts a program into a single image to be run on Tebako.

There are several names we can consider for this "packager" activity:

imager (person who makes an image, which implies read-only and reflects what this role does)
presser (something like distillation: extracts the essence, i.e. only the program, into the Tebako image)

Considered these but not in favour: "distiller" (already used by Adobe Acrobat), "packer" (used by ruby-packer), "packager" (sounds too generic), "cloner" (sounds a bit simplistic).

So this is the "Ruby imager" or "Ruby presser".

ronaldtse commented 2 years ago

We may want to maintain such programming language specific packaging code in separate repos, such as tebako-imager-ruby / tebako-presser-ruby.

ronaldtse commented 2 years ago

The user should be able to run something like this:

$ tebako press ruby entrypoint.rb
=> Image built at entrypoint.rb.dwarfs

maxirmx commented 2 years ago

@ronaldtse Could you please have a look at the CLI section here: https://github.com/tamatebako/tebako/blob/master/doc/DEV-NOTES.md ? Am I moving in the right direction?

ronaldtse commented 2 years ago

@maxirmx can you also have a look at https://github.com/larsch/ocra ? It is just like ruby-packer on Windows, but does not need to re-compile Ruby.

maxirmx commented 2 years ago

@ronaldtse Ocra does the following.

Packaging: creates an archive that contains

the Ruby program to be packaged
all dependencies
the Ruby interpreter with the required dynamic modules

It then compiles and links a small stub program that includes the archive described above as a custom data section.

Execution: runs the stub, which extracts the archive to a temporary folder and calls the Ruby interpreter with appropriate parameters to execute the packaged application.

Very simple and effective. No need to patch and build Ruby.
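To make the extract-then-run flow concrete, here is a minimal sketch in Ruby. It is not Ocra's code: the "archive" is just a marshalled Hash of relative path to file content (an assumption standing in for the real data section), and `pack` and `extract_and_run` are hypothetical names.

```ruby
require "tmpdir"
require "fileutils"
require "rbconfig"
require "open3"

# Hypothetical stand-in for the archive: a Hash of relative path => content.
# Marshal is used only for illustration; it is not safe for untrusted data.
def pack(files)
  Marshal.dump(files)
end

# Extract the payload to a temporary folder, run the entry point with the
# current Ruby interpreter, and return what the program wrote to stdout.
# The temporary folder is removed when the block exits.
def extract_and_run(payload, entry)
  Dir.mktmpdir("packed-app") do |dir|
    Marshal.load(payload).each do |rel, content|
      target = File.join(dir, rel)
      FileUtils.mkdir_p(File.dirname(target))
      File.binwrite(target, content)
    end
    out, status = Open3.capture2(RbConfig.ruby, File.join(dir, entry))
    raise "packaged program failed" unless status.success?
    out
  end
end

payload = pack({
  "main.rb" => "require_relative 'lib/greeter'\nputs Greeter.hello\n",
  "lib/greeter.rb" => "module Greeter\n  def self.hello\n    'hello from the package'\n  end\nend\n"
})
puts extract_and_run(payload, "main.rb")
```

The extraction step is the part that writes everything to disk before the program can start.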

ronaldtse commented 2 years ago

So the only differences are:

Ocra requires extracting the Ruby interpreter and code into a temporary folder
i.e. it does not require file system redirection

Do note that packed-mn using Ocra is failing right now due to a dependency that is used by Fontist:

https://github.com/metanorma/packed-mn/issues/118

I wonder if we could implement an FS redirect layer without patching Ruby but “wrapping” it. Because we don’t want to extract the dependencies.

ronaldtse commented 2 years ago

@maxirmx we have previously created retrace which allows interception of file system calls of unpatched executables. The same approach would work here.

maxirmx commented 2 years ago

I think I now understand the confusion in this discussion and in #1.

The Ocra packager loads the Ruby program to be packaged and builds the list of loaded dependencies, both native DLLs and Ruby gems, then adds the Ruby interpreter itself to the package. Ruby-packer does not load anything. It creates a clean Ruby installation, then runs gem install, bundle deploy or something similar, and then packages the whole Ruby folder tree with everything installed there.

The Ocra approach implies a packager written in Ruby. The Ruby-packer approach can be implemented in any language or with any applicable tool.

ronaldtse commented 2 years ago

Ruby-packer is actually composed of two parts:

hacking the Ruby interpreter's file functions to load from the internal disk (via libsquashfs)
a script written in Ruby that gathers Ruby dependencies, similar to Ocra. Ocra uses dynamic discovery of loaded gems via Gem.loaded_specs, while Ruby-packer does not "discover" gems but requires some static listing of gems to be included e.g. via Bundler's Gemfile (which loads all dependent gems).
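The two discovery styles can be compared side by side. The sketch below is illustrative only: `loaded_gems` and `locked_gems` are hypothetical helper names, and reading the lockfile through `Bundler::LockfileParser` is an assumption about how a static listing could be obtained, not something either tool is confirmed to do.

```ruby
require "bundler"

# Dynamic discovery (Ocra style): ask RubyGems which gems are activated in
# the running process, after the program's requires have executed.
def loaded_gems
  Gem.loaded_specs.values.map { |spec| "#{spec.name} #{spec.version}" }.sort
end

# Static listing (Ruby-packer style): read the resolved gems from the text
# of a Gemfile.lock without loading the program at all.
def locked_gems(lockfile_text)
  Bundler::LockfileParser.new(lockfile_text).specs
                         .map { |spec| "#{spec.name} #{spec.version}" }.sort
end

# A tiny, made-up lockfile used only for this example.
lock = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      rake (13.0.6)

  PLATFORMS
    ruby

  DEPENDENCIES
    rake
LOCK

puts "locked: #{locked_gems(lock).inspect}"
puts "loaded: #{loaded_gems.inspect}"
```

The dynamic list depends on which code paths actually ran before the call, while the static list is the same on every run.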

The only real difference between Ocra and Ruby-packer is that Ocra requires an "unzip" of the package to local disk (some $tmpdir) before executing. Ruby-packer loads that payload via libsquashfs.

We don't want to unzip to disk for execution.

ronaldtse commented 2 years ago

From @maxirmx :

We have a packager that implements an adapted ruby-packer algorithm. The next planned step is a Ruby patch that will remove the FUSE dependency.

I can adopt the Ocra approach to replace or complement the Ruby-packer algorithm if you like.

Ocra contains a lot of backwards compatibility code for Ruby 1.8 and 1.9 (e.g. Pathname), but many of those components can be removed in later Ruby versions. We do not need to support these versions, and even just starting with Ruby 3 would work.
Ocra does provide very good tests compared to Ruby-packer, so we should learn from them.
In practice, I think there isn't much difference between the Ocra approach and the Ruby-packer gem packaging algorithm. For example, if we use Bundler, then Bundler would already do all the dependency checking, with the results provided in Gemfile.lock. Bundler is now a default gem in Ruby, so it is completely reasonable to ask someone to provide a Gemfile to package a Ruby program.
Ocra also has functionality to "strip out" some non-essential files from gems, e.g. the doc/ and spec/ folders, but I am less keen to remove them from a packaged gem. The size saved from excluding such content is not expected to be large. The more important thing is that if we modify a published gem, it becomes difficult to prove provenance (if the published gem has been signed or its SHA published). I don't know if people publish gem hashes online, but it is a potential issue. If the space savings prove important, then we'd rather have a separate program convert "normal gem => minimal gem => build minimal gem" and have the packager include the minimal gem.
Ocra does do one thing that Ruby-packer doesn't: it discovers loaded DLLs and includes them. So at least for Windows, we need to do that.
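As a rough illustration of discovering native code at runtime, the sketch below lists the compiled extensions that Ruby itself has loaded, using `$LOADED_FEATURES` and the platform's extension suffix from `RbConfig`. This is narrower than the DLL discovery discussed above: it does not see libraries that those extensions link against, and it is not Ocra's implementation. `loaded_native_extensions` is a hypothetical name.

```ruby
require "rbconfig"

# Return the entries of the feature list that are compiled extensions,
# identified by the platform's extension suffix ("so", "bundle", ...).
def loaded_native_extensions(features = $LOADED_FEATURES)
  suffix = ".#{RbConfig::CONFIG['DLEXT']}"
  features.select { |path| path.end_with?(suffix) }
end

# Requiring a library with a compiled part usually adds an entry; whether it
# does depends on how this Ruby was built (an assumption, not a guarantee).
require "digest"
puts loaded_native_extensions
```

A packager for Windows would still need to find the DLLs these extensions depend on, which this list does not reveal.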
