tamatebako / tebako

Tebako: an executable packager (for Ruby programs)
https://www.tebako.org

Roadmap for tebako #45

Closed ronaldtse closed 2 years ago

ronaldtse commented 2 years ago

Some large work items that we need to address.

  1. Benchmarking against the ruby-packer approach
    • compute time (tebako architecture vs ruby-packer)
    • read/write time (due to DwarFS instead of SquashFS)
  2. Support new Ruby versions (Ruby 3.0)
  3. Generalizing the approach to other languages like Python (CPython), and Perl (which is a prominent example on https://github.com/mhx/dwarfs)

ronaldtse commented 2 years ago

From @maxirmx :

I see/feel the limitations of the approach taken. We patch the consumer of the data. However, the consumer we patched can fork another consumer that is not patched.

For example, we patch IO calls in Ruby, but Ruby can use FFI to load native libraries which may have unpatched IO calls.
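A minimal Python sketch of this limitation, for illustration only. `patched_open` stands in for a patched IO call and `unpatched_consumer` stands in for a native library reached through FFI. The `/__memfs__/` prefix and the `MEMFS` table are hypothetical names, not Tebako's actual implementation.

```python
import io

# Hypothetical mount prefix and in-memory file table (assumptions for this sketch).
MOUNT_PREFIX = "/__memfs__/"
MEMFS = {"/__memfs__/app/main.rb": b"puts 'hello'\n"}


def patched_open(path, mode="rb"):
    """Stand-in for a patched IO call: paths under the prefix are served from memory."""
    if path.startswith(MOUNT_PREFIX):
        if path not in MEMFS:
            raise FileNotFoundError(path)
        return io.BytesIO(MEMFS[path])
    return open(path, mode)


def unpatched_consumer(path):
    """Stand-in for a native library loaded via FFI: it calls the real open()."""
    with open(path, "rb") as handle:
        return handle.read()


def can_read(reader, path):
    """Return True if the reader can access the path, False on any OS-level error."""
    try:
        reader(path)
        return True
    except OSError:
        return False


target = "/__memfs__/app/main.rb"
patched_ok = can_read(lambda p: patched_open(p).read(), target)
unpatched_ok = can_read(unpatched_consumer, target)
```

The patched path finds the file in memory, while the unpatched consumer goes to the real filesystem, where the packaged file does not exist.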

For a complex project, it would be nice to have at least some tracing facilities that warn about extensions we cannot support (e.g. Retrace https://github.com/riboseinc/retrace).

ronaldtse commented 2 years ago

We also want to support threaded processing, i.e. concurrent access to the tebako memfs.

For example, ruby-packer fails (segfaults) in threaded code when there is more than one concurrent file IO operation on a single file (probably due to libsquashfs not being thread-safe). https://github.com/metanorma/packed-mn/issues/129
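As a rough illustration of what concurrent access requires, here is a Python sketch of an in-memory file store whose shared cache is guarded by a lock, with reads that carry no shared cursor. `LockedMemFS` and its layout are hypothetical. This says nothing about how DwarFS or libsquashfs handle threads internally.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedMemFS:
    """Hypothetical read-only in-memory store; the lock protects the shared cache."""

    def __init__(self, files):
        self._files = dict(files)
        self._cache = {}
        self._lock = threading.Lock()

    def read(self, path, offset=0, size=-1):
        # Only the cache lookup/fill is serialized. Each read is positional
        # (offset + size), so there is no shared file cursor to corrupt.
        with self._lock:
            data = self._cache.get(path)
            if data is None:
                data = self._files[path]  # placeholder for block decompression
                self._cache[path] = data
        end = len(data) if size < 0 else offset + size
        return data[offset:end]


fs = LockedMemFS({"/app/data.bin": bytes(range(256)) * 64})

# Many threads read the same file at the same time.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: fs.read("/app/data.bin"), range(32)))

all_equal = all(chunk == results[0] for chunk in results)
```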

ronaldtse commented 2 years ago

Tebako is also useful in these cases:

We need to implement integrity checks on Tebako images prior to execution, or is this already handled in DwarFS?
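One possible shape for such a check, sketched in Python under the assumption that a SHA-256 digest of the image is recorded at packaging time and compared before execution. The function names are hypothetical, and whether DwarFS already covers this is left open, as in the question above.

```python
import hashlib
import hmac


def image_digest(image: bytes) -> str:
    """Return the SHA-256 digest of the image as a hex string."""
    return hashlib.sha256(image).hexdigest()


def verify_image(image: bytes, expected_hex: str) -> bool:
    """Compare the computed digest with the recorded one in constant time."""
    return hmac.compare_digest(image_digest(image), expected_hex)


image = b"example image payload"      # placeholder for the embedded image bytes
recorded = image_digest(image)        # digest stored when the image was built
intact = verify_image(image, recorded)
tampered = verify_image(image + b"x", recorded)
```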

GraalVM provides Sulong which emulates C/C++ runtimes on JVM. Perhaps there is something we can learn from there in proxying C/C++ integration.

ronaldtse commented 2 years ago

Tebako seems to provide a virtualised filesystem layer for interpreters implemented in C/C++ with redirection capabilities to a MemFS.

Because of the MemFS aspect (if we load the DwarFS image into memory), could code potentially run faster, at least on the first startup?

ronaldtse commented 2 years ago

What do we do if we want to share, say, a Ruby interpreter across two executables? For example, multiple bin/ files (e.g. Ruby scripts) that use the same interpreter (or code). Duplicating the whole setup for two executables is definitely wasteful in terms of storage space (and the build process).

ronaldtse commented 2 years ago

The best way is certainly to have a single executable or installable package for Tebako. This will mean that we move away from CMake in the "press" process, i.e. DwarFS usage and incbin.

In compiling Tebako Ruby, we certainly need the full development chain.

However, in the "press" process, we only need sufficient tooling to:

  1. Create the DwarFS image
  2. Link incbin
  3. Build the target Ruby application (the native gems)
  4. Compile the final executable

ronaldtse commented 2 years ago

@maxirmx can we close this now? Thanks.

maxirmx commented 2 years ago

As far as I understand, this issue was created so that we would not forget some major tasks. We can probably close it and create several others.

maxirmx commented 2 years ago

There should be a policy for Tebako regarding which Ruby versions are supported.

If we follow the Ruby stable releases (https://www.ruby-lang.org/en/downloads/), then as of today (Feb 15th, 2022) we should support 2.6.9, 2.7.5, 3.0.3 and 3.1.0.
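A small Python sketch of how such a policy could be expressed, assuming the policy is "one supported patch release per minor series". The `SUPPORTED` list copies the versions named above. `resolve` and the matching rule are assumptions for illustration, not an agreed policy.

```python
# Versions taken from the comment above; the resolution rule is an assumption.
SUPPORTED = ["2.6.9", "2.7.5", "3.0.3", "3.1.0"]


def parse(version):
    """Turn '3.0.3' into (3, 0, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def series(version):
    """Return the (major, minor) series of a version."""
    return parse(version)[:2]


def resolve(requested, supported=SUPPORTED):
    """Return the supported release in the requested minor series, or None."""
    for candidate in sorted(supported, key=parse, reverse=True):
        if series(candidate) == series(requested):
            return candidate
    return None


in_policy = resolve("3.0.1")
out_of_policy = resolve("2.5.8")
```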

maxirmx commented 2 years ago

Tebako benchmarking

Tebako needs a benchmarking environment that can be run against ruby-packer and okra, and that can also support improvements like #74. It should measure:

  • compute time (tebako architecture vs ruby-packer)
  • read/write time (due to DwarFS instead of SquashFS)
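A minimal Python sketch of a timing harness that such an environment could start from. `time_runs`, `summarize` and `command_action` are hypothetical helpers. The workload below is a placeholder, and a real comparison would pass the packaged executables as commands.

```python
import statistics
import subprocess
import time


def time_runs(action, runs=5):
    """Call action() `runs` times and return the wall-clock duration of each call."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce raw samples to min, median and max, in seconds."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def command_action(argv):
    """Wrap an external command (e.g. a packaged executable) as a timeable action."""
    return lambda: subprocess.run(argv, check=True, capture_output=True)


# Placeholder workload; swap in command_action([...]) for each packager's output.
samples = time_runs(lambda: sum(range(10_000)), runs=3)
summary = summarize(samples)
```

Running the same harness over the executable produced by each packager gives comparable numbers for both the compute-bound and the IO-bound cases.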