maxirmx opened 6 months ago
Move Ruby/Windows-specific code to a separate library. It is currently isolated in separate source file(s), but those are linked into the same library as the portable code. (I do not think it is critical, though, if the Julia version of the library on Windows includes some unused Ruby-specific code.)
Extend the existing reusable POSIX-compliant or OS-specific code with additional functions if required. The existing library grew this way as new platforms and Ruby versions were added.
Create a language-specific library if any other language uses an approach similar to Ruby's.
Supporting an additional Ruby version means two tasks:
It would be nice to make patching more modular, but I do not see how to achieve that.
An "individual runtime" here is an interpreter for a particular language at a particular version, e.g. Ruby 3.1, Ruby 3.2, Julia 1.10, Julia 1.9.
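In code, an individual runtime could be identified by language and version plus the target OS and architecture, since a prebuilt interpreter is platform-specific. The sketch below is a hypothetical `Tebako::RuntimeId` value type (not part of Tebako); its key uses the same `language-version-os-arch` format as `RuntimeRepository` further down.

```ruby
# Hypothetical value type identifying one individual runtime (not a Tebako API).
module Tebako
  RuntimeId = Struct.new(:language, :version, :os, :arch) do
    # Key in the same "language-version-os-arch" format used by RuntimeRepository
    def key
      [language, version, os, arch].join("-")
    end

    # Human-readable form for logs
    def to_s
      "#{language} #{version} (#{os}/#{arch})"
    end
  end
end

ruby32 = Tebako::RuntimeId.new("ruby", "3.2.4", "linux", "x86_64")
julia110 = Tebako::RuntimeId.new("julia", "1.10.0", "macos", "arm64")
puts ruby32.key  # prints ruby-3.2.4-linux-x86_64
puts julia110    # prints julia 1.10.0 (macos/arm64)
```

Because it is a `Struct`, two identifiers with the same four fields compare equal, which is what a cache lookup needs.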
Benefits
asdf or rbenv or jenv, etc.
To extend Tebako to support other language runtimes like Julia, we need to make several modifications to the existing system. Here's a high-level approach to achieve this:
Let's go through these steps:
module Tebako
  class RuntimeRepository
    def initialize(path)
      @path = path
      @metadata = {}
      load_metadata
    end
    def add_runtime(language, version, os, arch, file_path)
      runtime_key = "#{language}-#{version}-#{os}-#{arch}"
      # ... (similar to previous implementation)
    end
    def get_runtime(language, version, os, arch)
      runtime_key = "#{language}-#{version}-#{os}-#{arch}"
      # ... (similar to previous implementation)
    end
    # ... (other methods)
  end
  class RuntimeBuilder
    def initialize(repository)
      @repository = repository
    end
    def build_runtime(language, version, os, arch)
      builder = get_language_specific_builder(language)
      builder.build(version, os, arch)
    end
    private
    def get_language_specific_builder(language)
      case language
      when 'ruby'
        RubyRuntimeBuilder.new(@repository)
      when 'julia'
        JuliaRuntimeBuilder.new(@repository)
      else
        raise "Unsupported language: #{language}"
      end
    end
  end
  class RuntimeManager
    def initialize(local_repo_path, remote_repo_url = nil)
      @local_repo = RuntimeRepository.new(local_repo_path)
      @remote_repo_url = remote_repo_url
    end
    def ensure_runtime(language, version, os, arch)
      runtime_path = @local_repo.get_runtime(language, version, os, arch)
      return runtime_path if runtime_path
      if @remote_repo_url
        download_runtime(language, version, os, arch)
      else
        build_runtime(language, version, os, arch)
      end
    end
    # ... (other methods)
  end
end
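The bodies of `add_runtime` and `get_runtime` are elided above. One possible shape for them, assuming the repository is a directory that holds the runtime archives plus a `metadata.json` index (the file name, layout and the `SimpleRuntimeRepository` class are all hypothetical, not Tebako's actual format), is:

```ruby
require "fileutils"
require "json"

module Tebako
  # Hypothetical local repository: archives are copied into `path` and
  # indexed by "language-version-os-arch" in path/metadata.json.
  class SimpleRuntimeRepository
    def initialize(path)
      @path = path
      FileUtils.mkdir_p(@path)
      @index_file = File.join(@path, "metadata.json")
      @metadata = File.exist?(@index_file) ? JSON.parse(File.read(@index_file)) : {}
    end

    # Copies the archive into the repository and records it in the index
    def add_runtime(language, version, os, arch, file_path)
      dest = File.join(@path, File.basename(file_path))
      FileUtils.cp(file_path, dest) unless File.expand_path(file_path) == File.expand_path(dest)
      @metadata[runtime_key(language, version, os, arch)] = {
        "file" => File.basename(dest),
        "added_at" => Time.now.utc.to_s
      }
      File.write(@index_file, JSON.pretty_generate(@metadata))
      dest
    end

    # Returns the archive path, or nil if the runtime is not in the repository
    def get_runtime(language, version, os, arch)
      entry = @metadata[runtime_key(language, version, os, arch)]
      return nil unless entry

      file = File.join(@path, entry["file"])
      File.exist?(file) ? file : nil
    end

    private

    def runtime_key(language, version, os, arch)
      "#{language}-#{version}-#{os}-#{arch}"
    end
  end
end
```

Persisting the index on every `add_runtime` keeps it consistent if a CI job is interrupted; `get_runtime` also checks that the archive still exists, so a stale index entry behaves like a cache miss and `ensure_runtime` falls through to download or build.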
require 'etc'
module Tebako
  class RubyRuntimeBuilder
    def initialize(repository)
      @repository = repository
    end
    def build(version, os, arch)
      # ... (existing Ruby build process)
    end
    private
    def apply_tebako_patches(version)
      # ... (Ruby-specific patches)
    end
  end
  class JuliaRuntimeBuilder
    def initialize(repository)
      @repository = repository
    end
    def build(version, os, arch)
      puts "Building Julia #{version} for #{os} (#{arch})..."
      source_dir = "julia-#{version}"
      # Clone the tagged Julia release; exception: true aborts the build if a command fails
      system("git", "clone", "-b", "v#{version}", "https://github.com/JuliaLang/julia.git", source_dir, exception: true)
      Dir.chdir(source_dir) do
        # Apply Tebako patches from the source root (patch paths are relative to it)
        apply_tebako_patches(version)
        # Build Julia and install it into julia-<version>/install
        system("make", "-j#{Etc.nprocessors}", exception: true)
        system("make", "install", "prefix=#{File.join(Dir.pwd, 'install')}", exception: true)
      end
      # Package the built Julia
      output_file = "julia-#{version}-#{os}-#{arch}.tar.gz"
      system("tar", "-czf", output_file, "-C", File.join(source_dir, "install"), ".", exception: true)
      # Add to repository
      @repository.add_runtime('julia', version, os, arch, output_file)
      puts "Julia #{version} for #{os} (#{arch}) built and added to repository."
    end
    private
    def apply_tebako_patches(version)
      # Apply necessary Tebako patches for Julia
      puts "Applying Tebako patches for Julia #{version}..."
      # Implement Julia-specific patches here
    end
  end
end
Modify the Tebako::Packager class to handle different languages:
module Tebako
  class Packager
    def package(config)
      language = config['language']
      version = config['version']
      # ... (other configuration options, which must also provide os and arch)
      runtime_manager = RuntimeManager.new(LOCAL_REPO_PATH, REMOTE_REPO_URL)
      runtime_path = runtime_manager.ensure_runtime(language, version, os, arch)
      # Package the application with the appropriate runtime
      # ... (packaging logic)
    end
  end
  class Executor
    def execute(package_path)
      metadata = load_metadata(package_path)
      language = metadata['language']
      version = metadata['version']
      # ... (other metadata, which must also provide os and arch)
      runtime_manager = RuntimeManager.new(LOCAL_REPO_PATH, REMOTE_REPO_URL)
      runtime_path = runtime_manager.ensure_runtime(language, version, os, arch)
      # Execute the package with the appropriate runtime
      case language
      when 'ruby'
        system("#{runtime_path}/bin/ruby", package_path)
      when 'julia'
        system("#{runtime_path}/bin/julia", package_path)
      else
        raise "Unsupported language: #{language}"
      end
    end
  end
end
Update the tebako.yaml format to include the language specification:
language: julia
version: 1.6.3
entry_point: main.jl
# ... (other configuration options)
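On the Ruby side, the Packager would read this file and reject unsupported languages before doing any work. A minimal sketch using Ruby's standard `yaml` library; `SUPPORTED_LANGUAGES`, `REQUIRED_KEYS` and `load_config` are hypothetical names, not existing Tebako code:

```ruby
require "yaml"

module Tebako
  # Hypothetical lists; SUPPORTED_LANGUAGES grows as builders are added
  SUPPORTED_LANGUAGES = %w[ruby julia].freeze
  REQUIRED_KEYS = %w[language version entry_point].freeze

  # Parses tebako.yaml text and validates the keys shown above (sketch only)
  def self.load_config(yaml_text)
    config = YAML.safe_load(yaml_text)
    raise ArgumentError, "tebako.yaml must be a mapping" unless config.is_a?(Hash)

    missing = REQUIRED_KEYS.reject { |key| config.key?(key) }
    raise ArgumentError, "missing keys: #{missing.join(', ')}" unless missing.empty?

    unless SUPPORTED_LANGUAGES.include?(config["language"])
      raise ArgumentError, "Unsupported language: #{config['language']}"
    end

    # An unquoted version such as 1.10 is parsed as the Float 1.1, so insist on a string
    unless config["version"].is_a?(String)
      raise ArgumentError, "version must be a string; quote it, e.g. version: \"1.10\""
    end

    config
  end
end

config = Tebako.load_config(<<~YAML)
  language: julia
  version: 1.6.3
  entry_point: main.jl
YAML
puts config["version"]  # prints 1.6.3
```

`1.6.3` survives unquoted because it is not a valid YAML number, but two-component versions like `1.10` do not, which is why the check rejects non-string versions instead of silently converting them.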
Modify the build_runtimes.yml workflow to build both Ruby and Julia runtimes:
name: Build and Release Tebako Runtimes
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  workflow_dispatch:
jobs:
  build:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        language: [ruby, julia]
        # Each entry adds a comma-separated version list to every job of that language
        include:
          - language: ruby
            versions: '3.1.3,3.2.4,3.3.3'
          - language: julia
            versions: '1.6.3,1.7.2'
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: '3.0' # Use a stable version for building
      - name: Build Tebako Runtime
        shell: bash
        run: |
          # runner.arch reports the real runner architecture (e.g. X64, ARM64)
          ruby -r ./lib/tebako/runtime_manager -e "
            repo = Tebako::RuntimeRepository.new('runtimes')
            builder = Tebako::RuntimeBuilder.new(repo)
            '${{ matrix.versions }}'.split(',').each do |version|
              builder.build_runtime('${{ matrix.language }}', version, '${{ runner.os }}', '${{ runner.arch }}')
            end
          "
      # ... (rest of the workflow remains similar)
These changes allow Tebako to support multiple language runtimes, including Julia. The system is now more flexible and can be extended to support additional languages in the future by adding new language-specific builders and updating the configuration and execution processes accordingly.
Tebako could be extended to support various interpretive languages that require a runtime environment. Some languages that could be relatively easily added to Tebako in the future include:
Python: A widely-used language with a large ecosystem of libraries.
Node.js: For JavaScript and TypeScript applications.
Lua: A lightweight scripting language often used in game development and embedded systems.
R: Popular for statistical computing and data analysis.
Perl: Still used in many legacy systems and for text processing.
PHP: Commonly used for web development.
Tcl: Used in networking, embedded systems, and testing.
Erlang/Elixir: For building scalable and fault-tolerant systems.
Groovy: A dynamic language for the Java platform.
Scala: Another JVM language, combining object-oriented and functional programming.
Haskell: A purely functional programming language.
OCaml: A multi-paradigm programming language.
Go: While typically compiled, its runtime could be packaged for certain use cases.
Racket: A general-purpose, multi-paradigm programming language in the Lisp/Scheme family.
Dart: Used for web, mobile, and desktop application development.
To add these languages, you would need to:
Create a language-specific builder class (e.g., PythonRuntimeBuilder, NodeRuntimeBuilder, etc.).
Update the RuntimeBuilder class to support the new language.
Update the Packager and Executor classes to handle the new language.
Here's a sketch of how you might add Python support:
require 'etc'
module Tebako
  class PythonRuntimeBuilder
    def initialize(repository)
      @repository = repository
    end
    def build(version, os, arch)
      puts "Building Python #{version} for #{os} (#{arch})..."
      source_dir = "python-#{version}"
      # Clone the tagged CPython release (tags look like v3.12.1)
      system("git", "clone", "-b", "v#{version}", "https://github.com/python/cpython.git", source_dir, exception: true)
      Dir.chdir(source_dir) do
        # Apply Tebako patches from the source root
        apply_tebako_patches(version)
        # Build Python (POSIX build; Windows builds use PCbuild instead of configure/make)
        system("./configure", "--prefix=#{File.join(Dir.pwd, 'install')}", exception: true)
        system("make", "-j#{Etc.nprocessors}", exception: true)
        system("make", "install", exception: true)
      end
      # Package the built Python
      output_file = "python-#{version}-#{os}-#{arch}.tar.gz"
      system("tar", "-czf", output_file, "-C", File.join(source_dir, "install"), ".", exception: true)
      # Add to repository
      @repository.add_runtime('python', version, os, arch, output_file)
      puts "Python #{version} for #{os} (#{arch}) built and added to repository."
    end
    private
    def apply_tebako_patches(version)
      puts "Applying Tebako patches for Python #{version}..."
      # Implement Python-specific patches here
    end
  end
end
Then, update the RuntimeBuilder:
class RuntimeBuilder
  # ...
  private
  def get_language_specific_builder(language)
    case language
    when 'ruby'
      RubyRuntimeBuilder.new(@repository)
    when 'julia'
      JuliaRuntimeBuilder.new(@repository)
    when 'python'
      PythonRuntimeBuilder.new(@repository)
    # Add more languages here
    else
      raise "Unsupported language: #{language}"
    end
  end
end
And update the Executor:
class Executor
  def execute(package_path)
    # ...
    case language
    when 'ruby'
      system("#{runtime_path}/bin/ruby", package_path)
    when 'julia'
      system("#{runtime_path}/bin/julia", package_path)
    when 'python'
      # CPython's `make install` installs bin/python3, not bin/python
      system("#{runtime_path}/bin/python3", package_path)
    # Add more languages here
    else
      raise "Unsupported language: #{language}"
    end
  end
end
By following this pattern, Tebako can be extended to support a wide range of interpretive languages, making it a versatile tool for packaging and distributing applications written in various programming languages.
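With this pattern, every new language touches two `case` statements (`RuntimeBuilder` and `Executor`). An alternative worth considering is a single registry that maps a language to its builder class and interpreter executable, so adding a language is one registration call. This is a hypothetical design, not existing Tebako code; `StubBuilder` stands in for the real builder classes.

```ruby
module Tebako
  # Hypothetical registry: language name => builder class and interpreter binary
  module LanguageRegistry
    Entry = Struct.new(:builder_class, :interpreter)
    @entries = {}

    def self.register(language, builder_class:, interpreter:)
      @entries[language] = Entry.new(builder_class, interpreter)
    end

    def self.fetch(language)
      @entries.fetch(language) { raise ArgumentError, "Unsupported language: #{language}" }
    end

    def self.builder_for(language, repository)
      fetch(language).builder_class.new(repository)
    end

    # Path of the interpreter inside an unpacked runtime
    def self.interpreter_path(language, runtime_path)
      File.join(runtime_path, "bin", fetch(language).interpreter)
    end
  end

  # Stand-in builder used only to exercise the registry
  class StubBuilder
    def initialize(repository)
      @repository = repository
    end
  end
end

Tebako::LanguageRegistry.register("ruby", builder_class: Tebako::StubBuilder, interpreter: "ruby")
Tebako::LanguageRegistry.register("python", builder_class: Tebako::StubBuilder, interpreter: "python3")
puts Tebako::LanguageRegistry.interpreter_path("python", "/opt/rt")  # prints /opt/rt/bin/python3
```

With a registry like this, `get_language_specific_builder` would reduce to `LanguageRegistry.builder_for(language, @repository)`, and the Executor could call `system(LanguageRegistry.interpreter_path(language, runtime_path), package_path)`.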
To integrate Julia with Tebako using libdwarfs for a memory-based file system, we'll need to make several modifications to the Julia interpreter. Here's an overview of the changes needed and how to manage these patches:
File System Redirections: Julia's file system operations need to be redirected to use libdwarfs when accessing files within the packaged application. This primarily involves modifying Julia's I/O subsystem.
Key areas to patch:
Memory Mapping: Julia uses memory mapping for efficient file access. We need to modify this to work with the in-memory file system provided by libdwarfs.
Key area to patch:
Module Loading: Julia's module system needs to be aware of the in-memory file system for loading packages and modules.
Key area to patch:
Standard Library Adjustments: Some parts of Julia's standard library that interact directly with the file system may need modifications.
Key areas to patch:
Initialization: We need to initialize the libdwarfs system when Julia starts up, before any file operations occur.
Key area to patch:
Here's how we can manage these patches:
Create a Patch Directory: In the Tebako project, create a directory structure like:
patches/
  julia/
    v1.6.3/
    v1.7.2/
    ...
Create Patch Files: For each Julia version, create separate patch files for each area of modification. For example:
patches/julia/v1.6.3/
  01-jl_uv.patch
  02-sys.patch
  03-init.patch
  04-mmap.patch
  05-toplevel.patch
  06-stdlib.patch
Patch Application Script: Create a script that applies these patches during the Julia build process in the JuliaRuntimeBuilder:
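def apply_tebako_patches(version)
  # PATCH_DIR is assumed to be an absolute path; call this from the root of the
  # Julia source tree, because the patches use -p1 paths relative to it
  patch_dir = File.join(PATCH_DIR, "julia", "v#{version}")
  Dir.glob(File.join(patch_dir, "*.patch")).sort.each do |patch_file|
    # -i avoids shell redirection; exception: true stops at the first failing patch
    system("patch", "-p1", "-i", patch_file, exception: true)
  end
end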
Version Control: Keep these patches under version control in the Tebako repository. This allows for easy management of different patches for different Julia versions.
Patch Maintenance: As new versions of Julia are released, review and update the patches as necessary. You may need to create new patch sets for major Julia versions.
Documentation: Maintain documentation explaining each patch, why it's necessary, and any potential implications for Julia's behavior.
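For the maintenance step, a quick way to see which patches no longer apply to a new Julia release is to dry-run each one against a fresh source checkout. A sketch, assuming a `patch` implementation with the `--dry-run` option (GNU patch has it; it reports what would happen without modifying files) and the `patches/julia/vX.Y.Z/` layout above; `check_patches` is a hypothetical helper:

```ruby
require "open3"

# Dry-runs every *.patch in patch_dir against source_dir, in file-name order,
# and returns { patch_file => patch output } for the patches that do not apply.
def check_patches(patch_dir, source_dir)
  failures = {}
  Dir.glob(File.join(File.expand_path(patch_dir), "*.patch")).sort.each do |patch_file|
    output, status = Open3.capture2e("patch", "--dry-run", "-p1", "-i", patch_file,
                                     chdir: source_dir)
    failures[patch_file] = output unless status.success?
  end
  failures
end
```

For example, `check_patches("patches/julia/v1.7.2", "julia-1.7.2").each_key { |f| puts "needs update: #{File.basename(f)}" }`. Because nothing is actually applied, a patch that depends on an earlier one in the series may be reported as failing; applying the whole series to a scratch copy of the tree avoids that.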
Example of a patch (simplified) for src/jl_uv.c:
--- a/src/jl_uv.c
+++ b/src/jl_uv.c
@@ -100,6 +100,9 @@ int jl_fs_open(const char *path, int flags, int mode)
{
+ if (within_tebako_memfs(path)) {
+ return tebako_open(path, flags, mode);
+ }
uv_fs_t req;
int r = uv_fs_open(NULL, &req, path, flags, mode, NULL);
uv_fs_req_cleanup(&req);
return r;
}
This patch checks if the path is within the Tebako filesystem and redirects to a Tebako-specific open function if so.
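The check itself is a path-prefix test against the memfs mount point. A naive `start_with?` would also match paths such as `/__tebako_memfs__x/file` and would miss relative paths that resolve into the mount point. A Ruby sketch of the intended logic, assuming the mount point is `/__tebako_memfs__` as in the tests below and POSIX-style paths (`within_tebako_memfs?` is a hypothetical Ruby rendering of the C function used by the patches):

```ruby
# Hypothetical Ruby rendering of the within_tebako_memfs check; assumes POSIX paths.
TEBAKO_MOUNT_POINT = "/__tebako_memfs__"

def within_tebako_memfs?(path, cwd = Dir.pwd)
  # Resolve relative paths, "." and ".." lexically, without touching the filesystem
  full = File.expand_path(path, cwd)
  # Match the mount point itself or anything below it, but not "/__tebako_memfs__x"
  full == TEBAKO_MOUNT_POINT || full.start_with?("#{TEBAKO_MOUNT_POINT}/")
end

within_tebako_memfs?("/__tebako_memfs__/lib/app.rb")     # => true
within_tebako_memfs?("/__tebako_memfs__x/lib/app.rb")    # => false
within_tebako_memfs?("/__tebako_memfs__/../etc/passwd")  # => false
within_tebako_memfs?("lib/app.rb", "/__tebako_memfs__")  # => true
```

The resolution is purely lexical: a host symlink that points into the mount point is not detected, which a C implementation would also have to decide how to handle.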
By managing patches this way, you can:
To patch Julia to use the memory-based file system provided by libdwarfs, we need to focus on several key areas of the Julia codebase. Here are the specific patches needed, organized by the main components that require modification:
This file handles low-level file system operations using libuv. We need to intercept these calls and redirect them to libdwarfs when appropriate.
--- a/src/jl_uv.c
+++ b/src/jl_uv.c
@@ -1,5 +1,7 @@
#include <uv.h>
#include "julia.h"
+#include "tebako/tebako-io.h"
+#include "tebako/tebako-fs.h"
int jl_fs_open(const char *path, int flags, int mode)
{
+ if (within_tebako_memfs(path)) {
+ return tebako_open(path, flags, mode);
+ }
uv_fs_t req;
int r = uv_fs_open(NULL, &req, path, flags, mode, NULL);
uv_fs_req_cleanup(&req);
return r;
}
ssize_t jl_fs_read(int fd, char *data, size_t len)
{
+ if (is_tebako_file_descriptor(fd)) {
+ return tebako_read(fd, data, len);
+ }
uv_fs_t req;
uv_buf_t buf = uv_buf_init(data, len);
ssize_t r = uv_fs_read(NULL, &req, fd, &buf, 1, -1, NULL);
uv_fs_req_cleanup(&req);
return r;
}
// Similar modifications for jl_fs_write, jl_fs_close, jl_fs_stat, etc.
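A check like `is_tebako_file_descriptor` only works if Tebako descriptors can never collide with host ones. One way to guarantee that (an assumption about the design, not a description of libdwarfs, shown in Ruby for brevity) is to back every open memfs file with a real host descriptor opened on the null device and remember its number; the kernel will not hand that number to anything else until it is closed. `FdRegistry` is hypothetical:

```ruby
# Hypothetical descriptor registry: each memfs file is backed by a host
# descriptor opened on the null device, so its number cannot collide.
class FdRegistry
  def initialize
    @open = {} # fd number => [placeholder IO, memfs path]
  end

  # Reserves a descriptor number for a memfs file and returns it
  def open(memfs_path)
    placeholder = File.open(File::NULL, "r")
    @open[placeholder.fileno] = [placeholder, memfs_path]
    placeholder.fileno
  end

  def tebako_fd?(fd)
    @open.key?(fd)
  end

  def path_for(fd)
    @open.fetch(fd).last
  end

  # Returns false for descriptors this registry does not own
  def close(fd)
    placeholder, = @open.delete(fd)
    return false unless placeholder

    placeholder.close # releases the number back to the host
    true
  end
end
```

The read, write and close wrappers would then consult `tebako_fd?` first and fall through to the host call otherwise, mirroring the `jl_fs_read` patch.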
Julia uses memory mapping for efficient file access. We need to modify this to work with the in-memory file system.
--- a/src/mmap.c
+++ b/src/mmap.c
@@ -1,4 +1,6 @@
#include "julia.h"
+#include "tebako/tebako-io.h"
+#include "tebako/tebako-fs.h"
void *jl_mmap(void *addr, size_t len, int prot, int flags, int fd, off_t offset)
{
+ if (is_tebako_file_descriptor(fd)) {
+ return tebako_mmap(addr, len, prot, flags, fd, offset);
+ }
return mmap(addr, len, prot, flags, fd, offset);
}
int jl_munmap(void *addr, size_t len)
{
+ if (is_tebako_mapping(addr)) { /* assumed helper: true for regions returned by tebako_mmap */
+ return tebako_munmap(addr, len);
+ }
return munmap(addr, len);
}
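Unlike `jl_mmap`, `jl_munmap` receives only an address and a length, not a descriptor, so the Tebako layer has to remember which address ranges it produced in order to route the call. A Ruby sketch of that bookkeeping, with a hypothetical `MappingRegistry` and plain integers standing in for addresses:

```ruby
# Hypothetical bookkeeping for memory regions returned by tebako_mmap.
class MappingRegistry
  def initialize
    @regions = {} # start address => length
  end

  def record(addr, len)
    @regions[addr] = len
  end

  # True if addr lies inside any recorded region
  def tebako_mapping?(addr)
    @regions.any? { |start, len| addr >= start && addr < start + len }
  end

  # Forgets the region only on an exact (start, length) match
  def release(addr, len)
    return false unless @regions[addr] == len

    @regions.delete(addr)
    true
  end
end
```

This sketch only handles unmapping a whole region; `munmap` also permits partial unmaps, which a real implementation would need to support by splitting the recorded range.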
Modify the module loading system to be aware of the in-memory file system.
--- a/src/toplevel.c
+++ b/src/toplevel.c
@@ -1,4 +1,6 @@
#include "julia.h"
+#include "tebako/tebako-io.h"
+#include "tebako/tebako-fs.h"
jl_value_t *jl_load_file_string(const char *text, size_t len, char *filename)
{
+ if (within_tebako_memfs(filename)) {
+ // Use tebako functions to read the file content
+ char *tebako_content = tebako_read_file(filename, &len);
+ if (tebako_content) {
+ jl_value_t *result = jl_parse_input_line(tebako_content, len, filename, 0);
+ free(tebako_content);
+ return result;
+ }
+ }
// Existing implementation for non-tebako files
}
Initialize the libdwarfs system when Julia starts up.
--- a/src/init.c
+++ b/src/init.c
@@ -1,4 +1,6 @@
#include "julia.h"
+#include "tebako/tebako-io.h"
+#include "tebako/tebako-fs.h"
void jl_init(void)
{
+ // Initialize tebako/libdwarfs
+ tebako_init();
+
// Existing initialization code
}
void jl_cleanup(void)
{
+ // Cleanup tebako/libdwarfs
+ tebako_cleanup();
+
// Existing cleanup code
}
Modify Julia's standard library functions that interact with the file system.
--- a/base/filesystem.jl
+++ b/base/filesystem.jl
@@ -1,8 +1,14 @@
# filesystem.jl
+import Tebako
+
function open(filename::AbstractString, mode::AbstractString="r")
+ if Tebako.within_memfs(filename)
+ return Tebako.open(filename, mode)
+ end
# Existing open implementation
end
function read(filename::AbstractString)
+ if Tebako.within_memfs(filename)
+ return Tebako.read(filename)
+ end
# Existing read implementation
end
# Similar modifications for other file operations
Create a new header file to define the interface between Julia and Tebako/libdwarfs.
// include/julia/tebako.h
#ifndef JULIA_TEBAKO_H
#define JULIA_TEBAKO_H
#include <stdbool.h>
#include <sys/types.h>
bool within_tebako_memfs(const char *path);
bool is_tebako_file_descriptor(int fd);
int tebako_open(const char *path, int flags, int mode);
ssize_t tebako_read(int fd, void *buf, size_t count);
ssize_t tebako_write(int fd, const void *buf, size_t count);
int tebako_close(int fd);
void *tebako_mmap(void *addr, size_t len, int prot, int flags, int fd, off_t offset);
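int tebako_munmap(void *addr, size_t len);
bool is_tebako_mapping(const void *addr); /* true for regions returned by tebako_mmap */
char *tebako_read_file(const char *path, size_t *len); /* caller frees the returned buffer */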
void tebako_init(void);
void tebako_cleanup(void);
#endif // JULIA_TEBAKO_H
These patches provide the core modifications needed to integrate libdwarfs with Julia for accessing the memory-based file system. The key points are:
Remember that these patches are conceptual and may need to be adjusted based on the specific versions of Julia and libdwarfs you're working with. You'll also need to implement the Tebako-specific functions (like tebako_open, tebako_read, etc.) to interface with libdwarfs.
Testing these patches effectively is crucial to ensure that the integration of Tebako and libdwarfs with Julia works correctly and doesn't introduce any regressions. Here's a comprehensive approach to testing these patches:
Create unit tests for each modified function. These tests should cover both Tebako and non-Tebako paths.
Example for jl_fs_open:
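#include <assert.h>
#include <fcntl.h>
// Assumes /tmp/test.txt exists on the host and test.txt is packaged into the memfs image
void test_jl_fs_open(void) {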
// Test regular file open
int fd = jl_fs_open("/tmp/test.txt", O_RDONLY, 0);
assert(fd >= 0);
jl_fs_close(fd);
// Test Tebako file open
fd = jl_fs_open("/__tebako_memfs__/test.txt", O_RDONLY, 0);
assert(fd >= 0);
assert(is_tebako_file_descriptor(fd));
jl_fs_close(fd);
}
Create tests that exercise the entire stack, from Julia code down to the libdwarfs layer.
function test_file_operations()
    # The memfs is a read-only DwarFS image, so test.txt (containing "Hello, Tebako!")
    # and testdir/ are assumed to have been packaged into it at build time
    content = read("/__tebako_memfs__/test.txt", String)
    @assert content == "Hello, Tebako!"
    # Test file and directory existence
    @assert isfile("/__tebako_memfs__/test.txt")
    @assert isdir("/__tebako_memfs__/testdir")
    # Test copying out of the memfs onto the host filesystem
    dst = joinpath(mktempdir(), "test_copy.txt")
    cp("/__tebako_memfs__/test.txt", dst)
    @assert read(dst, String) == content
    # Writes into the memfs must fail
    write_failed = try
        open(f -> write(f, "x"), "/__tebako_memfs__/new.txt", "w")
        false
    catch
        true
    end
    @assert write_failed
end
Compare the performance of file operations between the regular filesystem and the Tebako filesystem.
function benchmark_file_operations()
    # Assumes an identical bench.txt exists in /tmp and in the packaged memfs image
    # (writes are not benchmarked because the memfs is read-only)
    regular_time = @elapsed for i in 1:1000
        read("/tmp/bench.txt", String)
    end
    tebako_time = @elapsed for i in 1:1000
        read("/__tebako_memfs__/bench.txt", String)
    end
    println("Regular filesystem time: ", regular_time)
    println("Tebako filesystem time: ", tebako_time)
end
Test various edge cases and error conditions:
Use tools like Valgrind to check for memory leaks, especially in the Tebako/libdwarfs integration code.
Create tests that put heavy load on the filesystem:
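function stress_test()
    # Assumes stress_test_1.txt ... stress_test_10000.txt were packaged into the image,
    # each containing "Stress test content for file $i" (the memfs is read-only)
    for i in 1:10000
        filename = "/__tebako_memfs__/stress_test_$i.txt"
        content = read(filename, String)
        @assert content == "Stress test content for file $i"
    end
end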
Ensure that existing Julia packages that heavily rely on filesystem operations still work correctly with the Tebako integration.
Test the patches on all supported platforms (Linux, macOS, Windows) to ensure consistent behavior.
Create a suite of tests that cover all previously known bugs and edge cases to prevent regressions.
Set up CI pipelines to automatically run these tests on every commit and pull request.
name: Tebako Integration Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        julia-version: ['1.6', '1.7', '1.8']
    steps:
      - uses: actions/checkout@v4
      - name: Set up Julia
        uses: julia-actions/setup-julia@v2
        with:
          version: ${{ matrix.julia-version }}
      - name: Build Tebako-patched Julia
        run: |
          # Commands to apply patches and build Julia
      - name: Run tests
        run: |
          # Must invoke the patched julia built above, not the stock one from setup-julia
          julia --project=@. -e 'using Pkg; Pkg.test()'
Use fuzzing tools to generate random inputs and file operations to uncover potential bugs or crashes.
Create a comprehensive benchmark suite to compare performance across different scenarios and Julia versions.
To implement these tests effectively:
Use Julia's built-in Test module for Julia-level tests.
By implementing this comprehensive testing strategy, you can ensure that the Tebako patches are robust, performant, and don't introduce regressions in Julia's core functionality.
Current state of modularity
The existing tebako packager includes two components:
libdwarfs
libdwarfs is a library that provides our implementation of the IO functions used by Ruby. This implementation reroutes calls either to the memfs or to the host filesystem.
libdwarfs uses the upstream dwarfs project: dwarfs provides the filesystem driver API, and libdwarfs implements the application API on top of it. The "application API" is a subset of the POSIX API plus some OS-specific functions used by Ruby, as mentioned above.
This reusability and extensibility holds for the Linux (GNU, musl) and macOS implementations of tebako. The Windows version of Ruby implements its own POSIX compatibility layer, so the Windows version of libdwarfs includes a module that implements our version of Ruby's POSIX compatibility layer.
tebako
The existing tebako component implements Ruby code patching and drives tebako package builds and rebuilds. Ruby patching is required to meet two objectives:
Tebako itself is