vmware / database-stream-processor-compiler

Infrastructure to run programs written in high-level languages on top of the Database Stream Processor (DBSP) runtime.

[RFC] DDlog Packages and Rust integration #10

Open ryzhyk opened 3 years ago

ryzhyk commented 3 years ago

This RFC proposes a systematic way to organize DDlog code into packages and map these packages into Rust crates. As part of this, we propose a design for integrating native Rust code into a DDlog project.

Motivation

We address the following limitations of DDlog-1:

Packages

DDlog-2 code is organized in packages. Like a Rust crate, a package consists of a tree of modules and a metadata file specifying package dependencies. There are two types of modules: DDlog modules and native Rust modules. The DDlog-2 compiler converts the package into a Rust crate by generating Cargo.toml for the package and a Rust module module.rs for each DDlog module module.dl. Native Rust modules are included in the Rust project as is, and in place, so that Rust compiler messages point to actual source code locations.

Package structure

A DDlog package looks a lot like a Rust crate. The package.toml file in the root directory contains package metadata (name, version, description, etc.), the path to the main module (e.g., lib.dl), and dependencies. A package can have two kinds of dependencies: Rust crates and other DDlog packages. The former can point to crates.io, a git repository, or a local folder. The latter can initially only point to a local folder, but it should be possible to implement support for both git repositories and crates.io in the future (see below).

my_package/
├── package.toml
└── src/
    ├── lib.dl
    ├── mod1.dl
    ├── mod2.dl
    ├── mod2.rs
    └── mod3/
        └── mod.dl

A module can consist of a single file or a file tree. In the above example, mod1.dl is a single-file DDlog module, and mod2.rs is a single-file Rust module. mod2.dl contains DDlog bindings for the Rust definitions in mod2.rs. This file can only contain type definitions and function prototypes without implementations (see the discussion of extern types below). Ideally, this file should be generated automatically from mod2.rs, but we may want to leave this for future work.

Similar to lib and bin crates in Rust, we may want to distinguish library packages and executable packages, where only the latter can be used to instantiate a dataflow.

Generated code structure

In contrast to DDlog-1, we place the generated Rust code under the package directory. We generate Cargo.toml in the top-level folder. Each .dl module is compiled to a Rust module and stored in the src_rs directory that mirrors the module structure of the DDlog package. Native Rust modules remain unmodified at their original location. Rust's #[path] attribute is used to link the native modules to the generated Rust project:

my_package/
├── Cargo.toml
├── package.toml
├── src/
│   ├── lib.dl
│   ├── mod1.dl
│   ├── mod2.dl
│   ├── mod2.rs
│   └── mod3/
│       └── mod.dl
└── src_rs/
    ├── lib.rs
    ├── mod1.rs
    └── mod3/
        └── mod.rs
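
To illustrate how the `#[path]` linkage could look, here is a minimal sketch of a routine that renders the contents of `src_rs/lib.rs` for the example package. The function name `render_lib_rs` and the exact shape of the emitted file are assumptions made for illustration; the RFC does not specify the generator. The sketch relies only on the documented Rust rule that a `#[path]` on a module declaration is resolved relative to the directory of the declaring file.

```rust
/// Renders a hypothetical `src_rs/lib.rs` for a DDlog package.
/// `ddlog_modules` are modules generated from `.dl` files (they live in `src_rs/`);
/// `native_modules` are hand-written Rust modules that stay in `src/`.
fn render_lib_rs(ddlog_modules: &[&str], native_modules: &[&str]) -> String {
    let mut out = String::new();
    for name in ddlog_modules {
        // Generated modules sit next to lib.rs, so the default lookup finds them.
        out.push_str(&format!("pub mod {name};\n"));
    }
    for name in native_modules {
        // Native modules are not copied; `#[path]` points back into `src/`,
        // relative to the directory that contains lib.rs (`src_rs/`).
        out.push_str(&format!("#[path = \"../src/{name}.rs\"]\npub mod {name};\n"));
    }
    out
}

fn main() {
    // Module names taken from the example tree above.
    print!("{}", render_lib_rs(&["mod1", "mod3"], &["mod2"]));
}
```

Because the native file is referenced in place, a compiler error in mod2.rs would be reported against `src/mod2.rs` rather than against a generated copy.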

Native types and functions

As discussed above, a DDlog package can contain native modules implemented in Rust. A native module is accompanied by a .dl file that declares DDlog bindings for types, functions, and trait implementations exported by the Rust module, e.g.,

/// Function signature (no implementation).
pub fn f<T: Ord>(arg: T) -> bool;

/// `impl` block consisting of function signatures only.
impl MyStruct {
    fn f1(self) -> bool;
}

/// Trait `impl` without a body.
impl Ord for MyStruct;

/// OpaqueType can be declared in Rust as a struct, enum, or alias.
type OpaqueType;
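
For comparison, a native Rust module matching these bindings might look like the sketch below. The function bodies, the `id` field, and the choice of `BTreeSet<u32>` behind `OpaqueType` are placeholders invented for illustration; only the signatures mirror the declarations above.

```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;

/// Counterpart of `pub fn f<T: Ord>(arg: T) -> bool;`.
pub fn f<T: Ord>(arg: T) -> bool {
    // Placeholder body: a value always compares equal to itself.
    arg.cmp(&arg) == Ordering::Equal
}

/// Deriving `Ord` provides the implementation that `impl Ord for MyStruct;` declares.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct MyStruct {
    pub id: u32, // hypothetical field
}

impl MyStruct {
    /// Counterpart of `fn f1(self) -> bool;` (placeholder logic).
    pub fn f1(self) -> bool {
        self.id != 0
    }
}

/// DDlog sees this only as an opaque type; here it happens to be an alias.
pub type OpaqueType = BTreeSet<u32>;

fn main() {
    let a = MyStruct { id: 1 };
    let b = MyStruct { id: 2 };
    println!("{}", f(42));
    println!("{}", a < b);
    println!("{}", a.f1());
    let set = OpaqueType::new();
    println!("{}", set.len());
}
```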

We sometimes want to expose Rust types like Option and Result to DDlog not as opaque types but as structs or enums with constructors. In DDlog-1 we did so by re-declaring the types in DDlog and implementing conversion functions between the std and ddlog_std versions of each type. This did not exactly improve Rust developers' experience.

We therefore propose that in DDlog-2 one can use Rust structs and enums directly by describing the structure of the struct or enum to the DDlog compiler. Consider the following native Rust module and accompanying DDlog binding that expose the Rust Option type to DDlog:

/// option.rs

// Re-export Rust `Option` type to DDlog
pub use std::option::Option;

/// option.dl

// Instead of declaring `Option` as an opaque type, tell DDlog about its constructors.
// DDlog knows that `module.dl` contains bindings for native Rust code in `module.rs` and
// will not generate a duplicate Rust definition for this type.
pub enum Option<T> {
    None,
    Some(T)
}
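
The practical effect on the Rust side can be sketched as follows: because `option.rs` only re-exports the type, values created by ordinary Rust code pass into code that names the type through the binding module without any conversion. The inline `option` module and the function `unwrap_or_zero` are stand-ins invented for this example; they are not part of the proposal.

```rust
mod option {
    // Same re-export as in option.rs: this is std's type, not a copy of it.
    pub use std::option::Option;
}

/// Stand-in for generated code that refers to the type via the binding module.
fn unwrap_or_zero(x: option::Option<u32>) -> u32 {
    match x {
        option::Option::Some(v) => v,
        option::Option::None => 0,
    }
}

fn main() {
    // A plain std value is accepted as is; no to/from conversion functions exist.
    let from_rust: Option<u32> = Some(7);
    println!("{}", unwrap_or_zero(from_rust));
    println!("{}", unwrap_or_zero(None));
}
```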

Distributing DDlog package via crates.io

One advantage of the proposed design is that DDlog packages map directly to Rust crates and can be distributed as such. This has dual benefits: DDlog developers can use the crates ecosystem for software distribution and dependency management; conversely, Rust developers can easily incorporate DDlog libraries in their programs. Since crates.io doesn't support the DDlog package file format, one must include the generated Cargo.toml file in the distribution. The next question is whether we need to distribute the generated Rust sources along with the .dl files. This is probably undesirable and can be avoided by using a build.rs that invokes the DDlog compiler to convert DDlog sources to Rust at build time.
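
A build script along these lines might look like the sketch below. The compiler binary name `ddlog2` and the `--package` and `--out-dir` flags are purely hypothetical, since the RFC does not define a command-line interface; the `cargo:rerun-if-changed` and `cargo:warning` directives are standard Cargo build-script output.

```rust
use std::path::Path;
use std::process::Command;

/// Builds the compiler invocation for a package directory.
/// The binary name and both flags are assumptions, not an existing CLI.
fn ddlog_command(package_dir: &Path) -> Command {
    let mut cmd = Command::new("ddlog2");
    cmd.arg("--package")
        .arg(package_dir)
        .arg("--out-dir")
        .arg(package_dir.join("src_rs"));
    cmd
}

fn main() {
    // Ask Cargo to re-run this script only when the DDlog inputs change.
    println!("cargo:rerun-if-changed=package.toml");
    println!("cargo:rerun-if-changed=src");

    let cmd = ddlog_command(Path::new("."));
    // A real build script would now call `cmd.status()` and fail the build on a
    // non-zero exit code. This sketch only reports the invocation, because the
    // compiler interface is hypothetical.
    println!("cargo:warning=would run {:?}", cmd);
}
```

Keeping the command construction in its own function makes the invocation easy to inspect without actually launching the compiler.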

mihaibudiu commented 3 years ago

Frankly this proposal is great. If you can achieve all this, it's fabulous. But are there some obstacles which could make this difficult? You mentioned circular dependencies between modules. Do we have workarounds for such obstacles? Do we need circular module dependencies?

mihaibudiu commented 3 years ago

Adding traits to existing types is another one that comes to mind.

ryzhyk commented 3 years ago

> You mentioned circular dependencies between modules.

Circular module dependencies are not a problem, as they are allowed by Rust. Circular crate dependencies, on the other hand, are illegal. Currently, the compiler automatically splits the module dependency graph into SCCs to form crates. In DDlog-2 the programmer will be responsible for this, which I think is better in practice.
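
As a small illustration of the first point, two modules of the same crate can call into each other freely. The module and function names here are made up for the example:

```rust
mod even {
    /// Depends on `odd`.
    pub fn is_even(n: u32) -> bool {
        if n == 0 { true } else { super::odd::is_odd(n - 1) }
    }
}

mod odd {
    /// Depends on `even`, closing the cycle.
    pub fn is_odd(n: u32) -> bool {
        if n == 0 { false } else { super::even::is_even(n - 1) }
    }
}

fn main() {
    println!("{}", even::is_even(10));
    println!("{}", odd::is_odd(10));
}
```

The same cycle between two separate crates would be rejected by Cargo, which is why the split into crates has to follow the SCCs of the dependency graph.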

> Adding traits to existing types is another one that comes to mind.

That's a good point. This will still require wrapper types, unless you control either the crate that declares the type or the crate that declares the trait.
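
For reference, the wrapper-type workaround looks roughly like this in plain Rust. The example uses `Vec<String>` and `Display`, both defined in std, and a made-up wrapper named `Names`:

```rust
use std::fmt;

// `impl fmt::Display for Vec<String>` would be rejected here: neither the trait
// nor the type is defined in this crate. A local wrapper type is allowed.
struct Names(Vec<String>);

impl fmt::Display for Names {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}]", self.0.join(", "))
    }
}

fn main() {
    let names = Names(vec!["a".to_string(), "b".to_string()]);
    println!("{names}");
}
```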