vmware / differential-datalog

DDlog is a programming language for incremental computation. It is well suited for writing programs that continuously update their output in response to input changes. A DDlog programmer does not write incremental algorithms; instead they specify the desired input-output mapping in a declarative manner.
MIT License
1.38k stars 119 forks source link

Cleanup the Rust API #612

Open ryzhyk opened 4 years ago

ryzhyk commented 4 years ago

Even though Rust API is supposed to be the native one for DDlog, it is currently difficult to use, as it requires navigating multiple static and generated source files, managing two layers of wrappers around each value (x -> Value::XXX(x) -> DDValue), plus it lacks documentation.

TODOs:

d4hines commented 4 years ago

Hi @ryzhyk. We met briefly at the Datalog 2.0 conference last year. I'm very excited to start building on your work both at my day job and as part of my academic research. DDLog is awesome, and I appreciate the considerable time spent on documentation and polish.

That said, I'm a bit stumped on how to use the Rust API. Are there any examples I can copy? I'm new to Rust, but, beyond that, it's not clear which of the many generated functions I need to call to get this thing working.

mihaibudiu commented 4 years ago

When you say "working" what do you really mean? You should start by running the tests - there are lots of them, and each of them is run in a different way unfortunately. Probably the tests in test/datalog_tests are the most relevant for learning the Rust API. Try for example

cd test/datalog_tests
./run-test.sh simple release

This will run the ddlog compiler, compile the resulting rust (in directory simple_ddlog) and the run the produced program by feeding it the simple.dat file as an input. The probably you can read the simple_ddlog/cmd_parser files, which parse the text from the simple.dat file and create ddlog input records.

ryzhyk commented 4 years ago

@d4hines, thanks for the feedback! Do you have an example DDlog program you'd like to run? If so, we can write a Rust client for it and, with your permission, use this example as DDlog API documentation (in lieu of proper documentation that we should write, but have been postponing until we've had time to make the API nicer).

mihaibudiu commented 4 years ago

The Java and C-based APIs are perhaps simpler to understand and better documented. Maybe you can elaborate on your end goal and we can see what the best way to achieve it is.

ryzhyk commented 4 years ago

PS. Also, just curious: does your use case require you to use DDlog from Rust or would you consider using Java, C, or Go API, which are in better shape than the Rust one?

d4hines commented 4 years ago

Thanks for the quick response guys.

Here's my usecase:

I'm working on an implementation of ALM, an action language created by Daniela Inclezan and Michael Gelfond (https://www.depts.ttu.edu/cs/research/documents/12.pdf). ALM is a logic langauge for describing objects, actions, and change. While they originally defined the semantics in terms of ASP, you can also interpret ALM programs as what I'm calling "mutually recursive datalog" programs: one program that establishes the relationships between objects, and another program that describes the relationships between actions and the "next state". By feeding the output of these two programs into each other, you can model a system changing as a result of user input.

All that to say, I'm not interested in any particular DDLog program, but rather in compiling my language down to DDLog. There is a particular program I want to run as a proof of concept, but I still need to translate it by hand from ALM to DDLog (I believe the transformation will be very straight forward, since ALM and Datalog share most syntax in common). I can get that to you if it would be helpful.

I work in web development, so the Rust ecosystem is ideal for us. I don't expect to use any except the most basic features of DDLog - input some facts and watch for changes.

mihaibudiu commented 4 years ago

In this case it's probably easiest if you start with the text input/output. Again, the simple.dl program is a good example. The dat file is a text file with DDlog commands. These commands are described here: https://github.com/vmware/differential-datalog/blob/master/doc/command_reference/command_reference.md

d4hines commented 4 years ago

That's what I'm working with now. It's really sweet for this sort of rapid prototyping! Good work guys!

When I have a small program that I want to test, I'll post it here, and perhaps you guys can walk me through calling it from Rust.

Kixiron commented 4 years ago

I think a really good step towards making a good to use Rust API would be to translate ddlog modules into rust modules as separate files (only slightly different than the current, should be reasonably simple) and to remove a majority of the namespace-correction that ddlog does For example, this ddlog is translated into this rust

// module.dl
typedef Thing = A {} | B {}
pub enum module_Item {
    module_A,
    module_B,
}

This is pretty ugly to use, and I believe that Rust's module system is similar enough to ddlog's to allow 1 to 1 mapping (if anything rust is more permissive, so it could be a negative issue, so to speak) Generating something like this would be far more ergonomic to use from Rust

pub mod module {
    pub enum Item {
        A,
        B,
    }
}

With generating separate files this'll look more like this

// lib.rs
pub mod module;

// module.rs
pub enum Item {
    A,
    B,
}

This'll probably require some more bookkeeping to allow referring to things by their canonical path instead of relying on glob imports, but that's (imo) better in the long run, since using glob imports within generated code is itself risky

The advantage of separating things into physical modules is two-fold: It'll make the ddlog compiler faster since it'll recompile files less often and will do the same for Cargo (a la the std). All in all, the hope is that separate files will just make things faster

ryzhyk commented 4 years ago

@Kixiron, yes, this is the plan. There may have been reasons why we did not go in this direction initially, but I don't think any of them were of a fundamental kind. I completely agree with the usability argument. The compilation speed story is more complicated. The DDlog compiler currently flattens all modules before performing validation and type inference. We'd have to change this to achieve incremental behavior (wouldn't have been a problem if our compiler was written in DDlog ;-). This is not too bad and can be done with a bit of effort.

As for the Rust toolchain, I don't think it relies on module boundaries for incremental compilation. I heard that it uses heuristics that tend to be quite fragile. The only way to enforce compilation units I am aware of is placing things in different crates. Unfortunately, we cannot place each DDlog module in a separate crate, since Rust does not allow circular dependencies between crates. So we'd have to break the program up into strongly connected components that can be compiled into separate Rust crates.

Anyway, this is just a plan and I'm not sure when I'll have time to work on any of this.

workingjubilee commented 4 years ago

For codegen units, yeees? But because rustc is a multi-phase compiler, compilation is not a single event. It mostly feeds LLVM a crate at a time, but when analyzing your source it tries to memoize the graph of dependencies in your code and skip as much as possible before it even moves along the path towards generating LLVM IR.

ryzhyk commented 4 years ago

@workingjubilee , true, but 90% of DDlog's compilation time is spent in LTO, so as far as speeding up compilation goes, breaking up generated code into multiple crates seems like the way to go.

Having said that, I am going to break it up into modules as the first step, and will measure how this affects compilation times before moving on to multiple crates.