vmware / differential-datalog

DDlog is a programming language for incremental computation. It is well suited for writing programs that continuously update their output in response to input changes. A DDlog programmer does not write incremental algorithms; instead they specify the desired input-output mapping in a declarative manner.

maximum 2**16 chained updates #761

Open eqv opened 4 years ago

eqv commented 4 years ago

This issue might be related to #224, but I don't see any crashes or unexpectedly low performance. Instead, the program silently fails to find the expected solutions. It appears that only 2**16 updates can be performed consecutively.

relation Rb(x: u32)
output relation Rque(x: u32)

Rb(0).
Rb(x+1) :- Rb(x), x < 65599.
Rque(x) :- Rb(x), x > 65530.

Rque only contains the following values:

Rque{.x = 65531}: +1
Rque{.x = 65532}: +1
Rque{.x = 65533}: +1
Rque{.x = 65534}: +1

However:

relation Rb(x: u32)
output relation Rque(x: u32)

Rb(0).
Rb(x|1) :- Rb(x).
Rb(x|2) :- Rb(x).
Rb(x|4) :- Rb(x).
Rb(x|8) :- Rb(x).
Rb(x|16) :- Rb(x).
Rb(x|32) :- Rb(x).
Rb(x|64) :- Rb(x).
Rb(x|128) :- Rb(x).
Rb(x|256) :- Rb(x).
Rb(x|512) :- Rb(x).
Rb(x|1024) :- Rb(x).
Rb(x|2048) :- Rb(x).
Rb(x|4096) :- Rb(x).
Rb(x|8192) :- Rb(x).
Rb(x|16384) :- Rb(x).
Rb(x|32768) :- Rb(x).
Rb(x|65536) :- Rb(x).
Rb(x|131072) :- Rb(x).
Rb(x|262144) :- Rb(x).
Rb(x|524288) :- Rb(x).
// will print 1048577 lines
Rque(x) :- Rb(x).

seems to work as expected.

ryzhyk commented 4 years ago

It appears that only 2**16 updates can be performed consecutively.

This is exactly the case. DDlog currently uses a 16-bit counter for the iteration number, which means that it won't correctly perform computations that require more iterations. This saves some memory by reducing the size of the metadata associated with each record inside DDlog, and it does not normally cause problems, as most practical use cases don't require more than 2^16 iterations (at least this has been the case so far).

I agree, however, that quietly outputting a wrong result is not the right behavior; it should fail explicitly instead. We should also make this parameter configurable, so that one can choose a 32-bit counter at compile time.

Are you just experimenting or do you have a real use case that requires > 2^16 iterations?

eqv commented 4 years ago

To be honest, I don't know if I will run into this issue; I was experimenting with this to get a feeling for the performance. As the second example seems to work (though I don't really understand why), I might be able to get around it. I'm planning to build a disassembler similar to https://arxiv.org/abs/1906.03969. There will be very long chains of what boils down to:

relation Entry(addr: u64)
relation Disassembly(addr: u64, info: SomeInfo)

Disassembly(a, info) :- Entry(a), var info = disassemble(a).
Entry(a + info.size) :- Disassembly(a, info), info.not_a_jump.
// some more rules

And there will definitely be cases where we will be disassembling > 2**16 instructions in a row.

mihaibudiu commented 4 years ago

The second example doubles the set size in each iteration, so it needs only a logarithmic number of iterations in the size of the output set. Why do you need an incremental disassembler? I would expect that #225 is more relevant. In fact, the program not crashing is very worrisome, as Leonid said; this is a bug. DDlog is optimized for running continuously for a long time rather than spending a long time computing a single result. Each invocation is expected to finish within a relatively small number of iterations.
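
A rough sketch of the arithmetic behind this (the exact counts are illustrative, not taken from the thread): the first program derives exactly one new Rb fact per fixed-point iteration, while the second derives, after i iterations, every value with at most i of its 20 bits set. So, roughly:

iterations(first program)  ≈ 65,600  >  2^16 = 65,536
iterations(second program) ≤ 20, which already covers all 2^20 = 1,048,576 values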

mihaibudiu commented 4 years ago

Maybe we can have a different build optimized for very long-running computations, where the iteration counter could be 64 bits. So you would essentially have two different DDlog builds: one for continuous operation and one for more traditional Datalog-like programs.

ryzhyk commented 4 years ago

To be honest, I don't know if I will run into this issue; I was experimenting with this to get a feeling for the performance. As the second example seems to work (though I don't really understand why), I might be able to get around it. I'm planning to build a disassembler similar to https://arxiv.org/abs/1906.03969. There will be very long chains of what boils down to:

relation Entry(addr: u64)
relation Disassembly(addr: u64, info: SomeInfo)

Disassembly(a, info) :- Entry(a), var info = disassemble(a).
Entry(a + info.size) :- Disassembly(a, info), info.not_a_jump.
// some more rules

And there will definitely be cases where we will be disassembling > 2**16 instructions in a row.

Interesting use case. So we will indeed need to implement configurable timestamps to support this, but I also don't expect this approach to work efficiently or to have good incremental performance (if this is something you care about). It won't be efficient, as DDlog will end up performing many iterations, processing a single instruction on each iteration. The differential dataflow library that DDlog is built on is optimized for batch processing and does not perform very well when doing many tiny iterations. It won't be incremental, as a single instruction change in the middle of the program will shift all addresses around and trigger recomputation of all other instructions. Does your use case require incremental performance? If so, we can brainstorm ways to restructure the problem to achieve this.

ryzhyk commented 4 years ago

I would expect that #225 is more relevant, in fact, the program not crashing is very worrisome, as Leonid said. This is a bug.

I think I know what's going on. I used to have my own implementation of the 16-bit TS counter, which panicked on overflow. I recently replaced it with DD's implementation, which was added in the latest DD release. Apparently it does not panic. I'll explore a bit more and submit an issue against DD.

ryzhyk commented 4 years ago

Maybe we can have a different build optimized for very long-running computations, where the iteration counter could be 64 bits. So you would essentially have two different DDlog builds: one for continuous operation and one for more traditional Datalog-like programs.

We can add a switch to ddlog to choose the nested TS size.

eqv commented 4 years ago

There are two reasons why I'd prefer to use DDlog over Souffle or crepe: first (and probably most importantly), the ability to use Rust structs as ground terms and to integrate neatly with Rust; secondly, that I will be using the disassembly (and the analysis building on top of it) side by side with fuzzing. As the fuzzer uncovers new code, new basic blocks are added as they are found by the dynamic analysis. Existing instructions won't be moved or replaced, so performance will probably not be much of an issue. The incremental aspect is mostly about not recomputing things we already know.
Changing the size to u64 or even u32 (4 billion entries will probably be problematic in terms of memory anyway) would most certainly suffice for my use case.

ryzhyk commented 4 years ago

There are two reasons why I'd prefer to use DDlog over Souffle or crepe: first (and probably most importantly), the ability to use Rust structs as ground terms and to integrate neatly with Rust.

Cool, this part will get better in a couple of days. But I also want to mention that DDlog has a pretty powerful native type system and expression language. We are able to do a lot of complex data processing directly in DDlog.

As the fuzzer uncovers new code, new basic blocks are added as they are found by the dynamic analysis. Existing instructions won't be moved or replaced, so performance will probably not be much of an issue.

Ok, this makes sense. The computation will indeed be incremental. Still, processing data one instruction at a time is going to be inefficient in DDlog, probably orders of magnitude slower than in C, and possibly even slower than Souffle (since in Souffle iteration is literally just a loop). You would be able to get back a lot of the performance by supplying multiple inputs, e.g., program fragments, that can be processed independently.
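
For illustration only (the addresses are made up, and this assumes Entry is declared as an input relation so it can be fed through the text-based CLI): pushing a whole batch of entry points in a single transaction lets every fixed-point iteration advance many disassembly chains in parallel, so the iteration count is bounded by the longest straight-line run rather than by the total number of instructions.

start;
insert Entry(4198400),
insert Entry(4199168),
insert Entry(4200448);
commit;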

Changing the size to u64 or even u32 (4 billion entries will probably be problematic in terms of memory anyway) would most certainly suffice for my use case.

Ok, this should be a relatively easy change. I'll try to do it in the next few days.

eqv commented 4 years ago

Yeah, actually, when I said Rust types as ground terms, I meant the DDlog native types that map 1-to-1 to Rust types, which makes them easy to work with in native functions. I'm a huge fan! The only reason I use native Rust so far is that transformer functions cannot be recursive (e.g., substitution in a tree-shaped expression doesn't work natively). It's good to know that there will be significant changes - I was just about to release a pretty long blog post about program analysis with DDlog ^^ I guess I'll wait a few days then.

I don't think it'll happen often that we only have a single very long entry point to disassemble; in most cases there is going to be a significant amount of branching. In fact, I'm pretty sure 64k will work OK for most use cases (due to, e.g., call instructions branching into both the call target and the fall-through). But it's awfully close when disassembling large binaries with > 50MB of code. Generally, for now I'm not too concerned about the disassembly performance. I can always do the disassembly on the Rust side and manually insert it as facts, if need be. The following analysis passes will probably take significantly more time anyway.

I'm looking forward to it, thank you so much. It's not a blocking issue - I can just use a small example for now.

ryzhyk commented 4 years ago

The only reason I use native Rust so far is that transformer functions cannot be recursive (e.g., substitution in a tree-shaped expression doesn't work natively).

Do you have an example? Both recursive types and functions are supported, as long as you wrap recursive fields in Ref<>, Vec<>, or Intern<>. But I can also imagine situations where this is not a good enough solution.
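
A minimal sketch of what that can look like (the Expr type, its constructors, and this particular subst function are invented for illustration; exact syntax may differ between DDlog versions):

typedef Expr = EConst{val: u64}
             | EVar{name: string}
             | EAdd{left: Ref<Expr>, right: Ref<Expr>}

// Recursively replace variable `name` by the constant `v`.
function subst(e: Ref<Expr>, name: string, v: u64): Ref<Expr> {
    match (deref(e)) {
        EConst{.val = _} -> e,
        EVar{.name = n} -> if (n == name) { ref_new(EConst{.val = v}) } else { e },
        EAdd{.left = l, .right = r} ->
            ref_new(EAdd{.left = subst(l, name, v), .right = subst(r, name, v)})
    }
}

Intern<> would work in much the same way if structural sharing of identical subtrees is desirable.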

I don't think it'll happen often that we only have a single very long entry point to disassemble; in most cases there is going to be a significant amount of branching.

Ah, that might help somewhat.

The following analysis passes will probably take significantly more time anyway.

Ok, makes sense. I'm very curious to see how this works out.

eqv commented 4 years ago

Wth. I tried to use recursion earlier and I couldn't get it to work. https://github.com/vmware/differential-datalog/blob/master/doc/language_reference/language_reference.md also explicitly forbids recursion. But now I'm unable to build a minimal example where it doesn't work - list length works as expected ^^ I'll try to re-implement the expression substitution directly with DDlog functions, and if I run into any trouble, I'll let you know. Also, thank you very much for the support!

eqv commented 4 years ago

Whatever was causing issues with recursive functions seems to have been an instance of PEBKAC... But there will still be some need for external function calls, e.g. to simplify formulas and to call into SMT solvers.
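
For those external calls, DDlog's extern functions are the natural hook. A hypothetical declaration (Formula and the function names are invented here; the Rust bodies would live in a companion .rs module compiled into the generated crate) could look like:

// Declared in DDlog, implemented in Rust next to the generated code.
extern function simplify(f: Intern<Formula>): Intern<Formula>
extern function is_satisfiable(f: Intern<Formula>): bool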

ryzhyk commented 4 years ago

SMT solver bindings for DDlog would be neat! And yes, Rust is more powerful and more performant than DDlog when it comes to manipulating complex data structures, so the approach makes sense.