vmware / differential-datalog

DDlog is a programming language for incremental computation. It is well suited for writing programs that continuously update their output in response to input changes. A DDlog programmer does not write incremental algorithms; instead they specify the desired input-output mapping in a declarative manner.
MIT License
1.37k stars 118 forks source link

Progress indication for ddlog programs #800

Open Kixiron opened 3 years ago

Kixiron commented 3 years ago

Having some sort of status callback or just the ability to see the progress of the running program (whatever that may mean) would be really great for user-facing applications (like RSLint)

ryzhyk commented 3 years ago

Can you give an example?

Kixiron commented 3 years ago

Well, I'm not really sure what the internals of dataflow/timely lend themselves to, but being able to synthesize some sort of percentage finished/percentage remaining/estimated time remaining status is my goal

ryzhyk commented 3 years ago

I guess this could be useful when DDlog is processing a large transaction with many updates (or a small transaction that invalidates previous computation and requires recomputing much of the output from scratch). However I can't think of even a semi-reliable way to track progress, since there is no way of knowing how much downstream work a small change to an internal relation will trigger. It's even worse for recursive computations where there is no way to predict the number of iterations needed to reach a fix point.

In some cases the application may be able to track progress by splitting inputs into chunks and feeding one chunk per transaction. Assuming that chunks generate roughly similar amount of work, progress can be tracked as the ratio of processed chunks to all chunks. You've got to be careful with this approach, as in some applications processing each chunk can be almost as expensive as processing the entire dataset.

Kixiron commented 3 years ago

I'm not really experienced with timely, bout isn't the progress module for this?

ryzhyk commented 3 years ago

Not really. This library is part of timely internals. It notifies timely workers and operators about updates produced by their peers, but this cannot be used to measure progress towards completion.

RDambrosio016 commented 3 years ago

Hey, author of RSLint here, our particular usecase is in the future we would love to attempt to use DDLog for type checking TypeScript. DDLog is fast, but type checking is still a complex task so we would like to measure the progress of the type checking work so we can show it in the cli like how rustc does:

Checking [======> ] lib_name

It does not have to be precise, just a rough estimate, although i presume this is not a simple task since you can't quite predict how much work the input or a change will produce :/

ryzhyk commented 3 years ago

@RDambrosio016, I took this up with @frankmcsherry , the developer of timely and differential, CC'ing you on the issue. If there is a way to do this in timely or DD, we should be able to use it from DDlog.

frankmcsherry commented 3 years ago

Hi folks!

Some context, from which we might pull out a solution: Timely dataflow's progress module does track outstanding work in the system, and so can be used as a measurement of forward progress. However, in a lot of these computations, Datalog included, the amount of forward progress you make may not have much to do with the remaining work before completion.

What the timely dataflow progress information can tell you is "this many records remain unprocessed at operator foo, in iteration bar". Timely dataflow doesn't know if that will result in more records in future iterations, or if the computation is currently winding down. This information could be helpful for reporting some signal of forward progress (vs "is the computation stuck") but I think "x out of 100%" bars would be hard to produce from it.

One possibility, that seems maybe not too hard and at least domain-specific, is in a Datalog system to track the number of new records produced in each round of iteration. This number should be strictly increasing, modulo negation, and you could use this as a measure of forward progress. With a bit of smoothing, you could fake out a progress bar ("number of records at round < now / number of records at round <= now", which would wobble around, but might have the right effect).

Lemme know what you think; there are timely metrics that could be better logged and exposed (I'll drop in on the timely issue next), but I'll probably end up doing the things that I think make the most sense there unless there is some specific input about something else that could help! Cheers,

RDambrosio016 commented 3 years ago

That sounds great, it doesnt have to be precise as long as it shows actual progress towards 100% 👍

ryzhyk commented 3 years ago

Thanks, @frankmcsherry! If I understand the idea correctly, this approach would work inside a recursive fragment of the program. Assuming the first iteration produces 1000 new records, the second iteration produces 100, and the third iteration only 1, we can expect to be nearly done (even though theoretically we can see a million more iterations, each producing a single new record). Is this what you had in mind?

ryzhyk commented 3 years ago

@Kixiron , @RDambrosio016, in the DDlog world you can observe differential timestamps and count the number of updates per timestamp using the Inspect operator (built on top of the timely inspect() method). Once you get around to it, we can experiment with Inspect to see if a semi-useful progress tracking facility can be built for your specific use cases.