msv-lab / modus

A language for building Docker/OCI container images
GNU Affero General Public License v3.0
280 stars 9 forks source link

Improved Max Depth #64

Open thevirtuoso1973 opened 2 years ago

thevirtuoso1973 commented 2 years ago

There are queries/programs one could construct that should work[^1], but would exceed some fixed maximum depth.

[^1]: Of course, there are also programs which are fully recursive foo :- foo. and would always be infinite.

There may be a way to dynamically determine the maximum depth, given a program.


An ideal value for the maximum depth would be the number of possible facts that can be inferred from a program. This is because paths in an SLD tree correspond to a proof, where every intermediate node proves a (sub)goal that's required to prove the desired goal. Therefore the depth of a path cannot exceed the number of possible facts that can be inferred from a program. Of course in Modus we have ungrounded facts so there are infinite possible facts that can be inferred from programs, so we will consider queries instead.

So I think we should use an expression like predicateCount * numConstants^maxArity to approximate the number of facts that can be inferred from a query.


Alternative approach that I was previously considering is below:

Some kind of bottom-up evaluation which levies the maximum chain of expansion may work, e.g for:

foo.
bar :- foo.

something like: (foo, 1) then (bar, 2).

Although one issue with the bottom-up approach is infinite number of producible facts (due to string_concat), e.g.:

f(X) :- f(Y), string_concat(Y, "1", X).
f("")

And also we would have to address that we have ungrounded facts.

thevirtuoso1973 commented 2 years ago

TODO: solve the problem assuming without string_concat first.

thevirtuoso1973 commented 2 years ago

TODO: check expression for upper bound of max depth from page 111 of logic programming/databases. It may be a good enough estimate.

thevirtuoso1973 commented 2 years ago

So it seems like we could use predicateCount * numConstants^maxArity. Although we should probably modify their expression to account for string_concat, even if we assume stratified strings. In particular, assuming we do not allow lengthening strings, numConstants should include every possible substring of every constant.

thevirtuoso1973 commented 2 years ago

Oh and this should only be computed on queries, not programs, due to ungrounded facts.

mechtaev commented 2 years ago

Why do we assume that we do not allow lengthening strings?

thevirtuoso1973 commented 2 years ago

I thought that was a consequence of our (future) string stratification rules, but I have not actually read it so perhaps not.

Is there any existing literature on stratification semantics for strings? e.g. the stratification restriction defined here http://webdam.inria.fr/Alice/pdfs/Chapter-15.pdf seems to just be an ordering to interpret a Datalog program, so I'm not sure how that relates to string concat.

mechtaev commented 2 years ago

The consequence of stratification is that we can apply string_concat up to a fixed number of times along each execution path, which depends on the structure of the program, but not on the input.

Another thing about this constant from the book is that it is has theoretical purpose, but in practice it is might be too high to actually be able to explore a tree of that height, according to what Pavle told me.

mechtaev commented 2 years ago

The paper that discusses stratified construction is this one:

Sequences, Datalog, and Transducers Anthony Bonnera, Giansalvatore Mecca

thevirtuoso1973 commented 2 years ago

in practice it is might be too high to actually be able to explore a tree of that height

Hmm yes this is quite likely. The expression will likely lead to an intractable number for my OpenJDK Modusfile, for example. There may be a better version, e.g. using each predicate's actual arity instead of maxArity, but I will need to investigate.