Open thevirtuoso1973 opened 2 years ago
TODO: solve the problem assuming without string_concat
first.
TODO: check expression for upper bound of max depth from page 111 of logic programming/databases. It may be a good enough estimate.
So it seems like we could use predicateCount * numConstants^maxArity
.
Although we should probably modify their expression to account for string_concat
, even if we assume stratified strings. In particular, assuming we do not allow lengthening strings, numConstants
should include every possible substring of every constant.
Oh and this should only be computed on queries, not programs, due to ungrounded facts.
Why do we assume that we do not allow lengthening strings?
I thought that was a consequence of our (future) string stratification rules, but I have not actually read it so perhaps not.
Is there any existing literature on stratification semantics for strings? e.g. the stratification restriction defined here http://webdam.inria.fr/Alice/pdfs/Chapter-15.pdf seems to just be an ordering to interpret a Datalog program, so I'm not sure how that relates to string concat.
The consequence of stratification is that we can apply string_concat up to a fixed number of times along each execution path, which depends on the structure of the program, but not on the input.
Another thing about this constant from the book is that it is has theoretical purpose, but in practice it is might be too high to actually be able to explore a tree of that height, according to what Pavle told me.
The paper that discusses stratified construction is this one:
Sequences, Datalog, and Transducers Anthony Bonnera, Giansalvatore Mecca
in practice it is might be too high to actually be able to explore a tree of that height
Hmm yes this is quite likely. The expression will likely lead to an intractable number for my OpenJDK Modusfile, for example.
There may be a better version, e.g. using each predicate's actual arity instead of maxArity
, but I will need to investigate.
There are queries/programs one could construct that should work[^1], but would exceed some fixed maximum depth.
[^1]: Of course, there are also programs which are fully recursive
foo :- foo.
and would always be infinite.There may be a way to dynamically determine the maximum depth, given a program.
An ideal value for the maximum depth would be the number of possible facts that can be inferred from a program. This is because paths in an SLD tree correspond to a proof, where every intermediate node proves a (sub)goal that's required to prove the desired goal. Therefore the depth of a path cannot exceed the number of possible facts that can be inferred from a program. Of course in Modus we have ungrounded facts so there are infinite possible facts that can be inferred from programs, so we will consider queries instead.
So I think we should use an expression like
predicateCount * numConstants^maxArity
to approximate the number of facts that can be inferred from a query.Alternative approach that I was previously considering is below:
Some kind of bottom-up evaluation which levies the maximum chain of expansion may work, e.g for:
something like:
(foo, 1)
then(bar, 2)
.Although one issue with the bottom-up approach is infinite number of producible facts (due to
string_concat
), e.g.:And also we would have to address that we have ungrounded facts.