william-vw opened 3 years ago
I think reasoning with changing data is an issue of its own, without even considering SNAF. On this, you might be interested in the work done around Hylar by @mterdjimi and @lmedini.
@pchampin not really talking about incremental reasoning, although it is certainly relevant to my problem (i.e., adding statements to the model triggering a re-calculation of inferences). Jena already performs (monotonic) incremental reasoning, albeit not with the Hylar optimizations in place (interesting work, btw!)
The builtin `log:notIncludes` can be considered a scoped negation as failure, as it is scoped to the subject and object formulas of the statement. As far as I can tell, cited formulas are immutable, since any builtin operating on a formula (e.g., `log:conjunction` or `graph:difference`) always results in a new formula. (In line with this, Jena "closes" any cited formulas added to a model, rendering them immutable, so client code can no longer edit them.) Hence, AFAICT, adding new assertions to the N3 document wouldn't interfere with the truthfulness of prior inferences made using `log:notIncludes` (as claimed in the original N3 doc), since new assertions cannot alter the original subject and object. Even then, however, an N3 document could load this formula from an online location (using `log:semantics`), meaning that the exact same N3 code may yield different inferences when executed at different times.
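To make the scoping concrete, here is a minimal sketch (the `:knows` / `:notAcquaintedWith` vocabulary is invented for illustration): `log:notIncludes` only inspects the explicit, cited subject formula, so statements added to the surrounding document later cannot retract the conclusion.

```n3
@prefix log: <http://www.w3.org/2000/10/swap/log#> .
@prefix : <http://example.org/#> .

# The check is scoped to this cited formula only; adding
# ":alice :knows :carol." to the document afterwards does not
# change what the subject formula includes.
{ { :alice :knows :bob } log:notIncludes { :alice :knows :carol } }
=> { :alice :notAcquaintedWith :carol } .
```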
In cases such as `log:collectAllIn`, the same guarantee is not given, as it pertains to all assertions in the current document (plus, possibly, online sources). If `log:collectAllIn` is used for NAF (i.e., putting an empty list as the result variable), a new assertion can later be added that does meet the where clause, and hence contradicts the prior inference. (Also, the online sources could change at any point, of course.)
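For illustration, a hedged sketch of that NAF pattern (the `:dislikes` / `:FriendlyPerson` vocabulary is made up): matching the collected list against `()` asserts that no binding exists in the current scope, which a later assertion can contradict.

```n3
@prefix log: <http://www.w3.org/2000/10/swap/log#> .
@prefix : <http://example.org/#> .

:alice :knows :bob .

# NAF via an empty result list: "there is no ?x that :alice :dislikes".
# Asserting ":alice :dislikes :carol." afterwards would invalidate
# the inference below.
{ (?x { :alice :dislikes ?x } ()) log:collectAllIn ?scope }
=> { :alice a :FriendlyPerson } .
```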
This is mainly a problem for APIs that can add new statements at any time, i.e., after a prior reasoning run that involved resolving (S)NAF. An option could be to similarly "close" the base N3 model once it is complete, i.e., after which statements can no longer be added, but I don't feel that is practical in reality.
I thought that introducing some sort of timestamp or signature into the scope argument could help sidestep this issue - it would be populated by the builtin, essentially stating that the inference is guaranteed to be valid only within this (temporal) scope.
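A purely hypothetical sketch of that timestamp-in-scope idea (none of this is existing vocabulary - `:resolvedAt` and the binding behaviour are invented): the builtin would bind the scope argument to a description of the snapshot it actually ran against, so the inference carries its own validity window.

```n3
@prefix log: <http://www.w3.org/2000/10/swap/log#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix : <http://example.org/#> .

# Hypothetical: the reasoner populates ?scope with a snapshot
# signature, e.g. the time at which the NAF check was resolved.
{ (?x { :alice :dislikes ?x } ()) log:collectAllIn ?scope }
=> { :alice a :FriendlyPerson .
     ?scope :resolvedAt "2021-05-11T09:00:00Z"^^xsd:dateTime } .
```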
I agree that the fact that we are on the Web, where data can change at any time, is a challenge. Reasoning is always performed on a fixed dataset (some kind of current snapshot of the world). I would say that the moment you add data to a dataset, that dataset changes, and I would even give it another name. This name can indeed be a combination of a file name and a time stamp (to come back to your example).
The problem with time stamps is of course that you don't know whether data has changed in time or not. Is data_11.5.2021.n3 the same as data_12.5.2021.n3 (apologies for the ugly name, imagine some nice time stamp)? I would say that we should only give a new name to our data set the moment it changes. But of course that still remains a tricky task since it is not that easy to know that data did NOT change.
Adding data through `log:semantics` is also problematic in another way: it is really difficult to fully capture the scope if `log:semantics` is involved. We could always have sources which call other sources through `log:semantics`. Here we should have some kind of closed scope, in the sense that all sources involved should somehow be declared at the beginning of the reasoning run and should not change during that run.
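As a concrete sketch of this moving-target problem (the URI and the `:reachable` conclusion are invented): `log:semantics` parses whatever the remote document contains at fetch time, and that document may itself pull in further sources the same way.

```n3
@prefix log: <http://www.w3.org/2000/10/swap/log#> .
@prefix : <http://example.org/#> .

# ?f is the formula obtained by dereferencing and parsing the URI
# *at reasoning time*; a second run may see different content, and
# the fetched document could itself use log:semantics on yet more sources.
{ <http://example.org/remote-data.n3> log:semantics ?f .
  ?f log:includes { ?s :knows ?o } }
=> { ?s :reachable ?o } .
```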
Totally tangential comment, spurred by the sample file names. `11.5.2021` is inherently ambiguous — Nov 5 or May 11? — and will all but inevitably lead to eventual trouble. Please always use ISO-style `YYYY-MM-DDThh:mm:ss` ordering, with whatever (or no) punctuation or timezone detail.
Taking example27 and example28 from https://pr-preview.s3.amazonaws.com/william-vw/N3/pull/81.html#task-workflows, the proofs https://github.com/josd/eye/blob/master/reasoning/twf/example27-proof.n3#L117 and https://github.com/josd/eye/blob/master/reasoning/twf/example28-proof.n3#L118 will tell exactly the scope, and a proof checker could check whether the content of those URIs has not yet expired.
Regarding our discussion today, I think the model / priority part of SCOPE in `log:collectAllIn`, `log:forAllIn` may be in line with what is called rule saliency in other systems. But my concern was rather about the validity of inferences when the data changes, at a time unknown to the system. I.e., one can make sure that SNAF-related inferences are based on all relevant statements, by calculating them only after all other (possibly more salient) rules have already contributed their own inferences. But an API like Jena allows adding statements to the model at any time (unknown to the system): a statement can later be introduced to the same model that renders the prior SNAF-related inference incorrect. I thought that is where SCOPE puts the S in SNAF - in this scope of documents, at this point in time, the particular statements were not found and hence the assertion is false.
A similar observation can be made about, e.g., `log:(not)Includes` when the formula operands are retrieved from an online source - the assertion will only hold for whatever online content was found at that specific point in time. (Or, with that particular signature >:) Hence, perhaps the problem (if any) goes beyond the SCOPE argument. Is this perhaps where proofs would solve the problem? I have to admit I haven't really worked with proofs, but they don't seem to consider this dimension.