karmaresearch / vlog

Incorrect behavior on string literals from csv-import #73

Closed andreschamschurko closed 1 year ago

andreschamschurko commented 3 years ago

Consider the following containments.csv:

"""chili sauce""","""chili pepper"""
"""beer""","""alcohol"""
juice,water

The input KB in Rulewerk syntax is:

@source contains[2]: load-csv("/home/.../containments.csv") .

The reasoning with the rulewerk client version 0.8.0 produces the following trace:

rulewerk> @query contains(?X, ?Y) .
?X -> "chili sauce", ?Y -> "chili pepper"
?X -> "beer", ?Y -> "alcohol"
?X -> juice, ?Y -> water
3 result(s) in 9ms. Results are sound and complete.
rulewerk> @query contains(?X, "chili pepper") .                                                            
0 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, "alcohol") .                                                             
0 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, water) .                                                                            
?X -> juice
1 result(s) in 0ms. Results are sound and complete.

The expected behavior can be verified with the previous version 0.7.0:

rulewerk> @query contains(?X, ?Y) .
?X -> "chili sauce", ?Y -> "chili pepper"
?X -> "beer", ?Y -> "alcohol"
?X -> juice, ?Y -> water
3 result(s) in 1ms. Results are sound and complete.
rulewerk> @query contains(?X, "chili pepper") . 
?X -> "chili sauce"
1 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, "alcohol") .                                                                             
?X -> "beer"
1 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, water) .                                                                                 
?X -> juice
1 result(s) in 0ms. Results are sound and complete.

It seems that the facts containing string literals are not processed correctly.
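
For reference, this is how a standard CSV parser would decode the file above. The sketch uses Python's built-in csv module, not VLog's own CSV reader, so it only illustrates the quoting convention (outer quotes delimit the field, a doubled quote stands for one literal quote). It is an assumption that VLog follows the same convention.

```python
import csv
import io

# The three rows of containments.csv exactly as given in the report.
raw = (
    '"""chili sauce""","""chili pepper"""\n'
    '"""beer""","""alcohol"""\n'
    'juice,water\n'
)

# csv.reader applies RFC 4180-style quoting: the outer quotes delimit the
# field, and each doubled quote inside it becomes one literal quote.
rows = list(csv.reader(io.StringIO(raw)))

for row in rows:
    print(row)
```

Under this convention the first two rows carry values that include their quotation marks (plain string literals with no datatype), while the third row carries bare constants.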

CerielJacobs commented 3 years ago

Hi,

How do you conclude from this that this is a VLog bug?

CerielJacobs commented 3 years ago

Can you please try:

rulewerk> @query contains(?X, "chili pepper"^^<http://www.w3.org/2001/XMLSchema#string>) .

andreschamschurko commented 3 years ago

I get 0 results:

rulewerk> @query contains(?X, "chili pepper"^^<http://www.w3.org/2001/XMLSchema#string>)                                                                                                                           
0 result(s) in 2ms. Results are sound and complete.

Another thing I tried was asserting all the facts imported from the CSV by hand. This is what I got:

rulewerk> @query contains(?X,?Y).                                                                                                                                                                            
?X -> "chili sauce", ?Y -> "chili pepper"
?X -> "beer", ?Y -> "alcohol"
?X -> juice, ?Y -> water
3 result(s) in 0ms. Results are sound and complete.
rulewerk> @assert contains("chili sauce", "chili pepper") .
Asserted 1 fact(s) and 0 rule(s).
rulewerk> @assert contains("beer", "alcohol") .
Asserted 1 fact(s) and 0 rule(s).
rulewerk> @assert contains(juice, water) .                                                                                                                                                                     
Asserted 1 fact(s) and 0 rule(s).
rulewerk> @reason .
Loading and materializing inferences ...
... finished in 0ms (0ms CPU time).
rulewerk> @query contains(?X,?Y).                                                                                                   
?X -> juice, ?Y -> water
?X -> "chili sauce", ?Y -> "chili pepper"
?X -> "beer", ?Y -> "alcohol"
?X -> "chili sauce", ?Y -> "chili pepper"
?X -> "beer", ?Y -> "alcohol"
5 result(s) in 0ms. Results are sound and complete.

CerielJacobs commented 3 years ago

Thank you. I suggest you report this issue in rulewerk. Since you use rulewerk, not vlog directly, and your example does not involve any reasoning, I’m not at all convinced that this is a vlog issue.

mkroetzsch commented 3 years ago

I think this is not a reasoning issue but something related to the API. From what I know, Rulewerk did not change any code related to how strings are represented, but updating the VLog version leads to different behaviour (this needs to be verified; I understand that the problem so far was seen by using different versions of Rulewerk and VLog, so it is not obvious what causes the problem). But we can investigate if there might have been any change in how data was passed from Rulewerk to VLog. If we are passing the same data but get different results, then it can still be that the new behaviour is "correct" and Rulewerk was written for the old "incorrect" behaviour, of course. Let's see what we can find out ...

CerielJacobs commented 3 years ago

@mkroetzsch Could this issue be related to issue #55?

CerielJacobs commented 3 years ago

When reading strings from a .csv file, vlog does not convert them to "..."^^http://www.w3.org/2001/XMLSchema#string.

CerielJacobs commented 3 years ago

@andreschamschurko Could you please try changing the .csv file such that all strings have ^^http://www.w3.org/2001/XMLSchema#string appended and then see what happens?

andreschamschurko commented 3 years ago

With the following csv:

"chili sauce"^^http://www.w3.org/2001/XMLSchema#string,"chili pepper"^^http://www.w3.org/2001/XMLSchema#string
"""beer"""^^http://www.w3.org/2001/XMLSchema#string,"""alcohol"""^^http://www.w3.org/2001/XMLSchema#string
juice,water

I get:

rulewerk> @query contains(?X, ?Y) .
?X -> <chili sauce"^^http://www.w3.org/2001/XMLSchema#string>, ?Y -> <chili pepper"^^http://www.w3.org/2001/XMLSchema#string>
Error: VLog returned a constant name '"beer""^^http://www.w3.org/2001/XMLSchema#string' that Rulewerk cannot make sense of.
rulewerk> @query contains(?X, "chili pepper") .                                                             
0 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, "alcohol") .                                                             
0 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, "chili pepper"^^<http://www.w3.org/2001/XMLSchema#string>)                    
0 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, "alcohol"^^<http://www.w3.org/2001/XMLSchema#string>)                    
0 result(s) in 0ms. Results are sound and complete.

CerielJacobs commented 3 years ago

I think it works with the following containments.csv:

"""chili sauce""^^http://www.w3.org/2001/XMLSchema#string","""chili pepper""^^http://www.w3.org/2001/XMLSchema#string"
"""beer""^^http://www.w3.org/2001/XMLSchema#string","""alcohol""^^http://www.w3.org/2001/XMLSchema#string"
juice,water

Note that http://www.w3.org/2001/XMLSchema#string should have < and > delimiters.

andreschamschurko commented 3 years ago

I still get the error:

rulewerk> @query contains(?X, ?Y).                                                                       
Error: VLog returned a constant name '"chili sauce"^^http://www.w3.org/2001/XMLSchema#string' that Rulewerk cannot make sense of.

CerielJacobs commented 3 years ago

I think the < > delimiters are missing in your .csv file. I don't know how to disable markdown in these comments, so you don't see them, but they are there in what I wrote. I'll try again:

"""chili sauce""^^<http://www.w3.org/2001/XMLSchema#string>","""chili pepper""^^<http://www.w3.org/2001/XMLSchema#string>"
"""beer""^^<http://www.w3.org/2001/XMLSchema#string>","""alcohol""^^<http://www.w3.org/2001/XMLSchema#string>"
juice,water
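
Writing this quoting by hand is error-prone, so a file in this shape could also be generated. The following is a minimal sketch using Python's csv module. The helper names typed_string and render_csv are made up for illustration and are not part of VLog or Rulewerk.

```python
import csv
import io

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def typed_string(value: str) -> str:
    """Hypothetical helper: render value as an explicitly typed string literal."""
    return f'"{value}"^^<{XSD_STRING}>'


def render_csv(rows) -> str:
    """Serialize rows; the writer quotes any field containing a quote character."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    [typed_string("chili sauce"), typed_string("chili pepper")],
    [typed_string("beer"), typed_string("alcohol")],
    ["juice", "water"],  # bare constants need no quoting
]

csv_text = render_csv(rows)
print(csv_text, end="")
```

The writer wraps each typed literal in outer quotes and doubles the inner ones, which yields the same three rows as above.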

andreschamschurko commented 3 years ago

You are right, I misunderstood the sentence about the delimiters. Now everything is there:

rulewerk> @query contains(?X,?Y).
?X -> "chili sauce", ?Y -> "chili pepper"
?X -> "beer", ?Y -> "alcohol"
?X -> juice, ?Y -> water
3 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, "chili pepper")                 
?X -> "chili sauce"
1 result(s) in 0ms. Results are sound and complete.
rulewerk> @query contains(?X, "alcohol")                 
?X -> "beer"
1 result(s) in 0ms. Results are sound and complete.

CerielJacobs commented 3 years ago

So, I'm pretty sure that the difference between Rulewerk 0.7.0 and 0.8.0 comes from Rulewerk's changed handling of strings, see the discussion in issue #55. I'm closing this issue.

mkroetzsch commented 3 years ago

There is still a problem there. RDF, OWL, and Rulewerk do not distinguish "foo" from "foo"^^<http://www.w3.org/2001/XMLSchema#string>. In VLog, however, it seems that these two forms are distinct. Rulewerk could introduce a special auxiliary datatype for VLog plain strings and then represent VLog's "foo" as something like "foo"^^<http://rulewerk.semanticweb.org/vlog-plain-string>, but is this really a good solution?
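
This distinction would also account for the five rows returned after the @assert commands earlier in the thread. The sketch below is a toy model in Python, not VLog or Rulewerk code. It assumes that the CSV import stores plain strings, that @assert stores explicitly typed ones, and that the client hides the xsd:string datatype when printing.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
TYPED_SUFFIX = f"^^<{XSD_STRING}>"


def display(term: str) -> str:
    """Assumed display rule: hide an explicit xsd:string datatype."""
    return term[: -len(TYPED_SUFFIX)] if term.endswith(TYPED_SUFFIX) else term


# Assumption: the CSV import keeps plain strings as they appear in the file.
from_csv = {
    ('"chili sauce"', '"chili pepper"'),
    ('"beer"', '"alcohol"'),
    ("juice", "water"),
}

# Assumption: facts added with @assert carry the explicit datatype.
asserted = {
    (f'"chili sauce"{TYPED_SUFFIX}', f'"chili pepper"{TYPED_SUFFIX}'),
    (f'"beer"{TYPED_SUFFIX}', f'"alcohol"{TYPED_SUFFIX}'),
    ("juice", "water"),
}

facts = from_csv | asserted  # distinct internal facts
shown = [tuple(display(t) for t in fact) for fact in facts]

print(len(facts), len(set(shown)))  # prints "5 3"
```

Five facts are distinct internally but only three look distinct when printed, which matches the duplicated rows in that trace.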

CerielJacobs commented 3 years ago

Ah yes, re-opening.

mkroetzsch commented 3 years ago

It's mainly a design decision, whether VLog wants to use the RDF model for data values or something more general with literals that are not native to RDF. In the latter case, Rulewerk would need to find a clean way to represent any additional kinds of values in its RDF-based type system. This can be done, but I am not sure if it is convenient.

RDF itself used to distinguish "foo" from "foo"^^<http://www.w3.org/2001/XMLSchema#string> in version 1.0. The untyped form only fell out of use in 2014, when RDF 1.1 unified the handling of literals. What remains different are language-tagged strings such as "foo"@en, which also have their own type but are never written with an explicit type.

CerielJacobs commented 3 years ago

Maybe we should add an RDF mode to VLog, so that, when running in that mode, "foo" is automatically converted to "foo"^^<http://www.w3.org/2001/XMLSchema#string>.

mkroetzsch commented 3 years ago

This could be a way, but I wonder if there are any cases where the non-RDF mode would be of interest. The two distinct forms of strings can only play a role if one also uses data of the form "foo"^^<http://www.w3.org/2001/XMLSchema#string>, but in that case it seems almost certain that one is using RDF and would want the RDF mode.

Conversely, even in RDF mode, it would be fine for users if their "foo"^^<http://www.w3.org/2001/XMLSchema#string> turned into "foo" internally and in results. So maybe one should just represent "foo"^^<http://www.w3.org/2001/XMLSchema#string> as "foo" in all cases? In non-RDF applications, nothing would change. Of course, one would have to do this simplification (removing xsd:string) in all places where string constants may occur, including in SPARQL results and Trident inputs, and I don't know how much work this would be.
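
The simplification itself is small. As a rough sketch in Python (the function name normalize_literal is hypothetical, and this is not how VLog implements it), it could look like this:

```python
XSD_STRING_SUFFIX = "^^<http://www.w3.org/2001/XMLSchema#string>"


def normalize_literal(term: str) -> str:
    """Drop an explicit xsd:string datatype; leave every other term unchanged."""
    if term.startswith('"') and term.endswith(XSD_STRING_SUFFIX):
        return term[: -len(XSD_STRING_SUFFIX)]
    return term


examples = [
    '"foo"^^<http://www.w3.org/2001/XMLSchema#string>',  # typed string
    '"foo"',                                             # plain string
    '"foo"@en',                                          # language-tagged string
    '"42"^^<http://www.w3.org/2001/XMLSchema#integer>',  # other datatype
    'juice',                                             # bare constant
]

for term in examples:
    print(term, "->", normalize_literal(term))
```

Only the first example changes. Applied at every point where constants enter or leave the system, a rule like this would make the typed and plain forms of a string compare equal. The effort lies in finding all of those points.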

CerielJacobs commented 3 years ago

That sounds like a plan. I'm not working today, though, so this will have to wait until next week.

CerielJacobs commented 3 years ago

And this would also solve issue #55.

irina-dragoste commented 3 years ago

So, once this issue is solved, if we load an RDF file containing the constants "foo1" and "foo2"^^<http://www.w3.org/2001/XMLSchema#string> and then query for them, do we expect the resulting karmaresearch.vlog.Term objects to have the name field values "foo1" and "foo2"?

irina-dragoste commented 1 year ago

Fixed. SPARQL results remain to be tested in the open issue https://github.com/knowsys/rulewerk/issues/223.