potassco / clorm

🗃️ A Python ORM-like interface for the Clingo Answer Set Programming (ASP) reasoner
https://clorm.readthedocs.io
MIT License
52 stars 5 forks source link

Clean up Predicate comparison relations #97

Closed daveraja closed 4 months ago

daveraja commented 2 years ago

The behaviour of the Predicate sub-class comparison relations (==, !=, <, <=, etc) is currently a bit of a mess as it tries to combine two competing intuitions:

  1. behave like normal python tuple-like object comparing the values of the individual fields.
  2. since every fact maps directly to a clingo Symbol object, behave like a Symbol object.

The big reason to not simply treat it like a tuple-like object is that the comparison relations are not always defined. For example, ("a",1) < (1,"a") will fail because of the failed comparison between a str and an int. In contrast, the comparison of two Symbol objects is always well-defined. Being able to compare arbitrary facts is useful. For example, you can sort a list of facts. Even when you don't care what the order is, it is still useful that it exists.

Note, for python tuples, equality and inequality are always well defined. So, ("a",1) == (1,"a").

So, my current thought is:

  1. For the <, <=, >, >= relations to only use the underlying Symbol object.
  2. For the == and != relations.
    • If the Predicate is declared as a tuple then treat it like a tuple. So we only compare the field elements and ignore the Predicate subclass). This means you can also compare a normal python tuple to a clorm tuple.
    • If the Predicate is not a tuple then it will only be equal if it is an instance of the same class with all values (and sign) the same.

@florianfischer91 @MaxOstrowski or anyone else have thoughts?

MaxOstrowski commented 2 years ago

I'm not sure If I do understand this completely. Afaik there is a ==, !=comparison operation for Symbols, right ? So I would also expect this behaviour for Predicates to be the same. So wouldn't it make sense to always have predicates behave like Symbols (in terms of comparison)? Only if comparing a Predicate with something else we would need a different operator. But then is the question, why do you want to compare apples and pears ...

But maybe I'm not seeing something here ?

daveraja commented 2 years ago

@MaxOstrowski @pobermei @sthiele Thanks for the discussion earlier. Also adding @rkaminsk @teiesti @florianfischer91 here in case they have some thoughts on the issue.

I think the (weak) consensus so far is that the comparison operators should be overload to compare the underlying clingo Symbol objects only for all comparison operators (including == and !=). Here are some boundary-case examples to continue the discussion.

import datetime
from clorm import Predicate, IntegerField, SimpleField, StringField, FactBase

class StrangeIntField(IntegerField):
    pytocl = lambda x: 10-x
    cltopy = lambda x: x+10

class P(Predicate):
    a = IntegerField

class PA(Predicate):
    a = StrangeIntField
    class Meta:
        name = "p"

p2 = P(a=2)
pa8 = PA(a=8)
print(f"{p2.a} != {pa8.a}")
assert p2.a != pa8.a

Here p2 is an instance of class P and pa8 is an instance of class PA and they have different values for their respective a fields. (p2.a == 2 and pa8.a == 8)

But, p2 and pa8map to the sameSymbolobjectp(2)`. So following the current consensus, despite the difference in the fields of the two objects, they should be considered equivalent:

print(f"{p2} == {pa8}")
assert p2 == pa8

This feels a bit strange to me that you can have both p2.a != pa8.a and p2 == pa8.

The less than can also be a bit strange:

pa9 = PA(a=9)
assert pa9.a > p2.a and pa9 < p9 

Note: the above StrangeIntField isn't very meaningful but in general the "field" classes define a translation from python objects to Symbol objects (and vice-versa), so I think it is important to allow users to define arbitrary translations. For example: translating from python date objects to clingo strings:

class DateField(StringField):
    pytocl = lambda dt: dt.strftime("%Y-%m-%d")
    cltopy = lambda s: datetime.datetime.strptime(s,"%Y-%m-%d").date()

Anyway, in the above we've had examples of where using the Symbol for the comparison may not make sense, but there are cases where it is really useful. For example in a query:

class Q(Predicate):
    a = (SimpleField, IntegerField)

q1 = Q((1,2))
q2 = Q(("a",2))

fb = FactBase([q1,q2])

query1 = fb.query(Q).where(Q.a < (2,2))
result1 = list(query1.all())

query2 = fb.query(Q).order_by(Q.a)
result2 = list(query2.all())

If we implement a normal python tuple-like semantics then executing both query1 and query2 will fail because query1 will fail when it tries to compare ("a",2) < (2,2) and query2 will fail when it tries to compare (1,2) < ("a",2) as part of the sorting. Whereas if we use the underlying Symbol comparisons both queries are well-defined.

florianfischer91 commented 2 years ago

I can't really judge what a good design decision would be but as has been said, using the same comparison behavior for Predicate and Symol would be the most consistent solution. You just have to define eq and one of the other comparsion-functions, use the total_ordering decorator and you are done.

What feels a bit strange is that when you compare arguments individually it depends on the argument which comparison 'mechanism' is used right? F. i. if you have simple types like int, str etc. the builtin-comparison is used, but if the argument is a Predicate-Instance then the symbol-comparsion is used...

rkaminsk commented 2 years ago

As far as I understand, predicates only capture ground atoms. I think that implementing the comparison operators delegating to the underlying symbol is a good idea.

daveraja commented 2 years ago

I can't really judge what a good design decision would be but as has been said, using the same comparison behavior for Predicate and Symol would be the most consistent solution. You just have to define eq and one of the other comparsion-functions, use the total_ordering decorator and you are done.

That sounds like a +1 for using Symbol behaviour. Looks like I'm the only one with reservations...

BTW. I think it's probably best to implement the operators individually, as the docs for total_ordering have a warning:

While this decorator makes it easy to create well behaved totally ordered types, it does come at the cost of slower execution and more complex stack traces for the derived comparison methods. If performance benchmarking indicates this is a bottleneck for a given application, implementing all six rich comparison methods instead is likely to provide an easy speed boost.

What feels a bit strange is that when you compare arguments individually it depends on the argument which comparison 'mechanism' is used right? F. i. if you have simple types like int, str etc. the builtin-comparison is used, but if the argument is a Predicate-Instance then the symbol-comparsion is used...

Yes. if you compare the individual argument then that argument's builtin comparison is used. So if you wanted to you could create some odd behaviour like: x.a != y.a but x == y.

daveraja commented 2 years ago

As far as I understand, predicates only capture ground atoms. I think that implementing the comparison operators delegating to the underlying symbol is a good idea.

Yes. Predicate instances only capture ground atoms.

Ok. Looks like using the underlying symbol is the popular choice. I'll do that. I guess the odd behaviour that I highlighted earlier is really for boundary cases. If users define their predicates sensibly then it will behave sensibly.

daveraja commented 2 years ago

Hmm. I'm feeling more nervous about this change.

Changing the behaviour of == (and !=) has more serious implications than other comparison operators (<, etc.). The implementation of FactBase will need to change to make it behave consistently with a normal set object.

With the new semantics we will currently have the following (using the new syntax for defining predicates):

class P1(Predicate, name="p"):
   a1: int

class P2(Predicate, name="p"):
   a2: int

s = set([P1(a1=1), P2(a2=1)])
fb = FactBase([P1(a1=1), P2(a2=2)])

assert len(s) == 1
assert len(fb) == 2  # WRONG!

The problem is that FactBase internally stores the facts in a separate set for each predicate class. This has been important for querying. Will have to think about what's the best way to change this. Presumably it is better to not lose query performance, so probably the simplest is to maintain an extra internal set that contains all facts as well as the predicate class specific sets. So it becomes less space efficient.

Leaving this issue open until I make all the changes and add to the docs.

florianfischer91 commented 2 years ago

What would be the reason for a user to define two different classes (predicates) which lead to the exact same symbol-representation? My first thought was that this would be more like an modelling error on the python side (and if it is an error perhaps it's better to not allow 'redundant' classes?)

rkaminsk commented 2 years ago

Hmm. I'm feeling more nervous about this change.

Changing the behaviour of == (and !=) has more serious implications than other comparison operators (<, etc.). The implementation of FactBase will need to change to make it behave consistently with a normal set object.

With the new semantics we will currently have the following (using the new syntax for defining predicates):

class P1(Predicate, name="p"):
   a1: int

class P2(Predicate, name="p"):
   a2: int

s = set([P1(a1=1), P2(a2=1)])
fb = FactBase([P1(a1=1), P2(a2=2)])

assert len(s) == 1
assert len(fb) == 2  # WRONG!

The problem is that FactBase internally stores the facts in a separate set for each predicate class. This has been important for querying. Will have to think about what's the best way to change this. Presumably it is better to not lose query performance, so probably the simplest is to maintain an extra internal set that contains all facts as well as the predicate class specific sets. So it becomes less space efficient.

Leaving this issue open until I make all the changes and add to the docs.

You could still group by predicate name; "p" in your example. The instance P2(a2=1) could be discarded or overwrite the previous one during construction. As @florianfischer91 mentions, this looks like a corner case, so clorm just has to do something reasonable enough.

daveraja commented 2 years ago

You could still group by predicate name; "p" in your example. The instance P2(a2=1) could be discarded or overwrite the previous one during construction. As @florianfischer91 mentions, this looks like a corner case, so clorm just has to do something reasonable enough.

Sort of. I will need to group by the python class name, not the asp predicate "p". For consistency with set I think discarding rather than overwriting will be the correct behaviour (but I'll double check this). In any case it's all do-able, just a matter of deciding what's the best compromise in terms of overhead.

daveraja commented 2 years ago

What would be the reason for a user to define two different classes (predicates) which lead to the exact same symbol-representation? My first thought was that this would be more like an modelling error on the python side (and if it is an error perhaps it's better to not allow 'redundant' classes?)

Yes, I can't think of good scenarios where this is not a modelling mistake. The only scenario I can come up with is something like modelling a dynamic system where you have fluents. Say you have a holds/2 predicate where the first parameter is the fluent and the second is the time step. So you could have facts like:

holds(posn(robot1, kitchen),0).
holds(holding(robot1, plate), 1).

Now maybe you are interested in specifying the initial robot position to pass to the solver but then once your run the solver you want to know what fluents were set but you don't care about accessing the internal details of each fluent. So you could define the following Predicate subclasses.

class Posn(Predicate, name="posn"):
     robot = ConstantField
     location = ConstantField

class HoldsPosn(Predicate, name = "holds"):
     posn = Posn.Field
     time = IntegerField

class Holds(Predicate, name = "holds"):
      fluent = RawField
      time = IntegerField

So here the symbol produced by any HoldsPosn instance could also be produced by a Holds instance.

This isn't the greatest example but not completely implausible.