mogui / pyorient

Orientdb driver for python that uses the binary protocol.
Apache License 2.0
167 stars 127 forks source link

Contains on embedded data types may be broken, and it's unclear how it's supposed to work #140

Open obi1kenobi opened 8 years ago

obi1kenobi commented 8 years ago

There don't seem to be any tests that use the .contains method on an embedded data property (EmbeddedSet etc.) and I can't figure out how to use it without getting weird errors. Consider the following code:

BaseNode = declarative_node()

class Person(BaseNode):
    element_type = 'Person'
    element_plural = 'Person'

    name = String(nullable=False)
    aliases = EmbeddedSet(nullable=False)

class ModelsTest(unittest.TestCase):
    def setUp(self):
        self.graph = get_test_orientdb_graph()
        self.graph.create_all(BaseNode.registry)

    def test_get_person_by_name(self):
        person1 = self.graph.Person.create(name='Hello World',
                                           aliases=set(['Hello T. World', 'World']))
        name = 'Hello World'
        result = self.graph.query(Person).filter(
            (Person.name == name) | Person.aliases.contains(name)).all()
        self.assertSetEqual(set(result), set([person1]))

This fails with the following trace:

.../pyorient/ogm/query.py:145: in prepare
    wheres = rid_clause + self.build_wheres(params)
.../pyorient/ogm/query.py:411: in build_wheres
    exp_where = [self.filter_string(filter_exp)] if filter_exp else []
.../pyorient/ogm/query.py:339: in filter_string
    , self.filter_string(right))
.../pyorient/ogm/query.py:320: in filter_string
    left_str, self.filter_string(right))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <pyorient.ogm.query.Query object>, expression_root = 'Hello World'

    def filter_string(self, expression_root):
>       op = expression_root.operator
E       AttributeError: 'str' object has no attribute 'operator'

Specifically, this line seems to be the problem, and I don't understand why it recursively calls filter_string on the right-hand side. Can the RHS be an expression? I would have expected it to always be a variable, because I don't see how one of these data structures could be tested for containing the result of an expression.

TropicalPenguin commented 8 years ago

It was written following: http://orientdb.com/docs/2.0/orientdb.wiki/SQL-Where.html

... where 'contains' takes a conditional expression.

I've never used - and therefore don't really understand - the semantics of embedded types in OrientDB (nor, for that matter, linked class types), so I'm not sure how much I'll be able to help, except to naively guess that maybe 'contains' should only be callable on LinkedClassProperty types and to allow a different 'in' syntax for embedded data?

TropicalPenguin commented 8 years ago

Looks like it's impossible to overload python's 'in' keyword: __contains__ return value is always coerced to bool, and overloading __bool__ to return a non-bool will trigger a TypeError :(

obi1kenobi commented 8 years ago

From the description of CONTAINS in the same document you cited: "Condition can be a single item: in this case the behaviour is like the IN operator"

This definitely means the pyorient behavior is a bug. I also don't see an IN operator in pyorient's operators file.

Any thoughts on how best to proceed?

TropicalPenguin commented 8 years ago

Hm, for some reason I didn't notice that bit.

This latest commit (https://github.com/OuterPlaypen/pyorient/commit/69475faab91c0bb747d42e64771629b3ae8f3191) is my proposal: since it behaves like IN, and IN looks like it was featured in earlier versions than CONTAINS, we might as well use it - and as a happy coincidence this gets around needing to define some ugly reverse-polish style in_(<value>,<collection>) notation.

obi1kenobi commented 8 years ago

I don't have strong opinions on how this is implemented under the hood, so long as it works -- and it would be nice to have some test coverage to make sure it actually stays working in the future.

That said, it looks a little strange that a 'contains' is being rewritten with 'in' on the wire, when 'contains' was added before the 1.0 release. I don't think pyorient supports any version that early, and the code may be simpler if both paths use 'contains'.

obi1kenobi commented 8 years ago

Scratch that, please use in instead of the supposedly equivalent contains -- just checked on OrientDB 2.1.3 and they do not behave identically. in returns the correct result while contains returns no elements, when querying a FULLTEXT indexed field, even when it's the only predicate clause.

obi1kenobi commented 8 years ago

Another update: contains(X) does not work when X is a primitive, but contains X does.

Ostico commented 8 years ago

Can i close?

obi1kenobi commented 8 years ago

I think this is still at least a documentation issue -- it's unclear to me whether the contains method in pyorient is allowed to take a collection type or not, and whether that will be correctly handled by the code. So I'd say it's still an open issue, but it's of course your decision in the end.