populse / populse_db

Database API for Populse
Other
1 stars 4 forks source link

Document queries syntax #44

Open denisri opened 3 years ago

denisri commented 3 years ago

What's the query syntax in DatabaseSession.filter_documents() ? It's not SQL, it's an unknown language... The docs only say:

        filter_query: Filter query (str)

                                - A filter row must be written this way: {<field>} <operator> "<value>"
                                - The operator must be in ('==', '!=', '<=', '>=', '<', '>', 'IN', 'ILIKE', 'LIKE')
                                - The filter rows can be linked with ' AND ' or ' OR '
                                - Example: "((({BandWidth} == "50000")) AND (({FileName} LIKE "%G1%")))"

I don't understand why there are 2 parentheses around "lines", and I can't make a working query on my own. I just want to perform a simple query like "select documents in a list of IDs". I tried thinks like:

{FileName} in ("/path1", "/path2")

or:

{FileName} in (("/path1", "/path2"))

but always end up with a syntax error that I cannot interpret, like this one:

Unexpected token Token(LPAR, '(') at line 1, column 15.
Expected one of: 
        * LSQB
        * TIME
        * FIELD_NAME
        * KEYWORD_LITERAL
        * DATE
        * ESCAPED_STRING
        * SIGNED_NUMBER
        * QUOTED_FIELD_NAME
        * DATETIME

Sure, I know nothing about Lark, not even if it's a language, a parser library, or anything else, but I guess a standard user doesn't want to know...

We really need a documentation for the query language...

Moreover it's not easy in the Populse_db API to get a collection primary key to perform queries on: currently we need to get to the underlying SQLite engine to get it (or maybe I missed something ?)

For now I could only make my request by totally bypassing the DatabaseSession layer and getting directly to the SQLite engine, but of course this is not what it's meant for...

sapetnioc commented 3 years ago

Yes, there many things to improve in populse_db documentation. The query language has nothing to do with Lark. This is just the library to parse the grammar. I defined this grammar. What you try to do is an IN operator between a field and a list literal. I think this is not implemented. You can use FileName == "/path1" OR FileName == "/path2" for now.

I do not understand what cannot be done on collection primary key. A primary key is a field as any other and therefore can be used on queries. There is no join operation right now if this is what you are looking for.

We must never use directly the SQlite engine because this would make code incompatible with another engine (Postgres will be back one day). If ther is a need, we must enhance pobulse_db API.

sapetnioc commented 3 years ago

For primary key I may have understand. Are you looking for dbs.collection(collection).primary_key ? This is available in populse_db 2.

denisri commented 3 years ago

For primary key I may have understand. Are you looking for dbs.collection(collection).primary_key ? This is available in populse_db 2.

That's probably what I was looking for, thanks ! ;)

We must never use directly the SQlite engine because this would make code incompatible with another engine (Postgres will be back one day). If ther is a need, we must enhance pobulse_db API.

I know and that was precisely why I was asking...

So the IN syntax doesn't exist ? That would explain why I couldn't make it work... (it is actually in the operators listed in the doc, however). Using FileName == "/path1" OR FileName == "/path2" will make very long requests (maybe hitting a size limit) when I have a long list of elements to retreive. Speaking of which, python.sqlite3 has a syntax to pass "where_data", but Populse_db has not the equivalent, all must be in a single string. Will it be a problem for large requests ?

sapetnioc commented 3 years ago

Well, the IN operator exists but is only implemented for searching an element (given as a field name or a literal) in a list field (i.e. a field name). We can extend the query syntax if necessary.

I am not sur it is very efficient to perform large queries. We may also think to improve the query system to put data out of the query string.

denisri commented 3 years ago

OK I totally misunderstood the meaning of the IN operator ;) So for now we have no efficient way of performing such a request...