stac-utils / stac-fastapi

STAC API implementation with FastAPI.
https://stac-utils.github.io/stac-fastapi/
MIT License
245 stars 99 forks source link

Use of NOT in advanced comparison operators throws error #473

Open philvarner opened 2 years ago

philvarner commented 2 years ago

The NOT operator can be used with all of the advanced comparison operators (https://github.com/opengeospatial/ogcapi-features/blob/master/cql2/standard/annex_ats_advanced-comparison-operators.adoc) like "NOT BETWEEN", "NOT LIKE", or "NOT IN"

Queries like eo:cloud_cover NOT BETWEEN 0 AND 50 and collection NOT LIKE 'sent%' throw an exception like Unexpected token Token('NOT', 'NOT') at line 1, column 16.

gadomski commented 1 year ago

Confirmed that this is still an issue with the following test:

async def test_cql2text_not(app_client, load_test_collection, load_test_item):
    params = {
        "filter": "collection NOT LIKE 'not-a-collection-id'",
        "filter-lang": "cql2-text",
    }
    response = await app_client.get("/search", params=params)
    assert response.status_code == 200, response.json()

which produced the following error:

FAILED stac_fastapi/pgstac/tests/resources/test_item.py::test_cql2text_not[api_client0] - lark.exceptions.UnexpectedToken: Unexpected token Token('NOT', 'NOT') at line 1, column 12.

Looking at https://github.com/geopython/pygeofilter/tree/v0.2.0/pygeofilter/parsers/cql2_text, it seems non-trivial to add support -- for one, there's no tests for cql2_text in the main branch, so nothing to build on to try to add support.

I've opened https://github.com/geopython/pygeofilter/issues/68 to track this specific problem, and I'm removing this issue from our milestones, as pygeofilter feels a little abandoned ATM (e.g. I can't get the test suite passing locally or in the docker container, following the README).

eseglem commented 1 year ago

I took a look at pygeofilter and was able to get some of the tests to work. The parser ones work as long as its lark<1 and there are a few tests for cql2_text in there but its just comparing the text against the json parser and not likely to be helpful because of that. A lot of the other tests seem mostly related to being built with older versions of stuff and not having anything pinned in the requirements. Like the calling functions that don't exist in sqlite.

I was not familiar with lark before looking at it over there but have done a bunch of similar parsing with antlr4. Sadly I am pretty sure there are multiple issues with the grammar not matching up to the spec.

I did find https://github.com/opengeospatial/ogcapi-features/blob/master/cql2/standard/schema/cql2.bnf but sadly there aren't any any automated tools to convert it to lark or antlr4. I played around with doing some of it by hand, but its not going to be fun in antlr4 at least. I'll probably take another look at some of the lark and see if I can't figure out more of it in the near future, but it does seem like it may be a heavy lift to get completely working.

gadomski commented 1 year ago

There was a new release of pygeofilter (v0.2.1) so we should check to see if this issue was fixed.

eseglem commented 1 year ago

It still needs more cleaning up, and a larger test suite. But I have been working on https://github.com/eseglem/pycql2 because it seemed like an interesting problem. It has pydantic models for cql2-json and a new parser for cql2-text. It is able to handle all the examples (that aren't out of date) in the cql2 spec. And handles all of the NOT cases as well. As far as I can tell all that pygeofilter is being used for is to translate the text to json, so I think it could probably be usable as a replacement.

Assuming this is useful, I am not sure where the best permanent home for this is. I wanted to keep things standalone to start, and not have to worry about any compatibility, and I was thinking that pygeofilter was kind of dead until that release came out. And while the pydantic model could probably end up being used as an AST, I didn't worry about any of that yet. This just translates back and forth between json and text versions. So it has none of the other dependencies and such that were causing some pain when I took a look at it.

gadomski commented 1 year ago

Very cool, thanks for sharing.

Assuming this is useful, I am not sure where the best permanent home for this is.

In a perfect world, a translator for CQL2 text <-> json would be useful as a standalone repo and as a component of pygeofilter. I'm not sure how much of a lift it would be to use pycql2 in pygeofilter -- have you gotten much of a sense for the effort required there in your research?

In terms of a permanent home, IMO for now it's probably best to keep it in your personal account. If we are able to successfully use pycql2 in stac-fastapi, that would help indicate a level of maturity and adoption that would make it reasonable consider moving it to some more formal organization -- not sure what that would be ATM.

eseglem commented 1 year ago

As far as pygeofilter goes. It probably can be integrated there. It certainly doesn't seem impossible but I also don't know how easy it would be. It might not be too bad to parse the text to the pydantic model, and then have a walk_cql_json function for the model. The complication is probably more to do with the test situation, and potentially updating all of the backends if things change.

Standalone repo definitely has benefits as to not having to deal with those backends but I also would like to see it integrated wherever possible. So will have to see how things go.

I got the coverage up and should have at least one test for all of the keywords / operations. Not every possible combination of them, but at least touching all the different parts of each parser. I want to add in hypothesis as well at some point, but wanted to put it out there and start getting feedback. I will probably work on getting GitHub actions set up and maybe a release out next.

eseglem commented 1 year ago

Published v0.1.0 to PyPi: https://pypi.org/project/pycql2/ so it can be tested against easier. Not totally happy with how I named things and input is definitely welcome on any of it.

Lots of notes in the readme. A few things aren't completely true to the spec, but should be true to the spirit of it at least. Mostly related to the WKT definition.

Used hypothesis and hypo fuzz to test text -> json. So I am fairly confident anything that pretty much any good cql2-text query will become valid cql2-json. Haven't gone the other way yet, but I do plan to.