koopjs / koop-pgcache

PostGIS cache for Koop.

better sql parsing #54

Closed dmfenton closed 8 years ago

dmfenton commented 8 years ago

Changes SQL parsing to be less complicated and more accurate. Adds tests.

jakesower commented 8 years ago

Looks like the tests failed. BTW, why not use a SQL parsing lib like https://github.com/forward/sql-parser ?

dmfenton commented 8 years ago

Yep, that's why I unassigned it for the moment. I'm adding more tests and working through some bugs.

Ideally I'd like to modify and use something like https://github.com/godmodelabs/flora-sql-parser, but the level of effort to understand that is not something I can commit to right now.

It's not just parsing that we need to do. It's converting the where clauses to fit our particular brand of Postgres JSON.
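To make that concrete, here's a rough sketch of the kind of rewrite I mean (assuming features live in a JSON column named feature; translateWhere is just an illustrative name, not the real function):

```js
// Hypothetical sketch, not the actual implementation: rewrite field
// references in a where clause so they query attributes stored inside a
// Postgres JSON column named "feature".
function translateWhere (where, fields) {
  var translated = where
  fields.forEach(function (field) {
    // e.g. "Speed_Limit > 50" -> "feature->'properties'->>'Speed_Limit' > 50"
    var re = new RegExp('\\b' + field + '\\b', 'g')
    translated = translated.replace(re, "feature->'properties'->>'" + field + "'")
  })
  return translated
}

// translateWhere('Speed_Limit > 50', ['Speed_Limit'])
// => "feature->'properties'->>'Speed_Limit' > 50"
```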

jakesower commented 8 years ago

A couple of notes I made:

Overall, it's hard for me to understand what the full domain of the problem is. Are we working with self-generated SQL or arbitrary SQL?

In any case, it looks like the tests are good and passing, so I'll merge it when you give me a thumbs up, but let me know if you want to discuss my comments first.

dmfenton commented 8 years ago

Thanks @jakesower, I've tried to address your comments with the most recent commit.

We're working with arbitrary SQL that is generated by users. We need to be able to translate those clauses into something Postgres understands, given the way we're storing our data.

I've tried to make decodeDomains a little more understandable. The problem here is that Geoservices has the concept of a coded value domain: string data is stored as integers, and a lookup table maps each code to its string, e.g.

1 -> <50 mph
2 -> 50-60 mph
3 -> 60+ mph

When we store the data we decode the domains to their strings, but for compatibility reasons we need to be able to handle filter clauses that still use the code, e.g. Speed_Limit = 1 -> Speed_Limit = '<50 mph'.
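Something like this (a simplified sketch; the domain object follows the Geoservices codedValue shape, and decodeCodedValue is just an illustrative name, not the actual decodeDomains):

```js
// Simplified sketch of decoding a coded value in a filter clause.
// A Geoservices coded value domain maps integer codes to display strings.
var speedLimitDomain = {
  type: 'codedValue',
  codedValues: [
    { code: 1, name: '<50 mph' },
    { code: 2, name: '50-60 mph' },
    { code: 3, name: '60+ mph' }
  ]
}

// Replace "field = code" with "field = 'decoded string'" so the filter
// matches the decoded strings we actually store.
function decodeCodedValue (field, code, domain) {
  for (var i = 0; i < domain.codedValues.length; i++) {
    if (domain.codedValues[i].code === code) {
      return field + " = '" + domain.codedValues[i].name + "'"
    }
  }
  return field + ' = ' + code
}

// decodeCodedValue('Speed_Limit', 1, speedLimitDomain)
// => "Speed_Limit = '<50 mph'"
```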

Does that make sense?

jakesower commented 8 years ago

It will be interesting to see what SQL queries break this. I think it can be hardened over time, but it looks sufficient to me.

dmfenton commented 8 years ago

The ideal flow is:

Geoservices -> AST -> [Postgres JSON, ElasticSearch, SQLite, ...]

If I'm going to invest a substantial amount of additional time into this problem, it would be down that route. I think language-aware parsing is going to be a lot more flexible, but the lift is going to be a lot larger as well.
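To sketch what I mean (purely illustrative; the node shape and translator here are invented for the example, not tied to any particular parser library):

```js
// Illustrative sketch of the AST approach: parse the incoming clause once
// into a small expression tree, then hand that tree to backend-specific
// translators. The node shape here is invented for the example.
var ast = {
  type: 'comparison',
  operator: '=',
  left: { type: 'field', name: 'Speed_Limit' },
  right: { type: 'literal', value: '<50 mph' }
}

// Translator for our Postgres JSON storage; other backends would get
// their own translator over the same tree.
function toPostgresJson (node) {
  switch (node.type) {
    case 'comparison':
      return toPostgresJson(node.left) + ' ' + node.operator + ' ' + toPostgresJson(node.right)
    case 'field':
      return "feature->'properties'->>'" + node.name + "'"
    case 'literal':
      return "'" + node.value + "'"
  }
}

// toPostgresJson(ast)
// => "feature->'properties'->>'Speed_Limit' = '<50 mph'"
```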