heterodb / pg-strom

PG-Strom - Master development repository
http://heterodb.github.io/pg-strom/

PostGIS compatibility #305

Status: Closed (closed by ghost 3 years ago)

ghost commented 7 years ago

I would love to see pg_strom become compatible with PostGIS spatial data types and functions. I am aware that this is possibly still a long way off, as PostGIS has a lot of dependencies which, I guess, aren't GPU-parallelizable themselves.

Nevertheless, spatial analytics involves a lot of data and a lot of numeric calculation on it. There are also many existing spatial data stores that run PostGIS, so there is big potential in GPU-accelerated spatial querying. Other GPU DB vendors like Kinetica or MapD have already made spatial data a first-class citizen in their own products.

Show me the way and I'll do my best to contribute.

pramsey commented 7 years ago

I'd be interested in a conversation about this too... basic box filtering, both for PostGIS types and PgSQL native types, might be an interesting thing to do as a starting point.

kaigai commented 7 years ago

Ah, sorry, it's my oversight.

The supported functions are listed at: https://github.com/heterodb/pg-strom/blob/master/src/codegen.c#L364

Once all the operators and functions in the WHERE clause (and other clauses) are built from supported functions, this logic replaces the expression with its GPU version. For example, the GPU version of the LIKE operator is implemented in cuda_textlib.h.

So, once we implement GPU versions of PostGIS functions (not all of them; we can start with partial support), we will be able to execute them on the GPU.
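To illustrate the rule described above, here is a minimal sketch in Python (not PG-Strom's actual C code). It assumes an expression is a small tree of function calls and that offloading is allowed only when every function in the tree is on a whitelist. The `FuncExpr` class, the `GPU_SUPPORTED` set, and `can_run_on_gpu` are hypothetical names for illustration; the real list is the one linked in codegen.c.

```python
from dataclasses import dataclass, field

# Hypothetical whitelist for illustration only; the real list of
# supported functions lives in PG-Strom's src/codegen.c.
GPU_SUPPORTED = {"int4lt", "float8mul", "textlike"}


@dataclass
class FuncExpr:
    """A function call node: a function name plus its arguments."""
    name: str
    args: list = field(default_factory=list)


def can_run_on_gpu(expr):
    """Return True only if every function in the tree is whitelisted."""
    if not isinstance(expr, FuncExpr):
        # Constants and column references contain no function call.
        return True
    if expr.name not in GPU_SUPPORTED:
        return False
    return all(can_run_on_gpu(arg) for arg in expr.args)
```

Under this assumption, adding a GPU version of a PostGIS function amounts to adding its name to the whitelist (plus the device code itself); a single unsupported function anywhere in the expression keeps the whole expression on the CPU.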

If folks can contribute GPU versions of PostGIS functions, that would be very helpful. In the short term, box type support for the PgSQL native types is a good starting point.

kaigai commented 7 years ago

We position PostGIS support as one of the major features of v3.0; however, please don't expect all of PostGIS's features. The initial release will support only a subset of them.

pramsey commented 6 years ago

What functions do you think you would target?

kaigai commented 6 years ago

Not determined yet. We need to do some fieldwork to pick the targets.

pramsey commented 6 years ago

Anything we can do to help with that process (or others)?

kaigai commented 6 years ago

If you could show me your workloads that involve PostGIS, that would help us choose. (Of course, technical difficulty is another factor.) On the other hand, please understand our development roadmap: I will start developing the v3.0 features, including PostGIS support, after the v2.0 release, which is planned for next April.

pinkerltm commented 6 years ago

I think the classic SFS functions for spatial relations and joins (ST_Intersects, ST_Within, ST_Relate in general) and for aggregating geometries (ST_Union) would be most valuable. These are probably the more sophisticated functions, though. ST_Distance would be another useful candidate, which I think would be easier to implement.

Side note: I recently found a blog post benchmarking Hadoop-based geo computing frameworks, with PostGIS used as a reference. I found it quite interesting that PostGIS only needs a proper mechanism for handling huge amounts of data to outperform GeoMesa and GeoWave across the full range of data volumes. I guess pg-strom v3.0 would be that mechanism.

pinkerltm commented 6 years ago

Here's the blog post: http://blog.mgm-tp.com/2016/03/geomesa-vs-geowave/

jaredlander commented 5 years ago

Heard about this in a talk about PostGIS and it sounds great. Did PostGIS functionality get added?

kaigai commented 5 years ago

If it makes sense from a business standpoint, I (and HeteroDB) will never hesitate to enhance GPU support for PostGIS functionality. Yes, it is likely possible from a technology standpoint. However, nobody has disclosed their workloads and made a business commitment. If you are interested in a business partnership with HeteroDB, please contact contact@heterodb.com.

jaredlander commented 5 years ago

I have a use case, essentially the NYC Taxi problem.

kaigai commented 5 years ago

Could you send a message to contact@heterodb.com for a more detailed (and business-oriented) discussion?

kaigai commented 3 years ago

The recent version supports several PostGIS functions, with GiST index support, on the GPU.

jaredlander commented 3 years ago

This looks awesome, I'm glad it got added. I saw in the list of functions that st_contains() is implemented. I think that in normal PostGIS st_intersects() is faster (and also answers a different question). Do you have plans to add st_intersects()? I think st_crosses() is similar, but not quite the same.

kaigai commented 3 years ago

That is because our PoC customer showed us the list of PostGIS functions they use; st_contains() was on it, but st_intersects() was not. It cannot be implemented very soon, but I would like to add it to the feature list.

kaigai commented 3 years ago

PostGIS has two versions of st_intersects:

* `bool st_intersects(geometry, geometry)`

* `bool st_intersects(geography, geography)`

Which version do you mean? The geography version is rather tough work.

jaredlander commented 3 years ago

I'm surprised they didn't want st_intersects(), it seems like the most common operation.

I think the geometry, geometry function is probably the most used. Tagging @ksakamoto09 who knows a lot more about this than I do.

pinkerltm commented 3 years ago

> PostGIS has two versions of st_intersects:
>
> * `bool st_intersects(geometry, geometry)`
>
> * `bool st_intersects(geography, geography)`
>
> Which version do you mean? The geography version is rather tough work.

I would say that functions operating on projected "geometry" are more relevant than the "geography" ones. It's obvious that there is a certain need for "geography" types in spatial data processing, but we GI people are used to projection systems and most of the time prefer planar coordinate systems to geographic degrees.

To further prioritize functions: I think ST_relate() is the most generic one, from which all other spatial relation functions can be derived. Apart from that, I also think ST_intersects is the one most often used. Unfortunately, it covers only approx. 50% to 70% of all the spatial use cases I can think of. The combination with ST_crosses and ST_contains would nevertheless be a really powerful toolset.

kaigai commented 3 years ago

I already implemented ST_relate() as the basis of other functions, but forgot to list it in the function reference. https://github.com/heterodb/pg-strom/blob/master/src/cuda_postgis.cu#L5644

pinkerltm commented 3 years ago

> I already implemented ST_relate() as the basis of other functions, but forgot to list it in the function reference. https://github.com/heterodb/pg-strom/blob/master/src/cuda_postgis.cu#L5644

Really? That's great... basically ST_intersects() is just syntactic sugar for, or more or less equivalent to, ST_Relate(a.geom, b.geom, 'T********'), perhaps with a slightly adapted DE-9IM matrix depending on what you want to include in your query results (see https://postgis.net/docs/ST_Relate.html).

EDIT: Your code implements only the 2-parameter ST_Relate(), which returns a DE-9IM intersection matrix (good for further analysis of a possible intersection inside a procedure, for example). My post above referred to the 3-parameter function, where you pass a matrix pattern: the function returns true if the pattern is fulfilled and false otherwise. That variant is easy to use directly in queries. Nevertheless, impressive work so far... can't wait to spin up our GPU'd database host and test a little bit.
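To make the difference between the two variants concrete, here is a minimal Python sketch of what the 3-parameter form adds on top of the matrix: a pattern match over the 9-character DE-9IM string. This is an illustration, not PostGIS or PG-Strom code; `relate_match`, `intersects`, and `INTERSECTS_PATTERNS` are hypothetical names. The sketch assumes the usual DE-9IM convention that `T` matches any non-empty cell (`0`, `1`, `2`), `F` matches an empty cell, and `*` matches anything, and that "intersects" holds when any one of four patterns matches.

```python
def relate_match(matrix, pattern):
    """Check a 9-character DE-9IM matrix against a 9-character pattern."""
    if len(matrix) != 9 or len(pattern) != 9:
        raise ValueError("DE-9IM strings must have exactly 9 characters")
    for cell, want in zip(matrix.upper(), pattern.upper()):
        if want == "*":
            continue  # wildcard: any value is acceptable
        if want == "T":
            if cell not in "012":
                return False  # T requires a non-empty intersection
        elif cell != want:
            return False  # F, 0, 1, 2 must match exactly
    return True


# Assumption: intersects is true if interiors or boundaries share anything.
INTERSECTS_PATTERNS = ("T********", "*T*******", "***T*****", "****T****")


def intersects(matrix):
    """Derive an intersects test from a DE-9IM matrix."""
    return any(relate_match(matrix, p) for p in INTERSECTS_PATTERNS)
```

For example, `intersects("FF2F01212")` is true for two polygons that only touch at a point, even though their interiors do not overlap, which is why a single interior-interior pattern would not cover every case.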

jaredlander commented 3 years ago

Thanks for opening my eyes to ST_relate() @pinkerltm. If the 3 Parameter version can be implemented by @kaigai, does that have any implications about a spatial index? From what I understand, ST_intersects(), in most databases, makes use of the spatial index, but I don't know about ST_relate(). And, of course, fingers crossed that ST_intersects() makes the cut.

pinkerltm commented 3 years ago

> Thanks for opening my eyes to ST_relate() @pinkerltm. If the 3 Parameter version can be implemented by @kaigai, does that have any implications about a spatial index? From what I understand, ST_intersects(), in most databases, makes use of the spatial index, but I don't know about ST_relate(). And, of course, fingers crossed that ST_intersects() makes the cut.

You are right... the more commonly used functions like ST_intersects, ST_contains, etc. differ from ST_relate in that the use of spatial indexing is already built in for convenience. Nevertheless, I personally find it better to be aware of, and to explicitly use, the && operator, which is basically the bounding-box filter operation that effectively reduces rows via their spatial index, as there are a lot of spatial functions which do NOT automatically use the spatial index (see postgis_usage below).
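As a rough illustration of what such a bounding-box filter does, here is a minimal Python sketch. It is not PostGIS or PG-Strom code: `Box`, `boxes_overlap`, and `prefilter` are hypothetical names, and the sketch assumes 2D axis-aligned boxes where boxes that merely touch count as overlapping. A real spatial index would avoid scanning every row; the linear scan here only shows the test itself.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned 2D bounding box."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def boxes_overlap(a, b):
    """True if the two boxes share at least one point (touching counts)."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def prefilter(candidates, query_box):
    """Keep only the row ids whose bounding box overlaps the query box.

    `candidates` is an iterable of (row_id, Box) pairs. Rows that pass
    still need the exact (and more expensive) geometry test afterwards.
    """
    return [row_id for row_id, box in candidates
            if boxes_overlap(box, query_box)]
```

The point of the two-step design is that the box test is a handful of comparisons, so the exact spatial predicate only runs on the rows that survive it.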

@kaigai This is relevant, as pg-strom also needs the spatial && operator implemented to use spatial indexing effectively.

Resources:

kaigai commented 3 years ago

PG-Strom has basic spatial index support; see these presentation slides (written in Japanese): https://www.slideshare.net/kaigai/20201128oscfukuokaonlinegpupostgis/17

Pages 31-33 show the performance improvement. (These slides are likely understandable for non-Japanese speakers.)

kaigai commented 3 years ago

Just for your information, here are my presentation slides from PGconf.Online (3/1-3/3, https://pgconf.ru/en/2021): https://www.slideshare.net/kaigai/20210301pgconfonlinegpupostgisgistindex