I'd be interested in a conversation about this too... basic box filtering, both for PostGIS types and PgSQL native types, might be an interesting thing to do as a starting point.
Ah, sorry, it's my oversight.
The supported functions are listed at: https://github.com/heterodb/pg-strom/blob/master/src/codegen.c#L364
Once the operators and functions in a WHERE clause (and other clauses) are all built from supported functions, this logic replaces the expression with its GPU version. For example, the GPU version of the LIKE operator is implemented in cuda_textlib.h.
So, once we implement GPU versions of PostGIS functions (not all of them; it is possible to start with partial support), we will be able to execute them on the GPU.
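The rule described above (offload an expression only when every function in it has a GPU version) can be sketched roughly as follows. This is an illustration only, not PG-Strom's code: the `SUPPORTED` set, the function names in it, and the tuple-based expression tree are all hypothetical.

```python
# Hypothetical sketch: SUPPORTED and the function names below are
# illustrative placeholders, not PG-Strom's actual supported list.
SUPPORTED = {"int4lt", "int4pl", "float8mul", "textlike"}


def gpu_executable(expr):
    """Return True if every function in the expression tree is supported.

    An expression is either a leaf (a column name or a constant) or a
    tuple of the form (function_name, arg1, arg2, ...).
    """
    if not isinstance(expr, tuple):
        return True  # leaves (columns, constants) never block offloading
    func, *args = expr
    return func in SUPPORTED and all(gpu_executable(arg) for arg in args)
```

With this sketch, `("int4lt", ("int4pl", "a", 1), 10)` qualifies, while any tree containing an unsupported function such as a hypothetical `"st_distance"` node does not, so the whole expression would stay on the CPU.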
If folks can contribute GPU implementations of PostGIS functions, that would be a great help. In the short term, box type support for the PgSQL native types is a good starting point.
We position PostGIS support as one of the major features of v3.0; however, please don't expect every PostGIS feature. The initial support will cover only a part of them.
What functions do you think you would target?
Not determined yet. We need to do some fieldwork to pick the target ones.
Anything we can do to help with that process (or others)?
If you could tell me about your workloads that use PostGIS, it would help us pick. (Of course, technical difficulty is another factor.) On the other hand, please understand our development roadmap. I will start developing the v3.0 features, including PostGIS support, after the v2.0 release, which is planned for next April.
I think the classic SFS functions for spatial relations/joins (ST_Intersects, ST_Within, ST_Relate in general) and for aggregating geometries (ST_Union) would be most valuable. However, these are probably the more sophisticated functions. ST_Distance would be another useful candidate, which I think would be easier to implement.
Side note: I recently found this blog post about benchmarking Hadoop-based geo computing frameworks, where PostGIS is used as a reference. I found it quite interesting that PostGIS only needs a proper mechanism for handling huge amounts of data to outperform GeoMesa and GeoWave across the full range of data volumes. I guess pg-strom v3.0 would be that mechanism.
here's the blog http://blog.mgm-tp.com/2016/03/geomesa-vs-geowave/
Heard about this in a talk about PostGIS and it sounds great. Did PostGIS functionality get added?
If it makes sense from a business standpoint, I (and HeteroDB) will not hesitate to enhance GPU support for PostGIS functionality. Yes, it is likely possible from a technology standpoint.
However, nobody has disclosed their workloads or made a business commitment.
If you are interested in a business partnership with HeteroDB, please contact contact@heterodb.com.
I have a use case, essentially the NYC Taxi problem.
Could you send a message to contact@heterodb.com for a more detailed (and business-oriented) discussion?
The recent version has several PostGIS functions with GiST index support on GPU.
This looks awesome, I'm glad it got added. I saw in the list of functions that `st_contains()` is implemented. I think for normal PostGIS `st_intersects()` is faster (and also answers a different question). Do you have plans to add `st_intersects()`? I think `st_crosses()` is similar, but not quite the same.
That is because our PoC customer showed us the list of PostGIS functions they use; `st_contains()` was on it, but `st_intersects()` was not.
I cannot implement it very soon, but I'd like to add it to the feature list.
PostGIS has two versions of `st_intersects`:
* `bool st_intersects(geometry, geometry)`
* `bool st_intersects(geography, geography)`

Which version do you mean? The geography version is rather tough work.
I'm surprised they didn't want `st_intersects()`, it seems like the most common operation.
I think the geometry, geometry function is probably the most used. Tagging @ksakamoto09 who knows a lot more about this than I do.
I would say that functions operating on projected "geometry" are more relevant than the "geography" ones. It's obvious that there is a certain need for "geography" types in spatial data processing, but we GI people are used to projection systems and, most of the time, prefer planar coordinate systems to geographic degrees.
To further prioritize functions: I think `ST_relate()` is the most generic one, from which all other spatial relation functions can be derived. Apart from that, I also think that `ST_intersects` is the one used most often. Unfortunately, it covers only approx. 50% to 70% of all spatial use cases I can think of. In combination with `ST_crosses` and `ST_contains`, though, it would be a really powerful toolset.
I already implemented `ST_relate()` as a basis for other functions, but forgot to list it in the function reference.
https://github.com/heterodb/pg-strom/blob/master/src/cuda_postgis.cu#L5644
Really? That's great... basically `ST_intersects()` is just syntactic sugar for, or more or less equivalent to, `ST_Relate(a.geom, b.geom, 'T********')`, perhaps with a slightly adapted DE-9IM matrix depending on what you want to include in your query results (see https://postgis.net/docs/ST_Relate.html).
EDIT: Your code implements only the two-parameter `ST_Relate()`, which returns a DE-9IM intersection matrix (useful for further analysis of a possible intersection inside a procedure, for example). My post above referred to the three-parameter function, where you pass a matrix pattern that must be fulfilled: the function returns true if it is, and false otherwise. That variant is easy to use directly in queries. Nevertheless, impressive work so far... can't wait to spin up our GPU'd database host and test a little bit.
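To make the relation between the two variants concrete, here is a minimal sketch of matching a DE-9IM matrix (as returned by the two-parameter form) against a pattern (as accepted by the three-parameter form). The function names `relate_match` and `intersects` are my own for illustration; this is plain Python, not PG-Strom or PostGIS code. It assumes the usual DE-9IM pattern rules: `*` matches anything, `T` matches any non-empty dimension (`0`, `1`, `2`), and `F`, `0`, `1`, `2` match only themselves.

```python
def relate_match(matrix: str, pattern: str) -> bool:
    """Check a 9-character DE-9IM matrix against a 9-character pattern."""
    if len(matrix) != 9 or len(pattern) != 9:
        raise ValueError("DE-9IM matrix and pattern must both be 9 characters")
    for m, p in zip(matrix.upper(), pattern.upper()):
        if p == "*":
            continue  # wildcard: any value is accepted
        if p == "T":
            if m not in ("0", "1", "2"):
                return False  # T requires a non-empty intersection
        elif m != p:
            return False  # F, 0, 1, 2 must match exactly
    return True


# "Intersects" means "not disjoint": interior or boundary of one geometry
# meets interior or boundary of the other, which takes four patterns.
INTERSECTS_PATTERNS = ("T********", "*T*******", "***T*****", "****T****")


def intersects(matrix: str) -> bool:
    return any(relate_match(matrix, p) for p in INTERSECTS_PATTERNS)
```

For example, two polygons that touch at a single point have the matrix `FF2F01212`: the single pattern `T********` rejects it, while the four-pattern test accepts it, which is one case where the pattern needs adapting as mentioned above.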
Thanks for opening my eyes to `ST_relate()` @pinkerltm. If the three-parameter version can be implemented by @kaigai, does that have any implications for a spatial index? From what I understand, `ST_intersects()`, in most databases, makes use of the spatial index, but I don't know about `ST_relate()`. And, of course, fingers crossed that `ST_intersects()` makes the cut.
You are right... the more commonly used functions like ST_intersects, ST_contains, etc. differ from ST_relate in that use of the spatial index is already built in for convenience. Nevertheless, I personally find it better to be aware of, and to explicitly use, the `&&` operator, which is basically the bounding box filter operation that effectively reduces rows via their spatial index, as there are a lot of spatial functions which do NOT automatically use the spatial index (see postgis_usage below).
@kaigai This is relevant as pg-strom also needs the spatial `&&` operator implemented to use spatial indexing effectively.
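As a rough illustration of what the bounding box filter does, here is a small sketch in plain Python. The `Box` type and the helper names are hypothetical and not taken from PostGIS or PG-Strom. I am also assuming that boxes which merely touch count as overlapping, and a real spatial index would avoid the linear scan shown here.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Box(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def bbox_of(points: Iterable[Tuple[float, float]]) -> Box:
    """Axis-aligned bounding box of a non-empty sequence of (x, y) points."""
    xs, ys = zip(*points)
    return Box(min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two boxes share at least one point (touching counts)."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


def candidates(query: Box, rows: List[Tuple[int, Box]]) -> List[int]:
    """Cheap first pass: keep only row ids whose box overlaps the query box.

    The expensive exact predicate (e.g. an intersects test) then only
    needs to run on these candidates.
    """
    return [row_id for row_id, box in rows if boxes_overlap(query, box)]
```

The point of the two-step approach is that the box test is a handful of comparisons, so it can discard most rows before any exact geometry computation is attempted.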
Resources:
PG-Strom has basic spatial index support; see these presentation slides (written in Japanese): https://www.slideshare.net/kaigai/20201128oscfukuokaonlinegpupostgis/17
Pages 31-33 show the performance improvement. (These slides are likely understandable for non-Japanese speakers.)
Just for your information, here are my presentation slides from PGconf.Online (3/1-3/3) at https://pgconf.ru/en/2021 https://www.slideshare.net/kaigai/20210301pgconfonlinegpupostgisgistindex
I would love to see pg_strom become compatible with PostGIS spatial data types and functions. I am aware that this is possibly still a long way off, as PostGIS has a lot of dependencies which, I guess, aren't GPU-parallelizable themselves.
Nevertheless there is a lot of data in spatial analytics and also a lot of numeric calculation done on it. Also there are already a lot of spatial data stores existing which run PostGIS. So there is a big potential in doing GPU accelerated spatial querying. Other GPU DB vendors like Kinetica or MapD already identified spatial data as a first class citizen in their own products.
Show me the way and I'll give my best to contribute.