segasai / q3c

PostgreSQL extension for spatial indexing on a sphere
GNU General Public License v2.0
77 stars 27 forks source link

support for PARALLEL SAFE #32

Open gshennessy opened 3 years ago

gshennessy commented 3 years ago

Would it make sense to add PARALLEL SAFE to the SQL definitions for the q3c functions? The Postgresql Documentation states:

Functions should be labeled parallel unsafe if they modify any database state, or if they make changes to the transaction such as using sub-transactions, or if they access sequences or attempt to make persistent changes to settings (e.g., setval). They should be labeled as parallel restricted if they access temporary tables, client connection state, cursors, prepared statements, or miscellaneous backend-local state which the system cannot synchronize in parallel mode (e.g., setseed cannot be executed other than by the group leader because a change made by another process would not be reflected in the leader). In general, if a function is labeled as being safe when it is restricted or unsafe, or if it is labeled as being restricted when it is in fact unsafe, it may throw errors or produce wrong answers when used in a parallel query. C-language functions could in theory exhibit totally undefined behavior if mislabeled, since there is no way for the system to protect itself against arbitrary C code, but in most likely cases the result will be no worse than for any other function. If in doubt, functions should be labeled as UNSAFE, which is the default.

From my admittedly limited understanding of the guts of q3c, it doesn't try to do anything that would prevent it from being PARALLEL SAFE, so is this a feasible change?

segasai commented 3 years ago

There is one part in q3c_radial_query() and q3c_join where I avoid some repeated calculations in q3_get_nearby_it() by using a static variable in the C code. I am not sure that's a problem for PARALLEL, but I thought to be on the safe side I didn't set the functions to be PARALLEL SAFE. I would prefer some analysis/testing before making this change.

gshennessy commented 3 years ago

What kind of testing would you be interested in?

segasai commented 3 years ago

testing involving parallel plans and showing that things work normally. But I still would need some analysis of https://github.com/segasai/q3c/blob/master/q3c.c and static variables. I just haven't studied how that works in the cases of parallel workers. (it's probably fine, but still).

tony-12345 commented 2 years ago

I added a parallel test which runs your test scripts 5 times in background threads, with some sleep commands to try to prevent the different threads from all running the same query at the same time. This worked for me on Centos with no errors. https://github.com/tony-12345/q3c/commit/5007376dfe0b8612a7e5a1ef8d4e89f86b22d2ef

I also added some query plans to show that some of your test queries are in fact using parallel workers. https://github.com/tony-12345/q3c/tree/master/queryplans

How do you feel about this showing that parallel safe functions work correctly?

segasai commented 2 years ago

Thanks for looking into this @tony-12345 . I guess, what you show seems enough for me to turn on parallelism. I'll change the function definitions in next few days.

esabol commented 2 years ago

I'll change the function definitions in next few days.

Did this ever happen? Just curious. I'm not seeing anything in the commits.

segasai commented 2 years ago

Sorry, no. The uncommitted diff is still sitting on my machine, I think I got stuck on whether I could just change the q3c--2.0.0.sql (what I've done), or I needed to create a new q3c--2.0.1.sql and the migration sql. I never quite fully understood how extension versions supposed to work...

esabol commented 2 years ago

I would think it would need to be a new version with SQL to migrate, but I'm by no means an expert.

gshennessy commented 2 years ago

would it be helpful for me to learn how to do a 'pull request'? :)

segasai commented 2 years ago

Yes, please! (the sql update scripts will be also needed) Personally I'm seeing no benefit in parallel safe (as q3c doesn't stop parallelization of other parts of the query), that's why I never got around to implement it.

gshennessy commented 2 years ago

My DB servers have 20 cores, which is why I want parallel safe. Most of the time I'm IO bound, but there are times I am CPU bound, so any little bit helps.

segasai commented 2 years ago

Yes, I also have multiple cores, but i'm not sure I've seen many cases where I was cpu limited because of q3c. I'm occasionally cpu limited when doing nearest neighbors queries, but that's usually limited by the PG code rather than q3c.

segasai commented 11 months ago

It's long overdue, but I've made a pr #36 that makes some q3c functions parallel safe (the ones I was confident about, i.e. q3c_ang2ipix ).