postgrespro / pgsphere

PgSphere provides spherical data types, functions, operators, and indexing for PostgreSQL.
https://pgsphere.org
BSD 3-Clause "New" or "Revised" License
16 stars 15 forks source link

Performance Tests and Benchmarking #103

Open vitcpp opened 7 months ago

vitcpp commented 7 months ago

Dear Sirs,

There is a suggestion to implement performance tests and benchmark pgSphere. One of the important pgSphere features is the fast access and search to spherical objects using indexes. To improve existing functionality, we need a way to identify bottlenecks and compare modification results. We should benefit from performance testing.

There are some but not all the questions related to performance testing:

I would like to start discussion of the pgSphere performance testing. I welcome any feedback and suggestions on the subject.

esabol commented 7 months ago

Our most typical use-case:

Our tables have longitudes and latitudes in columns ra and dec, respectively, and we search for matches that appear in a user-specified circle.

First, we create an index on spoint_deg(ra,dec). Then, in response to user requests, we query

select * from sometable where spoint_deg(ra,dec) @ scircle_deg(spoint_deg(1.2345,6.7890),0.166667)

to find all the entries in sometable which have points inside the user-specified circle centered at (1.2345,6.7890) with a radius of 0.166667 degrees.

For test data, any large public domain astronomical catalog would work. I'm not sure which ones are public domain though.

df7cb commented 7 months ago

@esabol, that is the use case I tried to improve in #80, but I'm not 100% sure if it also works with the _deg variants of the functions, could you give that a real-life check?

esabol commented 7 months ago

@esabol, that is the use case I tried to improve in #80, but I'm not 100% sure if it also works with the _deg variants of the functions, could you give that a real-life check?

I'm not sure, but I think it is? It says the index is being used. If I add a column to the table named s_pos, populate it with the result of spoint_deg(ra,dec), create an index on s_pos, the query is 3.1 times faster for this very large table we have (approximately 8 million rows). But it should be faster since it's not calculating spoint_deg(ra,dec) for each row at query time, right?

Here's the output of explain analyze for such a scenario:

db=> explain analyze select * from obscore where spoint_deg(s_ra,s_dec) @ scircle_deg(spoint_deg(1.2345,6.7890),0.166667);
                                                             QUERY PLAN                                 

--------------------------------------------------------------------------------------------------------
----------------------------
 Bitmap Heap Scan on obscore  (cost=318.85..29276.98 rows=8056 width=518) (actual time=2608.485..2608.70
2 rows=77 loops=1)
   Recheck Cond: (spoint_deg(s_ra, s_dec) @ '<(0.02154608961587 , 0.118490402917895) , 0.002908887904421
39>'::scircle)
   Heap Blocks: exact=11
   ->  Bitmap Index Scan on s_point_deg_idx  (cost=0.00..316.84 rows=8056 width=0) (actual time=2553.354
..2553.355 rows=77 loops=1)
         Index Cond: (spoint_deg(s_ra, s_dec) @ '<(0.02154608961587 , 0.118490402917895) , 0.00290888790
442139>'::scircle)
 Planning Time: 5481.711 ms
 Execution Time: 3027.972 ms
(7 rows)

db=> explain analyze select * from obscore where s_pos @ scircle_deg(spoint_deg(1.2345,6.7890),0.166667);
                                                         QUERY PLAN                                     

--------------------------------------------------------------------------------------------------------
--------------------
 Bitmap Heap Scan on obscore  (cost=318.85..29256.84 rows=8056 width=518) (actual time=975.301..975.472 
rows=77 loops=1)
   Recheck Cond: (s_pos @ '<(0.02154608961587 , 0.118490402917895) , 0.00290888790442139>'::scircle)
   Heap Blocks: exact=11
   ->  Bitmap Index Scan on s_pos_idx  (cost=0.00..316.84 rows=8056 width=0) (actual time=975.271..975.2
73 rows=77 loops=1)
         Index Cond: (s_pos @ '<(0.02154608961587 , 0.118490402917895) , 0.00290888790442139>'::scircle)
 Planning Time: 0.252 ms
 Execution Time: 975.583 ms
(7 rows)

Partial list of indexes we have on this table:

    "s_circle_idx" gist (s_circle)
    "s_circle_s_region" gist (scircle_intext(s_region))
    "s_point_deg_idx" gist (spoint_deg(s_ra, s_dec))
    "s_poly_idx" gist (s_poly)
    "s_poly_s_region" gist (spoly_intext(s_region))
    "s_pos_idx" gist (s_pos)

If I repeat the above, they are much faster on subsequent attempts (or maybe the database was busy doing something else when I tried the first time):

db=> explain analyze select * from obscore where s_pos <@ scircle_deg(spoint_deg(1.2345,6.7890),0.166667);
                                                       QUERY PLAN                                       

--------------------------------------------------------------------------------------------------------
----------------
 Bitmap Heap Scan on obscore  (cost=318.85..29256.84 rows=8056 width=518) (actual time=0.180..0.452 rows
=77 loops=1)
   Recheck Cond: (s_pos <@ '<(0.02154608961587 , 0.118490402917895) , 0.00290888790442139>'::scircle)
   Heap Blocks: exact=11
   ->  Bitmap Index Scan on s_pos_idx  (cost=0.00..316.84 rows=8056 width=0) (actual time=0.147..0.148 r
ows=77 loops=1)
         Index Cond: (s_pos <@ '<(0.02154608961587 , 0.118490402917895) , 0.00290888790442139>'::scircle
)
 Planning Time: 0.516 ms
 Execution Time: 0.582 ms
(7 rows)

db=> explain analyze select * from obscore where spoint_deg(s_ra,s_dec) <@ scircle_deg(spoint_deg(1.2345,6.7890),0.166667);
                                                          QUERY PLAN                                    

--------------------------------------------------------------------------------------------------------
----------------------
 Bitmap Heap Scan on obscore  (cost=318.85..29276.98 rows=8056 width=518) (actual time=0.240..0.412 rows
=77 loops=1)
   Recheck Cond: (spoint_deg(s_ra, s_dec) <@ '<(0.02154608961587 , 0.118490402917895) , 0.00290888790442
139>'::scircle)
   Heap Blocks: exact=11
   ->  Bitmap Index Scan on s_point_deg_idx  (cost=0.00..316.84 rows=8056 width=0) (actual time=0.213..0
.214 rows=77 loops=1)
         Index Cond: (spoint_deg(s_ra, s_dec) <@ '<(0.02154608961587 , 0.118490402917895) , 0.0029088879
0442139>'::scircle)
 Planning Time: 0.391 ms
 Execution Time: 0.543 ms
(7 rows)