Opaleye generated left join queries seems to slow down the Postgresql's query planner

sras commented 7 years ago

We have a query that does joins between ten tables, five of which are left joins. This is being constructed using a combination of Opaleye's restrict and leftJoin functions.

This ends up being converted into a 500 line query (fetched from postgresql's query log) with lots of nested sub selects. When we run EXPLAIN ANALYZE on this query, from psql, it shows the following (approximate) times.

60ms to plan/analyse
4ms to execute.

Now, we have another query, constructed by hand, using conventional joins and left joins. This query uses the same logic for joins, and selects the same number of columns, as that of the original Opaleye generated query. Now, when we EXPLAIN ANALYSE it, the numbers are

around 3 ms to plan
around 3 ms to execute

Both of the queries make use of 60+ columns.

This is the result of the a benchmark, comparing these two. First one uses the hand made sql, and uses postgresql-simple's query_ function to run and fetch the results. The second one uses Opaleye's runQuery to execute the 'Query' value and fetch the results.

benchmarking Sql using conventional joins
time                 9.928 ms   (9.467 ms .. 10.41 ms)
                     0.986 R²   (0.968 R² .. 0.997 R²)
mean                 10.41 ms   (10.18 ms .. 10.99 ms)
std dev              1.010 ms   (356.3 μs .. 1.831 ms)
variance introduced by outliers: 52% (severely inflated)

benchmarking join sql generated by Opaleye
time                 79.24 ms   (75.71 ms .. 81.22 ms)
                     0.998 R²   (0.994 R² .. 1.000 R²)
mean                 83.27 ms   (81.02 ms .. 91.60 ms)
std dev              6.254 ms   (661.4 μs .. 9.995 ms)
variance introduced by outliers: 19% (moderately inflated)

Is there any chance that these queries could be converted into a form that can cut down the planning time? Or is this form of queries some kind of fundamental requirement for the features that Opaleye provides? Could it be possible to reduce these to an optimised form, even it means that this optimised form cannot be composed any further?

tomjaguarpaw commented 7 years ago

Thanks for producing some handy statistics!

Is it possible to share your queries? Or at least to describe what the difference in structure is between the Opaleye one and the handwritten one? There is good reason to be optimistic that we can tweak the Opaleye output to match the handwritten one.

saurabhnanda commented 7 years ago

@tomjaguarpaw can we email you both the queries directly? It's for a production project, and not open source.

tomjaguarpaw commented 7 years ago

Yes, please do.

tomjaguarpaw commented 7 years ago

Thanks for sending the queries. There's obviously a lot of redundancy there and it's reasonable to hope that we could remove a lot of that on the Opaleye side. I'm not certain that we could do it quicker than Postgres but it's probably worth a try.

Regrettably I won't have any time to do this myself but I would encourage you to try if you are interested. I'm happy to comment from the sidelines.

saurabhnanda commented 7 years ago

What are the possible.solutions that come to your mind?

On 29 Mar 2017 2:18 am, "tomjaguarpaw" notifications@github.com wrote:

Thanks for sending the queries. There's obviously a lot of redundancy there and it's reasonable to hope that we could remove a lot of that on the Opaleye side. I'm not certain that we could do it quicker than Postgres but it's probably worth a try.

Regrettably I won't have any time to do this myself but I would encourage you to try if you are interested. I'm happy to comment from the sidelines.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/tomjaguarpaw/haskell-opaleye/issues/284#issuecomment-289899873, or mute the thread https://github.com/notifications/unsubscribe-auth/AABu0XUUZQkqouBnsT_xNizeFhEPGJBzks5rqXIbgaJpZM4Mqnfw .

tomjaguarpaw commented 7 years ago

The most obvious thing to try is to remove the nested subselects. I don't see that @ocharles's patch https://github.com/tomjaguarpaw/haskell-opaleye/issues/273#issuecomment-289957473 does this itself but it should only require a small adjustment to his patch.

ocharles commented 7 years ago

Could the analyzed plans be shared?

saurabhnanda commented 7 years ago

@sras can you please share the query plans directly with @tomjaguarpaw and @ocharles ?

ocharles commented 7 years ago

Note that https://explain.depesz.com has the option to anonymize, maybe that would be sufficient to share publically.

On Wed, 29 Mar 2017, 8:53 am Saurabh Nanda, notifications@github.com wrote:

@sras https://github.com/sras can you please share the query plans directly with @tomjaguarpaw https://github.com/tomjaguarpaw and @ocharles https://github.com/ocharles ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tomjaguarpaw/haskell-opaleye/issues/284#issuecomment-290012087, or mute the thread https://github.com/notifications/unsubscribe-auth/AABRjilOQ5t_Nnm_2MWICccisE4exY2Oks5rqg3hgaJpZM4Mqnfw .

sras commented 7 years ago

@tomjaguarpaw @ocharles

Here are the analysis (anonymized) generated by https://explain.depesz.com.

Opaleye generated query

Nested Loop Left Join  (cost=9.710..162.490 rows=1 width=2486) (actual time=0.751..1.900 rows=16 loops=1)
  ->  Nested Loop Left Join  (cost=9.420..154.170 rows=1 width=2403) (actual time=0.731..1.798 rows=4 loops=1)
          Join Filter: (("delta".quebec_seven five_romeo NOT NULL) AND ("delta".quebec_seven = "four".romeo_india))
          Rows Removed by Join Filter: 217
        ->  Nested Loop Left Join  (cost=9.420..103.800 rows=1 width=2162) (actual time=0.320..0.848 rows=4 loops=1)
              ->  Nested Loop Left Join  (cost=9.140..92.580 rows=1 width=2148) (actual time=0.298..0.804 rows=1 loops=1)
                    ->  Nested Loop  (cost=8.870..84.280 rows=1 width=2070) (actual time=0.272..0.772 rows=1 loops=1)
                            Join Filter: ("charlie_oscar".sierra_three = "lima_mike".quebec_seven)
                          ->  Nested Loop  (cost=8.870..83.250 rows=1 width=518) (actual time=0.256..0.752 rows=1 loops=1)
                                ->  Nested Loop  (cost=8.720..75.080 rows=1 width=471) (actual time=0.238..0.729 rows=1 loops=1)
                                      ->  Nested Loop  (cost=8.450..66.780 rows=1 width=353) (actual time=0.210..0.696 rows=1 loops=1)
                                            ->  Hash Right Join  (cost=8.310..58.610 rows=1 width=291) (actual time=0.179..0.659 rows=1 loops=1)
                                                    Hash Cond: ("hotel_lima".romeo_india = "uniform".quebec_seven)
                                                  ->  Seq Scan on three hotel_lima  (cost=0.000..49.690 rows=160 width=241) (actual time=0.046..0.308 rows=160 loops=1)
                                                          Filter: ((november)::text = 'foxtrot_romeo'::text)
                                                          Rows Removed by Filter: 55
                                                  ->  Hash  (cost=8.300..8.300 rows=1 width=50) (actual time=0.058..0.058 rows=1 loops=1)
                                                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                        ->  Index Scan using quebec_india on hotel_kilo uniform  (cost=0.280..8.300 rows=1 width=50) (actual time=0.040..0.042 rows=1 loops=1)
                                                                Index Cond: ((xray_two five_romeo NOT NULL) AND (xray_two = 16656))
                                                                Filter: ((victor)::text = 'five_alpha'::text)
                                            ->  Index Scan using golf_romeo on two charlie_oscar  (cost=0.140..8.160 rows=1 width=62) (actual time=0.016..0.017 rows=1 loops=1)
                                                    Index Cond: (quebec_seven = 16656)
                                      ->  Index Scan using golf_xray on foxtrot_charlie charlie_mike  (cost=0.270..8.290 rows=1 width=118) (actual time=0.017..0.018 rows=1 loops=1)
                                              Index Cond: (quebec_seven = "charlie_oscar".oscar)
                                ->  Index Scan using romeo_bravo on six alpha  (cost=0.150..8.170 rows=1 width=47) (actual time=0.013..0.014 rows=1 loops=1)
                                        Index Cond: (quebec_seven = "charlie_oscar".india_tango)
                          ->  Seq Scan on sierra_papa lima_mike  (cost=0.000..1.010 rows=1 width=1552) (actual time=0.012..0.013 rows=1 loops=1)
                    ->  Index Scan using whiskey on tango mike  (cost=0.270..8.290 rows=1 width=78) (actual time=0.021..0.023 rows=1 loops=1)
                            Index Cond: (("uniform".quebec_seven = romeo_india) AND ((november)::text = 'foxtrot_romeo'::text))
              ->  Index Scan using xray_alpha on papa delta  (cost=0.280..11.190 rows=3 width=14) (actual time=0.016..0.025 rows=4 loops=1)
                      Index Cond: ("uniform".quebec_seven = juliet)
        ->  Seq Scan on three four  (cost=0.000..49.690 rows=55 width=241) (actual time=0.006..0.163 rows=55 loops=4)
                Filter: ((november)::text = 'lima_papa'::text)
                Rows Removed by Filter: 160
  ->  Index Scan using zulu on india_two romeo_sierra  (cost=0.290..8.310 rows=1 width=83) (actual time=0.006..0.012 rows=4 loops=4)
          Index Cond: (juliet = "uniform".quebec_seven)

Handwritten one

sted Loop Left Join  (cost=1.680..163.200 rows=1 width=2454) (actual time=0.961..2.802 rows=16 loops=1)
  ->  Nested Loop Left Join  (cost=1.390..154.880 rows=1 width=2375) (actual time=0.947..2.699 rows=4 loops=1)
          Join Filter: (((romeo_tango1.november)::text = 'lima'::text) AND (romeo_tango1.romeo_india = whiskey_yankee.quebec_seven))
          Rows Removed by Join Filter: 860
        ->  Nested Loop Left Join  (cost=1.390..102.500 rows=1 width=2134) (actual time=0.272..0.322 rows=4 loops=1)
              ->  Nested Loop Left Join  (cost=1.110..91.280 rows=1 width=2124) (actual time=0.258..0.288 rows=1 loops=1)
                    ->  Nested Loop Left Join  (cost=0.840..83.660 rows=1 width=2046) (actual time=0.235..0.263 rows=1 loops=1)
                            Join Filter: (romeo_tango1.romeo_india = oscar_five.quebec_seven)
                          ->  Nested Loop  (cost=0.840..33.960 rows=1 width=1805) (actual time=0.072..0.097 rows=1 loops=1)
                                  Join Filter: (five.sierra_three = foxtrot_papa.quebec_seven)
                                ->  Nested Loop  (cost=0.840..32.940 rows=1 width=257) (actual time=0.051..0.071 rows=1 loops=1)
                                      ->  Nested Loop  (cost=0.690..24.760 rows=1 width=222) (actual time=0.041..0.056 rows=1 loops=1)
                                            ->  Nested Loop  (cost=0.420..16.470 rows=1 width=108) (actual time=0.026..0.033 rows=1 loops=1)
                                                  ->  Index Scan using golf_romeo on two_quebec five  (cost=0.140..8.160 rows=1 width=62) (actual time=0.011..0.013 rows=1 loops=1)
                                                          Index Cond: (quebec_seven = 16656)
                                                  ->  Index Scan using quebec_india on hotel oscar_five  (cost=0.280..8.290 rows=1 width=50) (actual time=0.007..0.009 rows=1 loops=1)
                                                          Index Cond: (xray_two = 16656)
                                            ->  Index Scan using golf_xray on foxtrot_charlie seven  (cost=0.270..8.290 rows=1 width=118) (actual time=0.007..0.009 rows=1 loops=1)
                                                    Index Cond: (quebec_seven = five.oscar_foxtrot)
                                      ->  Index Scan using romeo_bravo on six victor  (cost=0.150..8.170 rows=1 width=39) (actual time=0.004..0.005 rows=1 loops=1)
                                              Index Cond: (quebec_seven = five.india_tango)
                                ->  Seq Scan on sierra_papa foxtrot_papa  (cost=0.000..1.010 rows=1 width=1552) (actual time=0.013..0.015 rows=1 loops=1)
                          ->  Seq Scan on three two_two  (cost=0.000..49.690 rows=1 width=241) (actual time=0.078..0.078 rows=0 loops=1)
                                  Filter: ((november)::text = 'whiskey_kilo'::text)
                                  Rows Removed by Filter: 215
                    ->  Index Scan using whiskey_sierra on tango_whiskey oscar_two  (cost=0.270..7.610 rows=1 width=78) (actual time=0.014..0.014 rows=0 loops=1)
                            Index Cond: ((romeo_india = oscar_five.quebec_seven) AND ((november)::text = 'whiskey_kilo'::text))
              ->  Index Scan using xray_alpha on papa whiskey_yankee  (cost=0.280..11.190 rows=3 width=14) (actual time=0.007..0.013 rows=4 loops=1)
                      Index Cond: (juliet = oscar_five.quebec_seven)
        ->  Seq Scan on three delta  (cost=0.000..49.150 rows=215 width=241) (actual time=0.006..0.300 rows=215 loops=4)
  ->  Index Scan using zulu on india_two tango_sierra  (cost=0.290..8.310 rows=1 width=83) (actual time=0.004..0.010 rows=4 loops=4)
          Index Cond: (juliet = oscar_five.quebec_seven)

chrisdone commented 7 years ago

I'm curious, is there some potential to use prepared queries? It seems that the overhead is the analysis, and that the resulting planned query is the same, they both take 3-4ms~. If you prepare the query once, then subsequent uses wouldn't see the overhead created by Opaleye.

Perhaps Opaleye already has such an API. But the useage is like:

PREPARE usrrptplan (int) AS
    SELECT * FROM users u, logs l WHERE u.usrid=$1 AND u.usrid=l.usrid
    AND l.date = $2;

EXECUTE usrrptplan(1, current_date);

Perhaps if Opaleye had an API to create the prepared query once at the start of your program and then use it forever afterwards, that would be handy. I think Opaleye might struggle to compete with PostgreSQL at reducing the query down. Also, implementing optimizations like that risks compromising the consistency of the library, which is why HaskellDB had so many bugs.

In fact, a prepared query might execute a little faster than your hand-written one. Quoth the PostgreSQL docs on prepared queries:

The performance difference will be particularly significant if the statements are complex to plan or rewrite, for example, if the query involves a join of many tables or requires the application of several rules.

Your big join query seems to match the criteria for preparation.

tomjaguarpaw commented 7 years ago

I'm curious, is there some potential to use prepared queries?

Yes, I imagine prepared queries would fit nicely into Opaleye.

ocharles commented 7 years ago

I've done some research into this with hasql and rel8, but I never got it finished. I think it could work nicely!

On 28 Jul 2017 1:57 pm, "tomjaguarpaw" notifications@github.com wrote:

I'm curious, is there some potential to use prepared queries?

Yes, I imagine prepared queries would fit nicely into Opaleye.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tomjaguarpaw/haskell-opaleye/issues/284#issuecomment-318644844, or mute the thread https://github.com/notifications/unsubscribe-auth/AABRjqFztzvBy25lD7W2rmRDgqoHWcskks5sSdq-gaJpZM4Mqnfw .

tomjaguarpaw commented 7 years ago

Basically, the way I would do this would be to reuse the query generation apparatus in RunQuery and have a product profunctor that deals with inserting the $n placeholders. The slightly awkward part would be to make sure that the positional arguments always line up with what we want them to but I think this is achievable given some thought.

tomjaguarpaw / haskell-opaleye

Opaleye generated left join queries seems to slow down the Postgresql's query planner #284