sraoss / pg_ivm

IVM (Incremental View Maintenance) implementation as a PostgreSQL extension
Other
862 stars 24 forks source link

Database with many immv and concurrent transactions deadlocking #33

Open FourSeventy opened 1 year ago

FourSeventy commented 1 year ago

I have a database that has 9 immv's that all create slightly different views of the same few base tables. These base tables are regularly inserted to and updated concurrently by around 5 different external services and processes. They are updated rapidly, but the updates are only to a few thousand records, which is a small faction of the total number of rows. The base tables have tens of millions of records in them.

The problem I am facing is that my update and insertion processes all become deadlocked. The error messages look like this

Process 253446 waits for ExclusiveLock on relation 191517 of database 16927; blocked by process 103564.
Process 103564 waits for ExclusiveLock on relation 191769 of database 16927; blocked by process 253446.

Relation 191517 and relation 191769 are immvs. Unfortunately I don't know which one they are because I dropped the views right away when they were deadlocking and didn't have time to look it up.

Do I have too many immvs on the same tables? Is updating the db from multiple threads not supported with immvs?

Below are my immv definitions.

SELECT create_immv('data_hourly_sessions', '
  SELECT
  property_id as property_idd,
  date_trunc(''hour'', sessions.timestamp AT TIME ZONE ''UTC'') AS buckett,
  COUNT(*) AS visits,
  SUM(duration) as total_duration,
  SUM(page_views) as total_pageviews
  FROM sessions
  GROUP BY property_idd, buckett;
');

SELECT create_immv('data_hourly_conversions', '
  SELECT
  property_id as property_idd,
  date_trunc(''hour'', conversions.timestamp AT TIME ZONE ''UTC'') AS buckett,
  COUNT(*) as orders,
  SUM(total_price - COALESCE(total_refund, 0)) as total_sales,
  SUM(total_price) as total_price,
  SUM(subtotal_price) as subtotal_price,     
  SUM(total_tax) as total_tax,   
  SUM(total_discounts) as total_discounts,
  SUM(total_shipping) as total_shipping,
  SUM(item_quantity) as item_quantity,
  SUM(total_refund) as total_refund
  FROM conversions
  WHERE event_name=''order''
  AND (status NOT IN (''voided'', ''cancelled'', ''failed'', ''trash'') OR status IS NULL)
  GROUP BY property_idd, buckett;
');

SELECT create_immv('data_hourly_admetrics', '
   SELECT
   property_id as property_idd,
   date_trunc(''hour'', timestamp AT TIME ZONE ''UTC'') AS buckett,
   SUM(COALESCE(googlead_cost,0) + COALESCE(facebookad_spend,0)  + COALESCE(customspend_spend,0) + COALESCE(bingad_spend,0) + COALESCE(tiktokad_spend,0) + COALESCE(pinterestad_spend,0) )  as spend,
   SUM(COALESCE(converted_spend, 0)) as converted_spend
   FROM admetrics
   GROUP BY property_idd, buckett
');

SELECT create_immv('data_hourly_by_channel_sessions', '
    SELECT 
    property_id as property_idd,
    date_trunc(''hour'', sessions.timestamp AT TIME ZONE ''UTC'') AS buckett,
    COALESCE(channel,''unsegmented'') as channell,
    COUNT(*) AS visits,
    SUM(duration) as total_duration,
    SUM(page_views) as total_pageviews
    FROM sessions
    GROUP BY property_idd, buckett, channell
');

SELECT create_immv('data_hourly_by_channel_conversions', '
  SELECT
  sessions.property_id as property_idd,
  date_trunc(''hour'', conversions.timestamp AT TIME ZONE ''UTC'') AS buckett,
  COALESCE(sessions.channel,''unsegmented'') as channell,
  attribution.model as modell,
  SUM(COALESCE(attribution.credit,0) * (COALESCE(conversions.total_price,0) - COALESCE(conversions.total_refund,0))) as total_sales,
  SUM(COALESCE(attribution.credit,0) * COALESCE(conversions.total_price, 0)) as total_price,
  SUM(COALESCE(attribution.credit,0) * COALESCE(conversions.subtotal_price, 0)) as subtotal_price,
  SUM(COALESCE(attribution.credit,0) * COALESCE(conversions.total_tax, 0)) as total_tax,
  SUM(COALESCE(attribution.credit,0) * COALESCE(conversions.total_discounts, 0)) as total_discounts,
  SUM(COALESCE(attribution.credit,0) * COALESCE(conversions.total_shipping, 0)) as total_shipping,
  SUM(COALESCE(attribution.credit,0) * COALESCE(conversions.item_quantity, 0)) as item_quantity,
  SUM(COALESCE(attribution.credit,0) * COALESCE(conversions.total_refund, 0)) as total_refund,
  SUM(COALESCE(attribution.credit,0)) as orders
  FROM sessions
  JOIN attribution ON sessions.id=attribution.session_id
  JOIN conversions ON attribution.conversion_id = conversions.id
  WHERE conversions.event_name=''order''
  AND (conversions.status NOT IN (''voided'', ''cancelled'', ''failed'', ''trash'') OR conversions.status IS NULL)
  GROUP BY property_idd, buckett, channell, modell
');

SELECT create_immv('data_hourly_by_channel_admetrics', '
   SELECT
   property_id as property_idd,
   date_trunc(''hour'', timestamp AT TIME ZONE ''UTC'') AS buckett,
   channel as source,
   SUM(coalesce(googlead_cost,0) + coalesce(facebookad_spend,0) + coalesce(customspend_spend,0) + coalesce(bingad_spend,0) + coalesce(tiktokad_spend,0) + coalesce(pinterestad_spend,0)  )  as spend,
   SUM(COALESCE(converted_spend,0)) as converted_spend
   FROM admetrics
   GROUP BY property_idd, buckett, source
');

SELECT create_immv('data_hourly_by_segment_sessions', '
    SELECT
    property_id as property_idd,
    date_trunc(''hour'', sessions.timestamp AT TIME ZONE ''UTC'') AS buckett,
    COALESCE(channel,''unsegmented'') as channell,
    COALESCE(segment,'''') as segmentt,
    COUNT(*) AS visits,
    SUM(duration) as total_duration,
    SUM(page_views) as total_pageviews
    FROM sessions
    GROUP BY property_idd, buckett, channell, segmentt
');

SELECT create_immv('data_hourly_by_segment_conversions', '
  SELECT
  sessions.property_id as property_idd,
  date_trunc(''hour'', conversions.timestamp AT TIME ZONE ''UTC'') AS buckett,
  COALESCE(sessions.channel,''unsegmented'') as channell,
  COALESCE(sessions.segment,'''') as segmentt,
  attribution.model as modell,
  SUM(COALESCE(attribution.credit,0) * (COALESCE(conversions.total_price,0) - COALESCE(conversions.total_refund,0))) as total_sales,
  SUM(COALESCE(attribution.credit,0) * COALESCE(conversions.total_price, 0)) as total_price,
  SUM(COALESCE(attribution.credit,0) * COALESCE(conversions.subtotal_price, 0)) as subtotal_price,
  SUM(COALESCE(attribution.credit,0) * COALESCE(conversions.total_tax, 0)) as total_tax,
  SUM(COALESCE(attribution.credit,0) * COALESCE(conversions.total_discounts, 0)) as total_discounts,
  SUM(COALESCE(attribution.credit,0) * COALESCE(conversions.total_shipping, 0)) as total_shipping,
  SUM(COALESCE(attribution.credit,0) * COALESCE(conversions.item_quantity, 0)) as item_quantity,
  SUM(COALESCE(attribution.credit,0) * COALESCE(conversions.total_refund, 0)) as total_refund,
  SUM(COALESCE(attribution.credit,0)) as orders
  FROM sessions
  JOIN attribution ON sessions.id=attribution.session_id
  JOIN conversions ON attribution.conversion_id = conversions.id
  WHERE conversions.event_name=''order''
  AND (conversions.status NOT IN (''voided'', ''cancelled'', ''failed'', ''trash'') OR conversions.status IS NULL)
  GROUP BY buckett, property_idd, channell, segmentt, modell
');

SELECT create_immv('data_hourly_by_segment_admetrics', '
   SELECT
   accounts.property_id as property_idd,
   date_trunc(''hour'', timestamp AT TIME ZONE ''UTC'') AS buckett,
   accounts.channel as source,
   NULLIF(accounts.external_id,'''') as external_idd,
   SUM(coalesce(googlead_cost,0) + coalesce(facebookad_spend,0) + coalesce(customspend_spend,0) + coalesce(bingad_spend,0) + coalesce(tiktokad_spend,0) + coalesce(pinterestad_spend,0) )  as spend,
   SUM(coalesce(converted_spend,0)) as converted_spend
   FROM accounts
   JOIN campaigns ON campaigns.account_id = accounts.id
   JOIN adsets on adsets.campaign_id = campaigns.id
   JOIN ads ON ads.adset_id = adsets.id
   JOIN admetrics ON admetrics.ad_id=ads.id
   GROUP BY property_idd, buckett, source, external_idd
');
FourSeventy commented 1 year ago

Update on this. I looked at all of the insert and update queries that run in transactions. I only have 4 queries that run in transactions. I made sure that they all update tables in the same order. I still end up getting into a deadlock somehow. I'm working to lock down the exact queries that are getting deadlocked, but I know that one of the queries that is caught in the deadlock is a simple update not part of any transactions. Here is that query UPDATE conversions SET sent_to_capi=true WHERE id = ANY($1)

Here is the breakdown of the four services that write to the database concurrently and the tables that they write to in order

    service1
         accounts -> campaigns -> adsets -> ads -> admetrics
     service2
         customers -> users -> sessions -> conversions
     service3
         customers -> users -> sessions -> conversions
     service4
         customers -> users -> sessions -> surveys -> conversions -> pageviews
FourSeventy commented 1 year ago

Second update. I'm currently running an experiment with just the first three immvs, data_hourly_sessions, data_hourly_conversions, and data_hourly_admetrics. So far no deadlocks. These immvs don't have any JOINs so its possible that the immvs with the joins are causing the issue.

yugo-n commented 1 year ago

@FourSeventy I am sorry for a late reply. I have not succeeded to reproduce the deadlock yet. For investigation, could you please provide more details?

Firstly, may I have definitions of tables involving to these IMMV

Here is the breakdown of the four services that write to the database concurrently and the tables that they write to in order

Does each service access these tables in this order in a separate transaction?

Here is that query UPDATE conversions SET sent_to_capi=true WHERE id = ANY($1)

This query is not included in any above transactions and executed in the other transaction?