lusterchris / Performance-Tuning

0 stars 0 forks source link

crd_position_rollups #8

Open lusterchris opened 1 month ago

lusterchris commented 1 month ago
  1. Check Existing Indexes

You can check which indexes exist on your tables with the following command:

SELECT indexname, indexdef FROM pg_indexes WHERE tablename IN ('am_positions', 'portfolio_accruals', 'account_breakdowns', 'listings', 'currencies', 'rollup_dates');

  1. Create New Indexes

If you don't already have indexes on the columns used in the JOIN, WHERE, or GROUP BY clauses, create them:

-- Create index on as_of_date and portfolio_uid in am_positions table CREATE INDEX IF NOT EXISTS am_positions_as_of_date_portfolio_uid_idx ON codex.am_positions (as_of_date, portfolio_uid);

-- Create index on as_of_date and portfolio_uid in portfolio_accruals table CREATE INDEX IF NOT EXISTS portfolio_accruals_as_of_date_portfolio_uid_idx ON codex.portfolio_accruals (as_of_date, portfolio_uid);

-- Create index on as_of_date and portfolio_uid in account_breakdowns table CREATE INDEX IF NOT EXISTS account_breakdowns_as_of_date_portfolio_uid_idx ON onyx_api.account_breakdowns (am_portfolio_uid, as_of_date);

-- Create index on as_of_date in rollup_dates table CREATE INDEX IF NOT EXISTS rollup_dates_as_of_date_idx ON onyx_api.rollup_dates (as_of_date);

  1. Use EXPLAIN ANALYZE

To analyze your query's performance and understand where it's slow, use:

EXPLAIN ANALYZE

; This will give you a detailed report on the query execution plan, highlighting the slowest operations. 4. Partitioning (Optional) If the as_of_date is a frequently used column and the tables are very large, partitioning by as_of_date could improve performance. Here’s an example for table partitioning: -- Partition am_positions by as_of_date ALTER TABLE codex.am_positions PARTITION BY RANGE (as_of_date); -- Create partitions for am_positions CREATE TABLE codex.am_positions_2023 PARTITION OF codex.am_positions FOR VALUES FROM ('2023-01-01') TO ('2023-12-31'); 5. Increasing Parallelism To take advantage of parallel query execution, ensure that parallelism is enabled and tuned in your PostgreSQL configuration. You can check and set these parameters: -- Check current parallelism settings SHOW max_parallel_workers_per_gather; SHOW work_mem; -- Set parallel workers for your session (adjust as needed) SET max_parallel_workers_per_gather = 4; SET work_mem = '256MB'; 6. Temporary Tables If your query involves large datasets that can be broken down into smaller parts, you can use temporary tables to store intermediate results. Here’s how you could do it: -- Store first part of the query in a temporary table CREATE TEMP TABLE temp_accruals AS SELECT * FROM codex.portfolio_accruals WHERE as_of_date BETWEEN '2024-01-01' AND '2024-12-31'; -- Then use this temp table in the main query SELECT df4.as_of_date, df4.portfolio_uid, df4.listing_uid, SUM(df4.mkt_val_sod) AS mkt_val_sod, SUM(df4.acc_int_tot_qty) AS acc_int_tot_qty FROM temp_accruals df4 GROUP BY df4.as_of_date, df4.portfolio_uid, df4.listing_uid; 7. Optimizing Memory Settings For the whole system, you might want to adjust the memory settings in postgresql.conf. Here's how to set them: -- Increase work_mem (for complex queries) ALTER SYSTEM SET work_mem = '256MB'; -- Set shared_buffers to use more RAM for caching ALTER SYSTEM SET shared_buffers = '8GB'; -- Set effective_cache_size to allow better usage of system RAM ALTER SYSTEM SET effective_cache_size = '24GB'; -- Reload the configuration for the changes to take effect SELECT pg_reload_conf(); 8. Analyze and Vacuum Tables Ensure your tables are properly analyzed and vacuumed to help the planner choose optimal plans: -- Analyze and vacuum the tables involved VACUUM ANALYZE codex.am_positions; VACUUM ANALYZE codex.portfolio_accruals; VACUUM ANALYZE onyx_api.account_breakdowns; By following these commands, you can significantly reduce the query's execution time.
lusterchris commented 1 month ago

List of tables

The query references multiple tables within the FROM clauses and joins. These are the key tables involved:

  1. codex.am_positions: Contains position data like as_of_date, portfolio_uid, listing_uid, local_currency, n_shares, and value_usd.

  2. codex.portfolio_accruals: Holds accrual information such as as_of_date, portfolio_uid, listing_uid, accrual_type_code, accruals_usd, reclaims_usd, and spendable_accruals_usd.

  3. onyx_api.account_breakdowns: Appears to be linked to portfolios with fields like am_portfolio_uid, effective_during, and type.

  4. codex.listings: Manages listing-related data with fields like listing_uid, primary_instrument_type_code, secondary_instrument_type_code, asset_class, etc.

  5. codex.currencies: Contains currency-related data referenced by the local_currency field.

  6. onyx_api.rollup_dates: Used for date rollups and is joined on as_of_date.

  7. codex.am_portfolio_history_wide: Linked through am_portfolio_uid to represent historical portfolio data.

  8. onyx_api.portfolios: Holds portfolio metadata like portfolio_type.

  9. crd_position_rollups: This is referenced for index creation at the end of the query.

These tables are joined and aggregated to produce financial metrics such as mkt_val_sod, acc_int_tot_qty, and crd_emv.

lusterchris commented 1 month ago

Here's a refactored version of your query that includes the following improvements for better performance:

Avoid unnecessary subqueries: Simplified nested subqueries into a single step.

Use common table expressions (CTEs): This will help break the query into more readable chunks and optimize reuse of calculations.

Remove redundant operations: Checked and removed any repetitive operations like the case checks inside JOIN and SELECT clauses.

Group operations earlier: Perform aggregations as early as possible in the query to reduce the number of rows processed.

Optimized Query

WITH portfolio_accruals_cte AS ( SELECT accr.as_of_date, accr.portfolio_uid, COALESCE(accr.listing_uid, -iso_uid) AS listing_uid, accr.accrual_type_code, accr.accruals_usd, accr.reclaims_usd, accr.spendable_accruals_usd, (accr.accruals_usd + accr.reclaims_usd) AS acc_int_tot_qty FROM codex.portfolio_accruals accr ), positions_cte AS ( SELECT pos.as_of_date, pos.portfolio_uid, pos.listing_uid, pos.value_usd, pos.local_currency, pos.n_shares FROM codex.am_positions pos ), combined_data AS ( SELECT COALESCE(pa.as_of_date, pc.as_of_date) AS as_of_date, COALESCE(pa.portfolio_uid, pc.portfolio_uid) AS portfolio_uid, COALESCE(pa.listing_uid, pc.listing_uid) AS listing_uid, COALESCE(pc.value_usd, 0) AS mkt_val_sod, COALESCE(pa.acc_int_tot_qty, 0) AS acc_int_tot_qty, (COALESCE(pc.value_usd, 0) + COALESCE(pa.accruals_usd, 0) + COALESCE(pa.reclaims_usd, 0)) AS crd_emv FROM portfolio_accruals_cte pa FULL JOIN positions_cte pc ON pa.as_of_date = pc.as_of_date AND pa.portfolio_uid = pc.portfolio_uid AND pa.listing_uid = pc.listing_uid ) SELECT df4.as_of_date, df4.portfolio_uid, df4.listing_uid, SUM(df4.mkt_val_sod) AS mkt_val_sod, SUM(df4.acc_int_tot_qty) AS acc_int_tot_qty, SUM(df4.crd_emv) AS crd_emv FROM combined_data df4 INNER JOIN onyx_api.rollup_dates cal ON df4.as_of_date = cal.as_of_date WHERE cal.as_of_date > '2024-06-30' GROUP BY df4.as_of_date, df4.portfolio_uid, df4.listing_uid ORDER BY df4.as_of_date, df4.portfolio_uid, df4.listing_uid;

Key Improvements:

  1. CTEs (WITH clause): This allows you to precompute the portfolio_accruals and positions data without having to nest complex subqueries.

  2. Combined data with FULL JOIN: This brings together data from the positions and portfolio_accruals CTEs early on.

  3. Aggregations: The aggregation happens at the final step using SUM and GROUP BY, reducing the number of rows processed at each stage.

Additional Index Suggestions:

Ensure that your indexes are optimized for this query structure:

-- For faster joins and filtering CREATE INDEX IF NOT EXISTS portfolio_accruals_as_of_date_portfolio_uid_listing_uid_idx ON codex.portfolio_accruals (as_of_date, portfolio_uid, listing_uid);

CREATE INDEX IF NOT EXISTS am_positions_as_of_date_portfolio_uid_listing_uid_idx ON codex.am_positions (as_of_date, portfolio_uid, listing_uid);

CREATE INDEX IF NOT EXISTS rollup_dates_as_of_date_idx ON onyx_api.rollup_dates (as_of_date);

This should reduce the query runtime significantly, making it more efficient.

lusterchris commented 1 month ago

onyx=> \d codex.am_positions View "codex.am_positions Column

as_of_date portfolio_uid listing_uid local_currency n_shares value_usd spendableaccruals usd accruals_usd variation_margin_usd

View "codex.am_positions

Definition

" SELECT as_of_date portfolio_uid, listing_uid. local_currency n_shares, value usd spendable_accruals_usd accruals_usd, variation_margin_usd FROM am_position_revisions WHERE (archive_timestamp IS NULL);

onyx=> \d codex.portfolio_accruals View "codex.portfolio_accruals

Column

as_of_date portfolio_uid base_currency listing_uid local_currency listing_source accrual_type_code accruals reclaims spendable_accruals accruals_base reclaims base spendable_accruals_base accruals_usd reclaims_usd spendable_accruals usd

View "codex.portfolio_accruals" definition

" SELECT as of_date portfolio_uid, base_currency listing_uid, local_currency listing_source. accrual_type_code accruals, reclaims spendable_accruals accruals_base, reclaims_base spendable_accruals base accruals_usd reclaims_usd spendable_accruals_usd FROM portfolio_accrual revisions WHERE (archive_timestamp IS NULL);

onyx=> ld onyx_api.account_breakdowns Materialized view "onyx_api.account_breakdowns Column

aa_portfolio_uid ax_portfolio_uid effective_during type portfolio_uid name income_spendable id_portfolio_uid am_portfolio_uid Indexes: account_breakdowns_aauid_eff_iduid_idx" btree (aa portfolio_uid, effective_during, id _portfolio_uid) INCLUDE (portf olio_uid, name) "account_breakdowns axuid_eff_iduid_idx" btree (ax_portfolio_uid, effective_during, id_p account_breakdowns_aauid_eff_iduid_idx" btree (aa_portfolio_uid, effective_during, id_p account_breakdowns_axuid_eff_iduid_idx" btree (ax_portfolio_uid, effective_during, id_portfolio_uid) INCLUDE (port account_breakdowns_axuid_eff_iduid_idx" btree (ax_portfolio_uid, effective_during, id_portfolio_uid) INCLUD "account_breakdowns_axuid_eff_iduid_idx" btree (ax_portfolio_uid, effective_during, id_portfolio_uid) INCLUDE (po rtfolio_uid, name) "account_breakdowns_puid_eff_idx" btree (portfolio_uid, effective_during) INCLUDE (am_portfolio_uid) unq" UNIQUE, btree account breakdowns (portfolio_uid type, am_portfolio_uid, effective_during) INCLUDE (portfol io_uid, name)

forpues lo Onyx and rollupdates Materialized View onyx apt roll updates as of date " rollup dates und" UNIQUE, bt ee (as of date)

View "codex.listings" definition

" SELECT l.listing_uid 1.name

l.asset_category_code cat.name AS asset_category l.asset_class l.sedol l.cusip, l.primary_listing_sedol l.cg_symbol l.bloomberg_id l.axioma_id l.xf_gvkey l.xf_iid l.xf_company_id l.xf_security_id l.xf_trading_item_id l.primary_instrument_type_code l.secondary_instrument_type_code l.tertiary_instrument_type_code l.issuer uid l.issuer_name 1.issuer_short_name 1.cg_country l.cg_country_of_risk l.cg_em indicator FROM (listing_revisions JOIN asset_categories cat ON ((l.asset_category_code = cat.code))) WHERE (l.archive_timestamp IS NULL);

View "codex.currencies

definition

" SELECT iso_code currencyuid listing uid currency_name FROM currency_revisions WHERE (archive_timestamp IS NULL);

View "codex.am_portfolio_history_wide definition SELECT am.portfolio uid am.name,

mr.asset_class am.is_active am.results_uid am.effective_during am.aaportfolio uid aa.account_number AS aa_number aa.abbreviation AS aa abbreviation aa.base_currency AS aa_base_currency exchange _rate source AS aa exchange rate source

aa. aa.type AS aa_type aa.name AS aa_name aa.objective AS aa_objective aa. .

region AS aa region aa .reference_portfoliol uid AS aa reference portfolio uid, aa.reference_portfolio_type AS aa_reference_portfolio_type aa.results uid AS aa _results uid am.mr_portfolio uid mi.initials AS mr_initials mi.name AS mr_name mr.number AS mr_number mr.role AS mr_role mr.type AS mr_type mr.idm_portfolio_uid, id.abbreviation AS id_abbreviation, id.name AS id_name am.mx_portfolio uid mx.number AS mx_number mx.objective AS mx_objective mx.results_uid AS mx_results_uid, am.rp_portfolio_uid rp.number AS rp_number rp.objective AS rpobjective rp.results uid AS rp_result tS uid, am.sg_portfolio_uid sg.number AS sg_number sg.objective AS sg_objective sg.results_uid AS sg_results_uid FROM (((((((am_portfolio_history am JOIN account_revisions aa ON (((am.aa_portfolio_uid = aa.portfolio_uid) AND (aa.archive_timestamp IS NULL)))) JOIN management_responsibility_revisions mr ON (((am.mr_portfolio_uid = mr.portfolio_uid) AND (mr.archive_timestamp IS NULL)))) JOIN manager_initials_revisions mi ON (((mr.initials_uid = mi.initials_uid) AND (mi.archive_timestamp IS NULL)))) EFT JOIN investment _division_revisions id ON (((am.id_portfolio_uid - id.portfolio_uid) AND (id.archive_timestamp IS NULL))) LEFT JOIN management_responsibility_rollup_revisions mx ON (((am.mx_portfolio_uid = nx.portfolio_uid) AND (mx.archive_timestamp IS NULL)))) LEFT JOIN research_portfolio_revisions rp ON (((am.rp_portfolio_uid = rp.portfolio_uid) AND (rp.archive_timestamp IS NULL)))) LEFT JOIN supergroup_revisions sg ON (((am.sg_portfolio_uid (sg.archive_timestamp IS NULL)))); sg.portfolio_uid) AND

onyx=> \d onyx_api.portfolios Materialized view "onyx_api.portfolios' Column

portfolio_uid portfolio_name portfolio_type results_uid Indexes: portfolio_name, results_uid) portfolios_puid_unq" UNIQUE

HAMPS

btree (portfolio_uid) INCLUDE (portfolio_type