lusterchris / Performance-Tuning

Index #1

Open lusterchris opened 3 months ago

lusterchris commented 3 months ago

In PostgreSQL, index analysis involves a detailed examination of how indexes are being utilized within the database to optimize query performance. Here's a breakdown of the technical aspects involved:

  1. Index Usage Analysis

    • Identify Unused Indexes: We examine the statistics view pg_stat_user_indexes to identify indexes with low or zero usage. Unused indexes consume disk space and slow down writes without providing any benefit to query execution.
    • Evaluate Index Scans: By analyzing pg_stat_user_tables, we assess how often tables are read via index scans versus sequential scans. High sequential scan counts on large tables can indicate missing or inefficient indexes (a query sketch follows this list).
  2. Bloat and Fragmentation Analysis

    • Index Bloat: Over time, indexes can become bloated, leading to unnecessary disk space usage and slower performance. We use tools like the pgstattuple extension or the pg_bloat_check script to measure bloat levels and identify indexes that need to be reindexed.
    • Reindexing: Based on the bloat analysis, we perform REINDEX operations to rebuild bloated indexes, reducing their size and improving performance.
  3. Index Redundancy Check

    • Redundant Indexes: We look for overlapping or duplicate indexes that might have been created over time. This involves examining indexes with similar column sets but differing in order, uniqueness, or inclusion of additional columns. Redundant indexes are candidates for removal to save disk space and reduce maintenance overhead (a heuristic query follows this list).
  4. Index Efficiency

    • Composite Index Analysis: We assess the effectiveness of composite indexes, ensuring that column order and inclusion align with query patterns. Inefficient composite indexes can lead to partial scans or full scans instead of index-only scans.
    • Partial Indexes: We consider creating partial indexes for cases where queries filter on specific conditions. Partial indexes can be smaller and faster, optimizing performance for particular query patterns (an example follows this list).
  5. Index Scan Efficiency

    • Query Performance Review: We analyze query execution plans using EXPLAIN and EXPLAIN ANALYZE to see how indexes are being used. We focus on whether the planner is choosing index scans, index-only scans, or bitmap index scans and how those choices impact query performance.
    • Index-Only Scans: We review opportunities to enhance index-only scan usage, which avoids hitting the heap entirely and can significantly speed up read queries.
  6. Maintenance Planning

    • VACUUM and ANALYZE: Regular maintenance tasks like VACUUM and ANALYZE are crucial for keeping the indexes updated and ensuring the query planner has accurate statistics. We schedule and monitor these tasks to maintain database health.
    • Automatic Index Maintenance: We review and possibly adjust the autovacuum settings to ensure that index maintenance tasks are performed in a timely manner without impacting database performance.
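
A few query sketches to make the items above concrete. These are illustrative rather than authoritative: table and index names are placeholders, and the redundancy check is a heuristic that ignores expression and partial indexes.

Sequential-scan vs. index-scan balance (item 1):

-- Tables where sequential scans outnumber index scans; a large approx_rows
-- value here often points at a missing or unusable index.
SELECT
    schemaname,
    relname AS table_name,
    seq_scan,
    COALESCE(idx_scan, 0) AS idx_scan,   -- NULL when the table has no indexes at all
    n_live_tup AS approx_rows
FROM pg_stat_user_tables
WHERE seq_scan > COALESCE(idx_scan, 0)
ORDER BY seq_scan DESC;

Redundant-index heuristic (item 3):

-- Flags indexes whose key columns are a leading prefix of (or identical to)
-- another index on the same table; non-unique ones are usually safe to drop.
SELECT
    a.indexrelid::regclass AS redundant_index,
    b.indexrelid::regclass AS covering_index
FROM pg_index a
JOIN pg_index b
  ON a.indrelid = b.indrelid
 AND a.indexrelid <> b.indexrelid
 AND (b.indkey::text LIKE a.indkey::text || ' %'
      OR (b.indkey::text = a.indkey::text AND a.indexrelid > b.indexrelid))
WHERE NOT a.indisunique;

Partial index (item 4), on a hypothetical orders table:

-- Indexes only the rows the hot query path actually filters on.
CREATE INDEX orders_open_customer_idx
    ON orders (customer_id)
    WHERE status = 'open';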

By conducting this thorough index analysis, we aim to streamline database performance, reduce unnecessary overhead, and ensure that indexes are effectively supporting the query patterns in our PostgreSQL environment.

lusterchris commented 2 months ago

To determine if an index in PostgreSQL 16 on RDS is actually used and how often it has been used before deciding to rebuild it, you can perform the following checks and use the following commands:

1. Check Index Usage
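
A minimal sketch against the cumulative-statistics view; the index name is a placeholder. On RDS these counters accumulate since the last statistics reset (or instance start), so judge "unused" against a known observation window.

SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS times_used,   -- number of scans the planner has run against this index
    idx_tup_read,             -- index entries returned by those scans
    idx_tup_fetch             -- live table rows fetched via the index
FROM pg_stat_user_indexes
WHERE indexrelname = 'your_index_name';   -- placeholder; replace with the index in question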

2. Check Index Size
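
To see what the index costs on disk, a sketch with the same placeholder name, plus a size ranking of all user indexes:

SELECT pg_size_pretty(pg_relation_size('your_index_name'));

-- Or rank every user index by on-disk size:
SELECT
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;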

3. Check Bloat
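
The pgstattuple extension is available on RDS and only needs CREATE EXTENSION. For a B-tree index, pgstatindex reports leaf density and fragmentation; as a rough rule of thumb (a judgment call, not a hard rule), avg_leaf_density well below 90% or high leaf_fragmentation suggests a rebuild is worthwhile.

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- B-tree indexes only; the name is a placeholder.
SELECT index_size, avg_leaf_density, leaf_fragmentation
FROM pgstatindex('your_index_name');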

4. Rebuild Index
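
If the checks justify it, REINDEX CONCURRENTLY (available since PostgreSQL 12, so fine on 16) rebuilds the index without blocking writes, at the cost of extra time and roughly double the disk space while both copies exist:

REINDEX INDEX CONCURRENTLY your_index_name;   -- placeholder name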

5. Consideration of Index Usage Patterns
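
Because the usage counters are cumulative, a zero idx_scan only means "unused since the last statistics reset"; an index that serves a monthly report can look dead for weeks. A sketch for checking how long the observation window actually is:

SELECT stats_reset
FROM pg_stat_database
WHERE datname = current_database();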

Summary:

  1. Check index usage with pg_stat_user_indexes.
  2. Check index size with pg_relation_size.
  3. Check for index bloat using pgstattuple.
  4. Rebuild index if necessary with REINDEX.

These steps should help you make an informed decision about index maintenance in PostgreSQL 16 on RDS.

lusterchris commented 2 months ago

To build a detailed procedure for checking whether database statistics need to be updated in PostgreSQL 16 on AWS RDS, you can follow the steps below. The procedure produces a large volume of raw numbers, so the output takes some care to interpret.

Step 1: Check Table and Index Bloat

If a table's actual on-disk size has drifted well away from the size its statistics imply (relpages × 8 kB pages), the table has grown or bloated since ANALYZE last refreshed those estimates, a sign that the statistics are outdated.

WITH bloat_info AS (
    SELECT
        current_database() AS dbname,
        n.nspname AS schemaname,
        c.relname AS tablename,
        c.reltuples::bigint AS estimated_rows,
        c.relpages::bigint AS estimated_pages,
        c.relpages::bigint * 8 AS estimated_size_kb,        -- size the planner's stats imply (8 kB pages)
        (pg_relation_size(c.oid) / 1024)::bigint AS actual_size_kb,
        (pg_relation_size(c.oid) / 1024)::bigint
            - c.relpages::bigint * 8 AS size_drift_kb,      -- growth since ANALYZE last refreshed relpages
        ROUND((pg_relation_size(c.oid) / 1024.0)::numeric
            / (c.relpages::bigint * 8), 2) AS drift_ratio   -- ~1.00 means the stats still match reality
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'r'
      AND c.relpages > 0
      AND n.nspname NOT IN ('pg_catalog', 'information_schema')
)
SELECT * FROM bloat_info
ORDER BY drift_ratio DESC;

Step 2: Analyze VACUUM Statistics

Check when each table was last vacuumed and analyzed, and how much churn it has seen since; stale last_analyze timestamps on busy tables mean the planner is working from old statistics.

SELECT
    schemaname,
    relname AS table_name,
    n_tup_ins AS rows_inserted,
    n_tup_upd AS rows_updated,
    n_tup_del AS rows_deleted,
    last_vacuum::timestamp AS last_manual_vacuum,
    last_autovacuum::timestamp AS last_auto_vacuum,
    last_analyze::timestamp AS last_manual_analyze,
    last_autoanalyze::timestamp AS last_auto_analyze,
    round((100 * n_dead_tup::numeric / (n_live_tup + n_dead_tup + 1)), 2) AS dead_tuples_percentage,
    n_live_tup + n_dead_tup AS total_rows
FROM pg_stat_user_tables
ORDER BY dead_tuples_percentage DESC;

Step 3: Compare Current Statistics with Latest Data

Compare the planner's estimated row counts (pg_class.reltuples) with the live-tuple counts tracked by the statistics system (n_live_tup) to see if there's a large discrepancy.

SELECT
    t.schemaname,
    t.relname AS tablename,
    c.reltuples::bigint AS estimated_rows,
    t.n_live_tup AS actual_rows,   -- tracked live-tuple count; a good proxy without a full COUNT(*)
    ROUND((t.n_live_tup - c.reltuples::bigint)::numeric
        / GREATEST(c.reltuples::bigint, 1), 2) AS discrepancy_ratio
FROM pg_class c
JOIN pg_stat_user_tables t ON c.oid = t.relid
ORDER BY discrepancy_ratio DESC;

Step 4: Deep Dive into Index Statistics

Check the distribution and usage of indexes to assess if they reflect the current data distribution.

SELECT
    i.relname AS table_name,
    i.indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    i.idx_tup_read::bigint AS index_reads,
    i.idx_tup_fetch::bigint AS index_fetches,
    i.idx_scan::bigint AS index_scans,
    ROUND(i.idx_scan / GREATEST(i.idx_tup_read, 1)::numeric, 4) AS scan_to_read_ratio,
    ROUND(i.idx_scan / GREATEST(t.n_tup_upd + t.n_tup_del, 1)::numeric, 4)
        AS scan_to_update_delete_ratio   -- read benefit vs. write maintenance cost
FROM pg_stat_user_indexes i
JOIN pg_stat_user_tables t ON t.relid = i.relid   -- n_tup_upd / n_tup_del live here, not in the index view
ORDER BY scan_to_read_ratio DESC;

Step 5: Histogram Analysis for Columns

Perform a histogram analysis on key columns to understand data distribution and identify skewed statistics.

SELECT
    attname AS column_name,
    n_distinct,            -- negative values are a fraction of the row count, not an absolute count
    most_common_freqs,
    histogram_bounds
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'your_table_name'
ORDER BY n_distinct DESC;

Step 6: Correlation Analysis on Columns

Check the correlation between each indexed column's values and the physical row order. Values near +1 or -1 mean the table is nearly sorted on that column; when heavy churn has disturbed that ordering but the recorded statistic still reads near ±1, the statistic is stale and worth refreshing.

SELECT
    attname AS column_name,
    correlation            -- near +1/-1: heap physically ordered by this column; near 0: random placement
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'your_table_name'
ORDER BY ABS(correlation) DESC NULLS LAST;

Step 7: Query Execution Plan Analysis

Finally, check the query execution plans to see if they rely heavily on potentially outdated statistics.

EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM your_table_name
WHERE your_column_name = 'some_value';

Taken together, these steps produce a detailed picture of size drift, dead-tuple churn, row-count discrepancies, index usage, and column-level distributions, which is enough to decide with confidence whether the statistics need to be updated.

lusterchris commented 2 months ago

Even with autovacuum enabled and tuned via settings like autovacuum_vacuum_threshold and autovacuum_naptime, scheduling a manual VACUUM can still be beneficial in certain situations. Here's why it might still be necessary to schedule a manual VACUUM:

1. Handling Large, Sudden Data Changes:

2. Ensuring Table and Index Health:

3. Fine-Tuning Performance:

4. Preventing Transaction ID Wraparound:

5. Improving Visibility and Control:

6. Handling Specific Workloads:

When to Schedule Manual VACUUM:

Conclusion:

While autovacuum is generally sufficient for routine maintenance, scheduling a daily manual VACUUM after hours can provide a safety net to ensure optimal performance, especially for tables with high churn, large bulk operations, or to handle specific cases of index maintenance and transaction ID wraparound prevention.
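
As a concrete sketch of points 1 and 4 above (the table name is a placeholder):

-- Age of the oldest unfrozen transaction ID per database; autovacuum forces
-- aggressive freezing at autovacuum_freeze_max_age (default 200 million).
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- Nightly after-hours pass over a known high-churn table: reclaims dead-tuple
-- space and refreshes planner statistics in one step.
VACUUM (VERBOSE, ANALYZE) your_high_churn_table;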

Frequently checking the status of indexes in PostgreSQL and rebuilding them when necessary is crucial for maintaining database performance and ensuring efficient query execution. Here's why it's important:

1. Preventing and Managing Index Bloat:

2. Improving Query Performance:

3. Reducing Disk Space Usage:

4. Ensuring Accurate Query Planning:

5. Avoiding Performance Degradation:

6. Handling Dead Tuples Efficiently:

7. Mitigating the Impact of High Churn Workloads:

Best Practices for Index Maintenance:

Conclusion:

Frequent checks and necessary rebuilds of indexes in PostgreSQL help to maintain efficient query performance, manage disk space usage, and ensure that the query planner makes optimal decisions. Regular index maintenance is a critical component of overall database health and performance tuning.
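
One further check worth folding into the routine above, sketched against pg_index: a CREATE INDEX CONCURRENTLY or REINDEX CONCURRENTLY that fails partway leaves the index marked invalid, where it adds write overhead without ever serving reads.

-- Invalid indexes left behind by failed concurrent builds; drop or rebuild them.
SELECT
    indexrelid::regclass AS index_name,
    indrelid::regclass AS table_name
FROM pg_index
WHERE NOT indisvalid;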