quay / claircore

foundation modules for scanning container packages and reporting vulnerabilities
https://quay.github.io/claircore/
Apache License 2.0

libvuln POSTGRESQL: Requesting bulk query and a few performance tips for PostgresDB #995

Open vishnuchalla opened 1 year ago

vishnuchalla commented 1 year ago

Description

We changed one of the most time-consuming insert queries into a bulk insert query and noticed a positive difference in DB performance.

POC PR: https://github.com/quay/claircore/pull/996
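
To make the change concrete, below is a minimal sketch of the idea behind the PR, not its actual code (the PR itself changes claircore's Go code paths); the sketch uses Python with psycopg2 purely for illustration, and the package table and its (name, version) columns are hypothetical:

    # Illustrative only: conn is assumed to be an open psycopg2 connection,
    # and the "package" table with (name, version) columns is hypothetical.
    from psycopg2.extras import execute_values

    def insert_rows_one_by_one(conn, rows):
        # The slow pattern: one statement (and one round trip) per row.
        with conn.cursor() as cur:
            for name, version in rows:
                cur.execute(
                    "INSERT INTO package (name, version) VALUES (%s, %s)",
                    (name, version),
                )
        conn.commit()

    def insert_rows_bulk(conn, rows):
        # The bulk pattern: execute_values expands the single %s placeholder into
        # (v1, v2), (v1, v2), ... so the whole batch becomes one multi-row INSERT.
        with conn.cursor() as cur:
            execute_values(
                cur,
                "INSERT INTO package (name, version) VALUES %s",
                rows,
            )
        conn.commit()

Sending one multi-row statement instead of one statement per row is what drives the lower query counts and IOPS discussed below.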

DB Metrics And Observations

Before this change

Dashboard link here in case you want to look at other metrics as well from a sample snapshot.

[Screenshots: Disk IOPS, Clair DB CPU, Clair App CPU]

After this change

Dashboard link here in case you want to look at other metrics as well from a sample snapshot.

[Screenshots: Clair DB Disk IOPS, Clair DB CPU trend, Clair App CPU (after the change)]

If we look at the spikes after this bulk insert change, they are much narrower, IOPS decreased by the thousands, and the Clair application side performs better as well (i.e., reduced I/O wait time for requests). This will ultimately help us reduce cost on both the Clair DB and the application side.

Profiling Report And Observations

Before this change. Complete report here

[Screenshots: General Activity, Average Query Execution Time, Queries Histogram]

After this change. Complete report here

[Screenshots: General Activity, Average Query Execution Time, Queries Histogram (after the change)]

If we look at the average query execution times, there is a 100-fold difference after moving to a bulk insert strategy (i.e., 250 ms down to 2.5 ms). The overall time spent running queries also dropped, along with the number of queries running on the DB, ultimately resulting in fewer IOPS. This keeps the DB less busy, allowing more incoming requests to be served, and helps reduce the spikes seen on both the Clair app and the DB.

Summary And Conclusion

OR

    def approval_query_executor(self, table_id, table_name, columns, approval_type):

        """
        executes approval query for cntl_approval_info in rds

        :param table_id: table id according to cntl_table_info in rds
        :param table_name: table name according to cntl_table_info in rds
        :param columns: columns to be approved
        :param approval_type: type of approval like add, update and delete
        :return: None
        """

        executer = QueryExecuter(self.default_args['rds_host'], self.default_args['rds_port'],
                                 self.default_args['rds_dbName'], self.default_args['rds_password'],
                                 self.default_args['rds_username'])
        approval_info = self.default_args['approval_info']
        cntl_columns_approval_query = """INSERT INTO %s (table_id, src_table_name, src_column_name, column_datatype,
         approval_type, approval_flag, last_updated_date, approved_by, active_flag, status, data_length, data_precision,
          data_scale) VALUES """
        params = (approval_info,)
        if len(columns) > 0:
            # Append one VALUES tuple per column so the whole batch is sent as a single bulk INSERT.
            for column_number in range(0, len(columns)):
                cntl_columns_approval_query += """(%s, '%s', '%s', '%s', '%s', %s, '%s', '%s', %s, '%s', %s, %s, %s),"""
                params += (table_id, table_name, columns[column_number][1], columns[column_number][2], approval_type,
                           0, str(datetime.now()), 'Dejavu-Gatekeeper', 1, 'pending approval for approval in rds',
                           columns[column_number][3], columns[column_number][4], columns[column_number][5])
            # Replace the trailing comma with a terminator and execute the whole statement in one round trip.
            cntl_columns_approval_query = cntl_columns_approval_query[:-1] + ";"
            executer.execute_query(cntl_columns_approval_query, params, approval_info)
        else:
            logging.info("No columns to send for approval to {0} in {1}".format(approval_type, table_name))

    def update_sequence_number(self, new_columns, table_name, target=None):

        """
        updates sequence number of a table in cntl_column_info in rds

        :param new_columns: list of new columns
        :param table_name: table name
        :param target: flag to determine update in source or target sequence number
        :return: None
        """

        executer = QueryExecuter(self.default_args['rds_host'], self.default_args['rds_port'],
                                 self.default_args['rds_dbName'], self.default_args['rds_password'],
                                 self.default_args['rds_username'])
        column_info = self.default_args['column_info']
        columns = list()
        new_columns = tuple(new_columns)
        seq_fashion = 'src_seq_no' if target is None else 'trgt_seq_no'
        if len(new_columns) != 0:
            update_sequence_number_query = """UPDATE %s SET %s = CASE src_column_name """
            params = (column_info, seq_fashion)
            # Build one CASE arm per column so every sequence number is assigned in a single statement.
            for index in range(0, len(new_columns)):
                update_sequence_number_query += """WHEN '%s' THEN %s """
                params += (new_columns[index][1], index + 1)
                columns.append(new_columns[index][1])
            update_sequence_number_query += """END WHERE src_column_name IN """ + str(
                columns).replace('[', '(').replace(']', ')') + """ AND src_table_name='%s' AND src_active_flag=%s
                 AND active_flag=%s;"""
            params += (table_name, 1, 1)
            executer.execute_query(update_sequence_number_query, params, column_info)
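
For readability, here is roughly the SQL the update_sequence_number builder ends up producing (assuming column_info resolves to cntl_column_info, as the docstring suggests); the table and column values below are made up and only meant to show the single-statement shape, where one UPDATE ... CASE assigns every sequence number in a single round trip:

    # Hypothetical rendering of update_sequence_number for two new columns
    # ('order_id', 'order_ts') of a hypothetical 'orders' table:
    bulk_update_sql = """UPDATE cntl_column_info SET src_seq_no = CASE src_column_name
        WHEN 'order_id' THEN 1
        WHEN 'order_ts' THEN 2
    END WHERE src_column_name IN ('order_id', 'order_ts') AND src_table_name='orders' AND src_active_flag=1
         AND active_flag=1;"""

The pattern is the same as the bulk INSERT above: collapse many small statements into one so the database does a single parse/plan/commit cycle instead of one per row.
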
hdonnay commented 1 year ago

Why is the max and average duration so much worse? I don't understand why the graphs don't seem to correlate at all to the numbers in the table. It looks like the execution time is worse in the giant-query version.

vishnuchalla commented 1 year ago

> Why is the max and average duration so much worse? I don't understand why the graphs don't seem to correlate at all to the numbers in the table. It looks like the execution time is worse in the giant-query version.

  • Yes, there are some queries that ran for a long time (the bulk ones), and those appear in the top-twenty list. But the overall average is much lower with the giant-query execution: looking at the average times for all inserts before and after this change, they dropped from 250 ms to 2.5 ms (see the average query duration screenshot). The queries listed in the top 20 as most time-consuming are just outliers. Also, if those queries were really running that long overall, we would have seen prolonged IOPS spikes in the DB metrics, which is not happening in our case; instead, IOPS dropped from about 3000 to 2000 on average.