openwallet-foundation / askar

Secure storage designed for Hyperledger Aries agents.
Apache License 2.0

Inefficient SQL fetch query #195

Closed ff137 closed 1 year ago

ff137 commented 1 year ago

Good day,

The following queries in aries-askar/askar-storage/src/backend/postgres/mod.rs appear to be a source of performance degradation when a large number of wallet entries exist:

const FETCH_QUERY: &str = "SELECT id, value,
    (SELECT ARRAY_TO_STRING(ARRAY_AGG(it.plaintext || ':'
        || ENCODE(it.name, 'hex') || ':' || ENCODE(it.value, 'hex')), ',')
        FROM items_tags it WHERE it.item_id = i.id) tags
    FROM items i
    WHERE profile_id = $1 AND kind = $2 AND category = $3 AND name = $4
    AND (expiry IS NULL OR expiry > CURRENT_TIMESTAMP)";
const FETCH_QUERY_UPDATE: &str = "SELECT id, value,
    (SELECT ARRAY_TO_STRING(ARRAY_AGG(it.plaintext || ':'
        || ENCODE(it.name, 'hex') || ':' || ENCODE(it.value, 'hex')), ',')
        FROM items_tags it WHERE it.item_id = i.id) tags
    FROM items i
    WHERE profile_id = $1 AND kind = $2 AND category = $3 AND name = $4
    AND (expiry IS NULL OR expiry > CURRENT_TIMESTAMP) FOR NO KEY UPDATE";

The inefficiency comes from the correlated subquery in the SELECT clause: SELECT ARRAY_TO_STRING(ARRAY_AGG(...

This subquery is executed for each row returned by the outer query. If the outer query returns a large number of rows, this can result in a significant performance hit.

Also, the subquery involves aggregating data and performing string manipulations, which are computationally expensive operations.

Now, I don't have an immediate solution to propose, but it would be something along the lines of replacing the subquery with a JOIN and avoiding string manipulation/aggregation where possible.
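For illustration, something like the following (an untested sketch of my own, not a vetted replacement) would fold the tag aggregation into a LEFT JOIN with GROUP BY instead of a per-row subquery. One caveat: for items with no tags this returns an empty string rather than NULL in the tags column, so the Rust decoding side would need checking:

-- Hypothetical rewrite of FETCH_QUERY: correlated subquery replaced by LEFT JOIN + GROUP BY
SELECT i.id, i.value,
    ARRAY_TO_STRING(ARRAY_AGG(it.plaintext || ':'
        || ENCODE(it.name, 'hex') || ':' || ENCODE(it.value, 'hex')), ',') AS tags
FROM items i
LEFT JOIN items_tags it ON it.item_id = i.id
WHERE i.profile_id = $1 AND i.kind = $2 AND i.category = $3 AND i.name = $4
  AND (i.expiry IS NULL OR i.expiry > CURRENT_TIMESTAMP)
GROUP BY i.id, i.value;

The FOR NO KEY UPDATE variant would need more care, since PostgreSQL does not allow row-locking clauses together with aggregation, so that query may still need a different shape.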

Ideally the person most familiar with the SQL statements would be tasked with improving them, but in their absence, GPT-4 is your friend :-). Hopefully someone can propose a more efficient solution! Otherwise I'll try to brush up on my SQL skills when I have time available.


For anyone who wants to follow up, here's a link to my chat with GPT where the query is explained with recommendations: https://chat.openai.com/share/c9f791b0-a5e4-4457-beb1-9469a7d96099

Relates to: #130

rblaine95 commented 1 year ago

I'm a colleague of @ff137.

Our database is an AWS RDS Postgres, and we noticed very high DB CPU usage when creating a large number of holders: [screenshots: DB CPU usage graphs]

Holder creation progressively gets slower and slower as the number of holders increases.

This is with

ACAPY_WALLET_TYPE: askar
ACAPY_WALLET_STORAGE_TYPE: postgres_storage
ACAPY_MULTITENANCY_CONFIGURATION: '{"wallet_type":"askar-profile","wallet_name":"cloudagent_multitenant"}'


swcurran commented 1 year ago

I wonder if the ChatGPT answer actually works. Since @ff137 added this as a comment and it wasn’t in the initial email about the issue, here is what ChatGPT suggests:

For anyone who wants to follow up, here's a link to my chat with GPT where the query is explained with recommendations: https://chat.openai.com/share/c9f791b0-a5e4-4457-beb1-9469a7d96099

I’m guessing it is not easy to reproduce the scenario so that the proposal or other iterations can be tested? For example, would it be possible to capture the database, and run queries directly against it? Even if only you can do it, it might be easier to collaborate by running updated queries directly against the database.
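As one concrete (hypothetical) way to do that: with a copy of the database, a candidate query can be prepared and profiled with EXPLAIN ANALYZE, using parameter types and values that match the real schema (the types and literals below are placeholder assumptions):

-- Prepare the current fetch query; parameter types here are assumptions
PREPARE fetch_plan (BIGINT, SMALLINT, BYTEA, BYTEA) AS
SELECT id, value,
    (SELECT ARRAY_TO_STRING(ARRAY_AGG(it.plaintext || ':'
        || ENCODE(it.name, 'hex') || ':' || ENCODE(it.value, 'hex')), ',')
        FROM items_tags it WHERE it.item_id = i.id) tags
FROM items i
WHERE profile_id = $1 AND kind = $2 AND category = $3 AND name = $4
  AND (expiry IS NULL OR expiry > CURRENT_TIMESTAMP);

-- Execute with values taken from the captured data and compare plans/timings
EXPLAIN (ANALYZE, BUFFERS) EXECUTE fetch_plan(1, 2, '\x0123'::bytea, '\x4567'::bytea);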

ff137 commented 1 year ago

@swcurran thanks, good idea! My mental bandwidth is on a different project atm, but @rblaine95 and I will test some updated queries and see if we can contribute an enhancement 👌

rblaine95 commented 1 year ago

Just realized we forgot to add the versions we're using:

@swcurran If I can figure out how to replicate this (answer likely lies in Aries CloudAgent with how it creates tenants), I'll be happy to test 😄 Will also be excited to dust off my Rust and get my feet wet. 🥳

rblaine95 commented 1 year ago

A quick update @swcurran

I've made some changes to the SQL locally and ran POSTGRES_URL=postgres://postgres:mysecretpassword@localhost:5432/test-db cargo test --features pg_test; all the tests are passing 🥳

I'll open a draft PR soon with the changes; they only touch the SQL queries above. We can figure out how to replicate/test the performance impact next.


While I was at it, I investigated using AWS RDS Proxy, and it turns out it completely mitigates the performance problem!

Running AWS RDS Aurora on a db.r5.2xlarge (8vCPU, 64GiB memory) with ±40'000 holders already in the DB, we could get ±25 holder creations per second with DB CPU pinned at 99% (screenshots above). Adding RDS Proxy allows us to scale the DB down to a db.r5.large (2vCPU, 16GiB memory); with 150'000 holders already in the DB, holder creation skyrockets to (and sustains at) ±65/s.

[screenshots: holder creation rate with RDS Proxy]

Setting synchronous_commit: off brings holder creation to ±80/s. [screenshot: holder creation rate]
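For reference, synchronous_commit is a standard PostgreSQL setting (on RDS it is typically changed via the DB parameter group); turning it off trades a small window of possible transaction loss after a crash for lower commit latency. On a self-managed instance it could be applied like this (a minimal sketch):

-- Cluster-wide, without a restart (self-managed PostgreSQL)
ALTER SYSTEM SET synchronous_commit = off;
SELECT pg_reload_conf();

-- Or per session/connection only
SET synchronous_commit = off;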

swcurran commented 1 year ago

Awesome stuff! Nice work. We look forward to your PR. A documentation update — perhaps just a note in the readme — about what you found with AWS might help others.

@WadeBarnes — interesting stuff here.

rblaine95 commented 1 year ago

Hi @swcurran I've opened a draft PR (#196) with the SQL changes.

I'll add some documentation about RDS Proxy in a separate PR.

rblaine95 commented 1 year ago

I feel like an absolute fool. It turns out that the SQL query causing the high load in the screenshots above (SELECT id, name, value, ...) was the SCAN_QUERY, which seems to have been reworked and no longer occurs in 0.3.0.

I've closed #196.

I'm still happy to write a benchmark to illustrate and document any benefits of RDS Proxy.

swcurran commented 1 year ago

It would be awesome to have the benchmark and documentation added to the repo.