Closed: ff137 closed this issue 1 year ago
I'm a colleague of @ff137.
Our database is an AWS RDS Postgres instance, and we noticed very high DB CPU usage when creating a large number of holders:
Holder creation progressively gets slower and slower as the number of holders increases.
This is with:
ACAPY_WALLET_TYPE: askar
ACAPY_WALLET_STORAGE_TYPE: postgres_storage
ACAPY_MULTITENANCY_CONFIGURATION: '{"wallet_type":"askar-profile","wallet_name":"cloudagent_multitenant"}'
I wonder if the ChatGPT answer actually works. Since @ff137 added this as a comment and it wasn't in the initial email about the issue, here is what ChatGPT suggests:
For anyone who wants to follow up, here's a link to my chat with GPT where the query is explained with recommendations: https://chat.openai.com/share/c9f791b0-a5e4-4457-beb1-9469a7d96099
I’m guessing it is not easy to reproduce the scenario so that the proposal or other iterations can be tested? For example, would it be possible to capture the database, and run queries directly against it? Even if only you can do it, it might be easier to collaborate by running updated queries directly against the database.
@swcurran thanks, good idea! My mental bandwidth is on a different project atm, but @rblaine95 and I will test some updated queries and see if we can contribute an enhancement 👌
Just realized we forgot to add the versions we're using:
@swcurran If I can figure out how to replicate this (the answer likely lies in how Aries CloudAgent creates tenants), I'll be happy to test 😄 Will also be excited to dust off my Rust and get my feet wet. 🥳
A quick update @swcurran
I've made some changes to the SQL locally and ran POSTGRES_URL=postgres://postgres:mysecretpassword@localhost:5432/test-db cargo test --features pg_test; all the tests are passing 🥳
I'll open a draft PR soon with the changes, it's just to the above SQL queries. We can figure out how to replicate/test the performance impact next.
While I was at it, I investigated using AWS RDS Proxy, and it turned out to completely mitigate the performance problem!
Running AWS RDS Aurora on a db.r5.2xlarge (8 vCPU, 64 GiB memory) with ±40'000 holders already in the DB, we could get ±25 holder creations per second with DB CPU pinned at 99% (screenshots above).
Adding RDS Proxy allows us to scale the DB down to a db.r5.large (2 vCPU, 16 GiB memory) with 150'000 holders already in the DB; holder creation skyrockets (and sustains) at ±65/s.
Setting synchronous_commit: off brings holder creation to ±80/s.
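For reference, here is a rough sketch of how synchronous_commit can be relaxed on a plain Postgres instance (on RDS/Aurora we actually set it through the DB parameter group rather than ALTER SYSTEM):

```sql
-- Per-session: only this connection's commits skip waiting for the WAL flush.
SET synchronous_commit = off;

-- Cluster-wide on a self-managed Postgres (not permitted on RDS/Aurora,
-- where the equivalent change goes into the DB parameter group):
ALTER SYSTEM SET synchronous_commit = off;
SELECT pg_reload_conf();

-- Verify the current setting.
SHOW synchronous_commit;
```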
Awesome stuff! Nice work. We look forward to your PR. A documentation update — perhaps just a note in the readme — about what you found with AWS might help others.
@WadeBarnes — interesting stuff here.
Hi @swcurran, I've opened a draft PR (#196) with the SQL changes.
I'll add some documentation about RDS Proxy in a separate PR.
I feel an absolute fool.
It turns out that the SQL query causing high load in the screenshots above (SELECT id, name, value, ...) was the SCAN_QUERY, which seems to have been reworked and no longer occurs in 0.3.0.
I've closed #196.
I'm still happy to write a benchmark to illustrate and document any benefits of RDS Proxy.
It would be awesome to have the benchmark and documentation added to the repo.
Good day,
The following queries in aries-askar/askar-storage/src/backend/postgres/mod.rs appear to be the source of the performance degradation when a large number of wallet entries exist:
The inefficiency comes from the subquery in the SELECT clause:
SELECT ARRAY_TO_STRING(ARRAY_AGG(...
This subquery is executed for each row returned by the outer query. If the outer query returns a large number of rows, this can result in a significant performance hit.
Also, the subquery involves aggregating data and performing string manipulations, which are computationally expensive operations.
Now, I don't have an immediate solution to propose, but it would be something along the lines of replacing the subquery with a JOIN and avoiding the string manipulation/aggregation if possible.
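To illustrate the idea (the actual askar query is abbreviated above, so the table and column names here are only hypothetical), the problematic pattern and the kind of JOIN-based rewrite I have in mind look roughly like this:

```sql
-- Hypothetical shape of the current query: a correlated subquery in the
-- SELECT list that aggregates tags into a string for every row returned.
SELECT i.id, i.name, i.value,
       (SELECT ARRAY_TO_STRING(ARRAY_AGG(it.name || ':' || it.value), ',')
          FROM items_tags it
         WHERE it.item_id = i.id) AS tags
  FROM items i
 WHERE i.profile_id = $1;   -- $1: the wallet profile parameter

-- Sketch of a JOIN-based rewrite: aggregate once per item via GROUP BY
-- instead of re-running the subquery for each row of the outer query.
SELECT i.id, i.name, i.value,
       ARRAY_TO_STRING(ARRAY_AGG(it.name || ':' || it.value), ',') AS tags
  FROM items i
  LEFT JOIN items_tags it ON it.item_id = i.id
 WHERE i.profile_id = $1
 GROUP BY i.id, i.name, i.value;
```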
Ideally the person who is most familiar with the SQL statements should be tasked with improving it, but in their absence, GPT-4 is your friend :-). Hopefully someone can propose a more efficient solution! Otherwise I'll try to rehash my SQL skills when I have time available.
For anyone who wants to follow up, here's a link to my chat with GPT where the query is explained with recommendations: https://chat.openai.com/share/c9f791b0-a5e4-4457-beb1-9469a7d96099
Relates to: #130