hyperledger / aries-askar

Secure storage designed for Hyperledger Aries agents.
Apache License 2.0
58 stars 44 forks source link

Refactor Fetch SQL queries in Postgres backend #196

Closed rblaine95 closed 9 months ago

rblaine95 commented 10 months ago

An attempt at some optimizations on the FETCH_QUERY and FETCH_QUERY_UPDATE Postgres queries.

Opening as a draft PR while I work on figuring out a way to benchmark the updated queries.

According to GPT:

The refactored versions with the LEFT JOIN and GROUP BY clauses, as well as the use of CTEs are likely to be more efficient in terms of performance, especially when dealing with large datasets. It allows for a more optimized approach to data retrieval, particularly when combining information from multiple tables.

Please be as pedantic as possible (-Wclippy::pedantic -Wclippy::nursery).

Related to #195

swcurran commented 10 months ago

@andrewwhitehead — any idea why the tests failed? Doesn’t look like it would be related to the change?

Fun PR — thanks!

andrewwhitehead commented 10 months ago

It appears that the latest ring dependency breaks support for the manylinux aarch64 environment we are using (https://github.com/briansmith/ring/issues/1728). We could update to manylinux_2_28 or try one of the other fixes, I'll have to test that in another PR.

rblaine95 commented 9 months ago

Small update - I've made some progress on building a benchmark using Criterion to test if there's any benefit in the refactored SQL queries.

Preliminary results are looking good. On my local (M2 Macbook Pro (10CPU, 16GiB), bash ./tests/docker_pg.sh), I pre-populated the DB with 5000 profiles, then benchmarked the old SQL vs the new SQL creating 1000 profiles on top of the existing 5000.

Red = Old SQL Blue = Refactored SQL image

I'll work on a larger benchmark - pre-populate with 50000 profiles and then compare the SQL queries. The expectation is that the refactored version should be faster compared to the old version as the number of profiles increases.

Correction: It looks like I severely misunderstood. The benchmark, currently, isn't running in multi-tenant mode. Apologies for that.

rblaine95 commented 9 months ago

@swcurran, @andrewwhitehead, is there a channel on the Hyperledger Discord we could chat in? I realized I may have been a little overly excited with my benchmark results earlier and that the tests/benchmarks aren't actually running in Multitenant mode. I'd like to figure out a way to pre-populate the DB with a number of profiles so that the benchmarking results are a little more appropriate.

swcurran commented 9 months ago

There is an aries-askar channel on the Hyperledger Discord Server (invite: https://discord.gg/hyperledger) that I just created that we can use. Should have been created long ago :-).