matter-labs / zksync-era

zkSync era
Apache License 2.0
3.08k stars 2.05k forks source link

[FR] Amazon Aurora PostgreSQL support (required changes provided) #1791

Open gaplo917 opened 3 months ago

gaplo917 commented 3 months ago

šŸŒŸ Feature Request

Support Amazon Aurora PostgreSQL (Standards and Serverless V2) on AWS.

šŸ“ Description

Amazon Aurora PostgreSQL is a fully managed, PostgreSQLā€“compatible, and ACIDā€“compliant relational database engine that combines the speed, reliability, and manageability of Amazon Aurora with the simplicity and cost-effectiveness of open-source databases. Aurora PostgreSQL is a drop-in replacement for PostgreSQL and makes it simple and cost-effective to set up, operate, and scale your new and existing PostgreSQL deployments, thus freeing you to focus on your business and applications. --- Working with Amazon Aurora PostgreSQL

Aurora PostgreSQL usually is a drop-in replacement for PostgreSQL but unfortunately not in this case. When I tried to initial the stack, it fails

aurora error

After deep dive into the source code, I found the following codes that check replication lag https://github.com/matter-labs/zksync-era/blob/main/core/lib/dal/src/system_dal.rs#L20-L39

Aurora PostgreSQL use underlying architecture that does not use WAL to do replication. It doesn't have the pg_last_wal_receive_lsn, pg_last_wal_replay_lsn , pg_last_xact_replay_timestamp pg functions. Thus, it throws SQLSTATE 0A000 feature_not_supported exception.

I think there are two potential options to support Aurora PostgreSQL and I could provide the PR once the community come into agreement.

Option 1.

Add a new feature flag and using conditional statement to switch the replication lag query

SELECT
        highest_lsn_rcvd = current_read_lsn as synced,
        ROUND(replica_lag_in_msec / 1000)::int   as lag
    FROM aurora_replica_status()
    ORDER BY replica_lag_in_msec DESC NULLS LAST
    LIMIT 1;

Option 2.

Automatically handle fallback case using pg function. Create a sql file to create the function automatcially before checking the replication lag,

CREATE OR REPLACE FUNCTION get_replication_lag_sec()
    RETURNS TABLE (synced BOOLEAN,lag INTEGER) AS $$
DECLARE
BEGIN
    RETURN QUERY SELECT
        pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() as synced,
        EXTRACT(SECONDS FROM now() - pg_last_xact_replay_timestamp())::int AS lag;
EXCEPTION WHEN SQLSTATE '0A000' THEN
    -- SQLSTATE 0A000 feature_not_supported
    -- SQLERRM: Function pg_last_xlog_receive_location() is currently not supported for Aurora
    RETURN QUERY SELECT
        highest_lsn_rcvd = current_read_lsn as synced,
        ROUND(replica_lag_in_msec / 1000)::int   as lag
    FROM aurora_replica_status()
    ORDER BY replica_lag_in_msec DESC NULLS LAST
    LIMIT 1;
END;
$$ LANGUAGE plpgsql;

modify the codes https://github.com/matter-labs/zksync-era/blob/main/core/lib/dal/src/system_dal.rs#L28-L34 to

select * from get_replication_lag_sec();

šŸ¤” Rationale

Aurora PostgreSQL provides serverless infrastructure that able to near-instant autoscale vertically on compute, memory, and IOPS.

EmilLuta commented 3 months ago

Hey @gaplo917. Thanks for the issue submission. At the moment, we mostly run in GCP (we're agnostic on most areas, but there are a few ties to GCP). That said, we want to move the codebase in a modular place where you could have your own implementation of of each component (in this case, you could write your own snapshotter from scratch and just inject it at compile time).

For the moment, it's unlikely that we'll support other infrastructure, but we will open the capabilities for anyone to build on top in the near future.

hiteshdamke commented 2 months ago

thankss

gaplo917 commented 2 months ago

@EmilLuta Thank you for the prompt reply. The modular structure sounds promising and I can imagine that it's a huge chunk of workload that will take time.

Would you mind if I leave this FR open, so if others encounter the same issue, they are able to find it in the issues list?

EmilLuta commented 2 months ago

I believe it's discoverable even as closed, but let's keep it open as it's not been addressed. Thanks @gaplo917!

trader2021 commented 2 months ago

good nice work

tradeupo commented 2 months ago

lfg

zhenweisi commented 1 month ago

good nice work