pgmoneta / pgmoneta_ext

PostgreSQL extension for pgmoneta
BSD 3-Clause "New" or "Revised" License
4 stars 3 forks source link

[#22] Get LSN #23

Open GuChad369 opened 2 months ago

GuChad369 commented 2 months ago

I have added the new function pgmoneta_ext_get_lsn() and tested it on PostgreSQL 15.

[postgres@fedora-linux-38 pgsql]$ psql -h localhost -p 5432 -U repl postgres 
Password for user repl: 
psql (15.1)
Type "help" for help.

postgres=# SELECT pgmoneta_ext_get_lsn();
 pgmoneta_ext_get_lsn 
----------------------
 0/8D072BF0
(1 row)

postgres=# SELECT pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 0/8D072BF0
(1 row)

The result is the same when compared to the PostgreSQL function, so I believe it's working correctly.

The next step is to create a new test suite to test this in the container environment.

jesperpedersen commented 2 months ago

Yeah, looks correct.

You can go through https://www.crunchydata.com/blog/postgres-wal-files-and-sequuence-numbers to get more ideas

GuChad369 commented 2 months ago

I have updated the test suite and tested it. This function works on all five versions.

Jubilee101 commented 1 month ago

Sorry about the delay.. Not sure if this is what we want. Think about why we need that start lsn, we may need to redo everything that's not written to disk since backup starts when restoring that backup. And you are only getting the lastest lsn, which doesn't necessarilly mean that everything earlier than this has been written to disk.

Take a look at do_pg_backup_start() in postgres and see if we can recreate it... or borrow it directly. It should be easy if we can just borrow it. It's the privilege issue that worries me, since it involves triggering a check point, or at least wait for a regular check point to happen.

jesperpedersen commented 1 month ago

Try and look at pgbackrest too how they use the function.

Note,

src/db/db.c:        strZ(pgWalName(pgVersion)), pgVersion >= PG_VERSION_15 ? "backup_start" : "start_backup");
GuChad369 commented 1 month ago

And you are only getting the lastest lsn, which doesn't necessarilly mean that everything earlier than this has been written to disk. Is it possible to first force a checkpoint to ensure everything is written to disk? We have created the function pgmoneta_ext_checkpoint to achieve this.

Take a look at do_pg_backup_start() in postgres and see if we can recreate it... or borrow it directly. What is the purpose of recreating this function? Could you clarify the next steps? Thank you so much!

Jubilee101 commented 1 month ago

Is it possible to first force a checkpoint to ensure everything is written to disk? We have created the function pgmoneta_ext_checkpoint to achieve this.

If you take a look into pg_backup_start() that's essentially what the function does -- forcing or waiting for a checkpoint and get the redo point afterwards. What's bugging me is that normally "only superusers or users with the privileges of the pg_checkpoint role can call CHECKPOINT". So I'm not entirely sure if our user with "replication" role is allowed to do that. You need to investigate a bit on that.

What is the purpose of recreating this function? Could you clarify the next steps? Thank you so much!

You can either recreate the function or simply just borrow it in the extension API, whichever suits you. But either way make sure you understand the function internals. The purpose of the function is to get a start LSN for the backup. You see you don't stop the world while making a backup. But copying the entire data dir takes time, which means some transactions can come along during that time doing some changes on both the files you've already copied and the ones you haven't. So there's an internal inconsistency, some files reflect the changes of that transaction, while others don't.

But this is fine because we have WAL and the start LSN. So when restoring a backup, postgres always looks up the start LSN and starting from there, redo everything according to the log records up until the end LSN. So that the internal inconsistency is eliminated. That's why we keep saying that getting start/end LSN is crucial for making the backup. We are also interested in how pgbackrest does it, since unlike us it doesn't have access to postgres' internal functions.

jesperpedersen commented 1 month ago

The basic idea is basically the same as how PostgreSQL 17 does it.

1) Upload the latest manifest file to the extension 2) pg_backup_start() 3) pg_backup_stop() 4) Compare the 2 manifest files 5) Send the changes back to the client

You can use all the functions in https://www.postgresql.org/docs/13/functions-admin.html#FUNCTIONS-ADMIN-BACKUP to help you so an idea to be to go through all those functions.

One of the key points of this is that in order to force a CHECKPOINT the replication user need to have either SUPERUSER or pg_checkpoint in its roles. This need to be documented with pros and cons.

So, if the replication user doesn't have those we will need to wait for a checkpoint to happen - just like pgbackrest. Therefore there are two "modes" of this functionality.

Start by just doing a full backup - e.g. without the manifest files - using only the extention plus the checkpoint functionality in one of its forms

GuChad369 commented 1 month ago

So I'm not entirely sure if our user with "replication" role is allowed to do that.

Regarding this part, the extension does not impose any restrictions, allowing us to call internal functions without requiring superuser privileges. I tested this when creating the pgmoneta_ext_checkpoint function. Although the function usually requires the superuser or pg_checkpoint role to execute, our source code performs its own privilege checks. After removing all such checks and using our replication role to call the function, it still succeeded. This indicates that we can perform any action within the extension, and if we do not explicitly restrict privileges, there will be no limitations on what roles can execute the function. We need to handle the privilege checks in extension.

I have reviewed the code for the function and developed a basic understanding of it. I will continue exploring it further, but at the moment, everything still feels a bit broad.

Could you help me with a specific task— e.g. finding a way to send the manifest to the PostgreSQL server? This approach works better for me. I'm good at solving specific problems, but when it comes to large projects, I sometimes struggle to figure out where to start.

jesperpedersen commented 1 month ago

Start by calling pg_backup_start() and pg_backup_stop() and put the data into a struct.

Remember that older versions they were called pg_start_backup() / pg_stop_backup().

Maybe create GUCs that will allow a socket server to receive data - the manifest - and look into creating a protocol; maybe look at how https://www.postgresql.org/docs/17/protocol-replication.html does it

GuChad369 commented 1 month ago

I will first try to create two functions, pgmoneta_ext_start_backup and pgmoneta_ext_stop_backup so that we can utilize them on our core side.

Based on our previous discussions and my understanding of the pg_backup_start() and pg_backup_stop() processes, here’s a struct definition:

typedef struct BackupInfo
{
    char *backup_label;
    XLogRecPtr start_lsn;
    XLogRecPtr stop_lsn;
} BackupInfo;

If there is any additional information we need, please let me know.

Datum
pgmoneta_ext_start_backup(PG_FUNCTION_ARGS)
{
    // Declare a structure to store backup-related information
    BackupInfo *info; 

    // Calling the existing pg_backup_start function directly based on PostgreSQL version
    Datum result = DirectFunctionCall1(pg_backup_start, CStringGetDatum("backup_label"));

    // Populate the info structure with necessary backup details
    ......

    // Return the backup information
    PG_RETURN_LSN(info);
}

The above is my pseudocode for the function. Please feel free to provide any advice if it doesn't serve the intended purpose.

jesperpedersen commented 1 month ago

Sounds like you are heading in the right direction.

You can just make them static though, and just expose a single function pgmoneta_ext_full_backup() that uses those static functions. The key is to get the .tar generated, and sent back over the wire - I think the core replication protocol is a good way to do it