pgmoneta / pgmoneta_ext

PostgreSQL extension for pgmoneta
BSD 3-Clause "New" or "Revised" License
3 stars 3 forks source link

Incremental backup #22

Open GuChad369 opened 3 weeks ago

GuChad369 commented 3 weeks ago

I'm starting to research and attempt to implement incremental backup. I plan to achieve this feature through three PRs:

  1. Find a way to send the manifest files to the extension.
  2. Compare the files with the server's PostgreSQL instance.
  3. Handle the result.
GuChad369 commented 3 weeks ago

@Jubilee101 If I have any questions during the process, I will post them here along with any discussions.

Jubilee101 commented 3 weeks ago

Sounds good! Just let us know if you have ideas or problems.

GuChad369 commented 2 weeks ago

I have researched pgbackrest, which also supports incremental backups. The general idea behind its implementation is similar, using manifests to achieve this functionality.

In pgbackrest, there are two main files that handle communication between the client and server: src/protocol/client.c and src/protocol/server.c. This C code implements a client-server communication protocol.

The implementation follows strict state management and error-handling practices, ensuring reliable communication between the client and server. It also supports asynchronous communication, allowing for complex multi-session interactions.

Additionally, there are two packed interfaces, IoRead and IoWrite, used for reading from and writing to the communication channel.

In summary, pgbackrest implements its communication protocol between the client and server, enabling them to send and receive protocol messages effectively.

I will try to become familiar with our project's communication methods and explore using a protocol directly to send the manifest file, avoiding the need to create a table on the server side.

Jubilee101 commented 2 weeks ago

In summary, pgbackrest implements its communication protocol between the client and server, enabling them to send and receive protocol messages effectively.

Hmm, are they using this protocol to send manifest and receive backups? This is a bit weird since I don't recall having to install pgbackrest on both sides.

I don't think it's appropriate to setup connections on the extension side, it seems like a weird hacking of postgres extensions. Some other options would be A. we send manifest entry one query at a time, B. the extension sends manifest as a query response, one row per entry, and we do comparison on main.

There is a chance that we don't need manifest to know what changed, I'm looking at how postgres does it, and it seems that they use WAL purely. The idea is to get a WAL summary from last backup's start LSN to current backup's start LSN, files mentioned in WAL in the range are naturally subject to incremental backup. And here's a nice thing, we are going to have our WAL support within this GSoC. And we stream WAL segments to main all the time. So the whole "comparison"(it's not exactly a comparison now) could happen entirely on main side.

Either way, we need to know what the start LSN (and end LSN as well, but one thing at a time) is for current backup. And it is not easy, since I think postgres gets it by inducing a checkpoint, which I'm not sure if we have the authority to do. I think you can put a pin on sending manifest while Jesper and I think about the best way to address this, and look into getting the start LSN on the extension side. I think what pgbackrest does is to wait until postgres does its next regular checkpoint. But it seems that it also has an option to start immediately(--start-fast). I'm interested to know what they are doing underneath.

We are still in exploring phase of a hard problem so I think it is OK that things are slow -- postgres itself waits until v17 🤷‍♂️ . We can also get support for v17 going at the same time. Anyway, let us if you have better ideas or problems, as always.

jesperpedersen commented 2 weeks ago

I think we should think about

struct file_info
{
   char[] filename;
   char[] checksum;
   time_t timestamp;
}

in a deque (memory cached). That can be refreshed, and in most cases timestamp wouldn't have changed.

We should keep the last backup information cached such that is it easier to make a "diff"...

However, like @Jubilee101 pointed out, LSN is the important factor, so that is the driver. The above is an optimization.

I like B.

GuChad369 commented 2 weeks ago

For B, we could create a function similar to pgmoneta_ext_get_manifest within our extension. This function would return the latest manifest. We would then call this function on the main server to retrieve the result and proceed with the comparison. Do you think this approach sounds good to you?

The concern is that if the result is too large, we might encounter issues with this approach. It's uncertain whether it would still work effectively in such cases.

jesperpedersen commented 2 weeks ago

We should start with the file_info deque, and then prepare a JSON format similar to backup_manifest.

This is a very big task, so we need to break it down - while we consider very large database clusters

jesperpedersen commented 2 weeks ago

Talk to @Jubilee101 to port the core data structures over such that building the JSON document is easy

GuChad369 commented 2 weeks ago

So, each time we perform a backup, we can read the backup.manifest file and store the information in a file_info deque.

jesperpedersen commented 2 weeks ago

Work on getting a GUC in place such that checksums are calculated based on the setting

jesperpedersen commented 2 weeks ago

We can have "latest backup", and current checksums... we really need to know what has changed since the last backup

jesperpedersen commented 2 weeks ago

But, core/ does the backup, so you need a way to receive the "official" manifest and keep it

jesperpedersen commented 2 weeks ago

The core idea is that we can't afford to wait to calculate checksums on a PB-scale database cluster, so we need to keep the checksum current in memory.

Most of the files won't change, so that is "guarded" by the file timestamp check, so each iteration should be "minimal"

Jubilee101 commented 2 weeks ago

WAL summary sounds promising but it needs some more time. So for now maybe focus on porting the data structures(should be easy copy & paste), and getting the start/end LSN. As for the checksum cache, it's an optimization which is nice to have. So look at it if you have time, but it doesn't sound like top priority to me.

GuChad369 commented 2 weeks ago

Since the school year has started, I might not always be able to respond promptly. However, I'll do my best to stay on top of the project, and I'll do my best to complete it.

I'm a little confused right now because I don't have a rough outline of the entire process. Could you clarify where we should define the data structures and in which file?

I did some research on retrieving the start/end LSN and found an internal PostgreSQL function, GetXLogWriteRecPtr(), which retrieves the current write pointer of the WAL (Write-Ahead Log). This function returns the LSN as an XLogRecPtr type, which is a data structure used in PostgreSQL to represent an LSN. Essentially, it's a 64-bit integer.

I think this might be the LSN we need if we use it before the backup. If we define a custom function, pgmoneta_ext_get_lsn(), on our extension side, it could potentially serve this purpose.

We can execute the logic on the main side:

  1. Get the LSN before the backup:

    pgmoneta_ext_get_lsn();
  2. Perform our backup process.

  3. Get the LSN after the backup:

    pgmoneta_ext_get_lsn();

    Is this logic and my understanding correct?

Jubilee101 commented 2 weeks ago

Since the school year has started, I might not always be able to respond promptly. However, I'll do my best to stay on top of the project, and I'll do my best to complete it.

That is quite ok, just keep us posted every week and let us know if you get distracted by other business.

I'm a little confused right now because I don't have a rough outline of the entire process. Could you clarify where we should define the data structures and in which file?

I'm thinking in main it should be an option in backup, something like pgmoneta-cli backup primary [full | incremental], where it defaults to full if the option is not offered or there are no backups in the archive dir. And we need another set of workflow nodes for the incremental, which means changing wf_backup(), we may need to create some slightly different post backup workflow nodes for the incremental backup as well.

As for the major backup process, it should still happen in wf_backup.c, maybe a new set of incrementalbackup_setup/execute/teardown.

You don't have to worry about them now. Just have a rough picture in mind and focus on getting the APIs we are going to need in. We'll get to the assemble in the end.

I did some research on retrieving the start/end LSN and found an internal PostgreSQL function, GetXLogWriteRecPtr(), which retrieves the current write pointer of the WAL (Write-Ahead Log).

Great! Have you checked internally how it works? According to https://www.interdb.jp/pg/pgsql10/01.html#1011-pg_backup_start-ver14-or-earlier-pg_start_backup, it needs to do a checkpoint first. That's what we worry about, since doing a checkpoint requires a pretty hight level of privilege. I guess that's why pgbackrest by defaults waits until a regular checkpoint to happen naturally. But they do seem to have some ways to start immediately.

GuChad369 commented 1 week ago

Thank you for your explanation; it's much clearer now, and I have a rough outline.

If there are any specific APIs you need, just create an issue, write down the requirements, and I'll do my best to implement them.

As for GetXLogWriteRecPtr(), I believe it doesn't require a checkpoint first. Based on the source code, the GetXLogWriteRecPtr function does not trigger a checkpoint; it simply reads and returns the latest WAL write position.

I believe the structure in PostgreSQL, XLogCtlData, is always maintained in shared memory, which tracks various aspects of WAL processing.

If we define our own function in the extension and use it, without restricting the privilege, the repl role could also use it to get the LSN.

GuChad369 commented 1 week ago

Should we create the function pgmoneta_ext_get_lsn() in our extension using GetXLogWriteRecPtr() and test it to see if it works? Do you think it's worth trying?

jesperpedersen commented 1 week ago

Yes, we can try and see if it works. We can always remove as part of a cleanup before the release