pgmoneta / pgmoneta

Backup / restore solution for PostgreSQL
https://pgmoneta.github.io
BSD 3-Clause "New" or "Revised" License

Direct IO #269

Open jesperpedersen opened 1 month ago

jesperpedersen commented 1 month ago

Investigate the impact of using O_DIRECT for disk I/O - especially in restore scenarios

palak-chaturvedi commented 1 month ago

Can I work on this? Any suggestions on how to start?

jesperpedersen commented 1 month ago

We need a disk benchmark to compare the current setup against O_DIRECT. The latter should be faster, but we need to know that the extra complexity is worth it.

jesperpedersen commented 1 month ago

This task is also about having a uniform disk access method, so maybe an io.h that operates on FILE*.

jesperpedersen commented 1 month ago

https://commitfest.postgresql.org/46/4532/ could be a good read as well...

palak-chaturvedi commented 1 month ago

Hey @jesperpedersen, I did some research on direct I/O (O_DIRECT).

  1. Replace fopen() with open(), which accepts the O_DIRECT flag and returns a file descriptor (fd, an int): int open(const char *pathname, int flags, mode_t mode);
  2. Create a fileFlag variable for the open flags and a fileMode variable for the permissions.
  3. Add O_DIRECT to fileFlag as needed.
  4. Wrap the fd in a FILE* using FILE* fp = fdopen(fd, "w");
  5. Change the file open method everywhere it is used.
  6. Also add a PG_MONETA_DIRECT_IO configuration flag: if it is true, we use O_DIRECT; otherwise we don't. https://www.codequoi.com/en/handling-a-file-by-its-descriptor-in-c/

Do we also need to check that the shared memory in pgmoneta is aligned to the disk's block size? This is based on the direct I/O work done in PostgreSQL.

palak-chaturvedi commented 1 month ago

This is the memory usage I observed during a normal run (output of free -m, values in MiB):

At start:

Every 2.0s: free -m    fedora: Wed Apr 24 11:47:02 2024

               total        used        free      shared  buff/cache   available
Mem:            5769         773        4241         1           985        4996
Swap:           5768           0        5768

After initdb:

Every 2.0s: free -m    fedora: Wed Apr 24 11:48:45 2024

               total        used        free      shared  buff/cache   available
Mem:            5769         918        3528          40        1593        4850
Swap:           5768           0        5768

After creating a user:

Every 2.0s: free -m    fedora: Wed Apr 24 11:54:04 2024

               total        used        free      shared  buff/cache   available
Mem:            5769         955        3483          68        1628        4813
Swap:           5768           0        5768

After starting pgmoneta:

Every 2.0s: free -m    fedora: Wed Apr 24 11:56:16 2024

               total        used        free      shared  buff/cache   available
Mem:            5769         852        3583          69        1634        4916
Swap:           5768           0        5768

After running pgmoneta-cli:

Every 2.0s: free -m    fedora: Wed Apr 24 11:56:49 2024

               total        used        free      shared  buff/cache   available
Mem:            5769         925        3481          91        1684        4843
Swap:           5768           0        5768

After stopping pgmoneta:

Every 2.0s: free -m    fedora: Wed Apr 24 11:57:58 2024

               total        used        free      shared  buff/cache   available
Mem:            5769         931        3465         106        1710        4837
Swap:           5768           0        5768

jesperpedersen commented 1 month ago

Looks like you are heading in the right direction!

If we unify the API in io.h and io.c using O_DIRECT, then I don't think we need a configuration option for it.

Once you have created a work branch, feel free to share it.

jesperpedersen commented 3 weeks ago

@palak-chaturvedi How is this going?

shikharish commented 1 week ago

What is the status of this issue? I will take it up if no one is working on it.

jesperpedersen commented 1 week ago

@shikharish Please, take over

shikharish commented 1 day ago

@jesperpedersen We use fwrite() to write files. In some places, like here, size=1 and n=number of bytes, while here, size=number of bytes and n=1. What is the reason for this?

jesperpedersen commented 1 day ago

If we know there is an entire block, it can be written at once.

jesperpedersen commented 1 day ago

The important part is that we can look at optimizing later; right now it is about creating a unified interface using O_DIRECT.