phmarek / fsvs

Full System Versioning System
GNU General Public License v3.0

FSVS killed by kernel oom #3

Open devnullrandomzero opened 1 year ago

devnullrandomzero commented 1 year ago

Reproducible with FSVS 1.2.9 and current revision (1df0f37). CentOS 7.

FSVS memory usage:

# while true; do echo $(date -u) $(pmap 5999 | tail -n 1) ; sleep 1 ;  done

Sat Aug 27 00:31:16 UTC 2022 total 3187340K
Sat Aug 27 00:31:17 UTC 2022 total 3205780K
Sat Aug 27 00:31:18 UTC 2022 total 3224072K
Sat Aug 27 00:31:19 UTC 2022 total 3246788K
Sat Aug 27 00:31:20 UTC 2022 total 3255944K
Sat Aug 27 00:31:21 UTC 2022 total 3278844K
Sat Aug 27 00:31:22 UTC 2022 total 3292760K
Sat Aug 27 00:31:23 UTC 2022 total 3306512K
Sat Aug 27 00:31:24 UTC 2022 total 3329288K
[...]
Sat Aug 27 00:34:34 UTC 2022 total 6604204K
Sat Aug 27 00:34:35 UTC 2022 total 6629424K
Sat Aug 27 00:34:36 UTC 2022 total 6644568K
Sat Aug 27 00:34:37 UTC 2022 total 6659712K
Sat Aug 27 00:34:38 UTC 2022 total 6680100K
Sat Aug 27 00:34:39 UTC 2022 total 6690024K
Sat Aug 27 00:34:40 UTC 2022 total 6705192K
Sat Aug 27 00:34:41 UTC 2022 total 6720360K
Sat Aug 27 00:34:42 UTC 2022 total 6740600K
Sat Aug 27 00:34:43 UTC 2022 total 6750720K
[...]
Sat Aug 27 00:37:02 UTC 2022 total 8935588K
Sat Aug 27 00:37:03 UTC 2022 total 8956964K
Sat Aug 27 00:37:04 UTC 2022 total 8967652K
Sat Aug 27 00:37:06 UTC 2022 total 8994392K
Sat Aug 27 00:37:07 UTC 2022 total 9010444K
Sat Aug 27 00:37:08 UTC 2022 total 9031852K
Sat Aug 27 00:37:09 UTC 2022 total 9042564K
Sat Aug 27 00:37:10 UTC 2022 total 9063988K
Sat Aug 27 00:37:11 UTC 2022 total 9096336K
Sat Aug 27 00:37:12 UTC 2022 total 9112240K
Sat Aug 27 00:37:16 UTC 2022 total 0K

dmesg:

[5904059.722441] Out of memory: Kill process 5999 (fsvs) score 846 or sacrifice child
[5904059.722508] Killed process 5999 (fsvs), UID 0, total-vm:9144620kB, anon-rss:7347752kB, file-rss:0kB, shmem-rss:0kB

~500k files.

devnullrandomzero commented 1 year ago

Possible culprit: A lot of files sharing the same hash.

# ls -l 
[...]
-rw-------. 1 apache apache 173 Sep 27  2021 sess_fb9077edd349cdb59a1f5668dea7273d
-rw-------. 1 apache apache 173 Sep 30  2021 sess_fb91e677139a9b5ece4ce5592de1201a
-rw-------. 1 apache apache 173 Sep 28  2021 sess_fb91fe53aa48c884ba12b48718590d52
[...]
# md5sum ...
[...]
4e5edb5c7885314764af8aaef18732c0  sess_fb9077edd349cdb59a1f5668dea7273d
4e5edb5c7885314764af8aaef18732c0  sess_fb91e677139a9b5ece4ce5592de1201a
4e5edb5c7885314764af8aaef18732c0  sess_fb91fe53aa48c884ba12b48718590d52
[...]
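To quantify how many files actually share a hash, something like the following one-liner could be run in the session directory (a sketch, assuming GNU md5sum/uniq as on CentOS 7; an MD5 sum is 32 hex characters, so `-w32` groups lines by hash):

```shell
# Count how many files share each MD5 sum; large counts at the top
# confirm the duplicate-content theory.
md5sum sess_* | sort | uniq -c -w32 | sort -rn | head
```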
phmarek commented 1 year ago

Sorry, that's not enough detail.

Apart from these details -- is it important for your use case to archive volatile data? These are client sessions, right? I typically remove /var/cache/ and similar stuff - that just bogs down the repository, IMO.

phmarek commented 1 year ago

For the debug log you'll need to stop the process - Ctrl-C after a few tens or hundreds of files should be enough. Perhaps you could use valgrind or something similar to see where the allocations happen?
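A massif run would show where the allocations accumulate. This is only a sketch of that suggestion, not a command from the issue; the working-copy path is a placeholder and valgrind must be installed:

```shell
# Hedged sketch: profile fsvs heap growth with valgrind's massif tool;
# interrupt with Ctrl-C after a few thousand files, as suggested above.
valgrind --tool=massif --massif-out-file=fsvs.massif fsvs commit /path/to/wc
ms_print fsvs.massif | head -n 40    # allocation tree at the heap peak
```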

I just looked; on commit each file has its own subpool (https://github.com/phmarek/fsvs/blob/master/src/commit.c#L679), so there isn't much I can do. I know that the various repository access methods have different memory usage patterns; you could try to (temporarily) use a different method, or commit only parts:

for a in 0 1 2 3 4 5 6 7 8 9 a b c d e f ; do
    fsvs commit .../sess_$a*  # if your shell (and RAM) allow 20k files at once
done

# or in 256 pieces via a loop (%02x, so every prefix is two hex digits)
for a in $(seq 0 255) ; do fsvs commit .../sess_$(printf %02x $a)* ; done

Does that happen on commit at all?
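A variant of the chunked commit above, sketched with find/xargs so the shell never has to expand all 500k names at once; the directory path, batch size, and commit message are placeholders, not taken from the issue:

```shell
# Hypothetical alternative: let find/xargs build the batches instead of
# relying on glob expansion; each xargs invocation commits one chunk.
find /path/to/sessions -maxdepth 1 -name 'sess_*' -print0 \
    | xargs -0 -n 20000 fsvs commit -m "session files, partial"
```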

devnullrandomzero commented 1 year ago

Sorry, that's not enough detail.

You are right, I was in a hurry... ;-)

Which command? (I guess a commit?)

commit.

Version of apr and subversion libraries, please

apr-devel-1.4.8-7.el7.x86_64 apr-util-devel-1.5.2-6.el7.x86_64 subversion-devel-1.7.14-16.el7.x86_64

If it's something to the repository, please show the connection definition (fsvs urls dump)

fsvs urls dump
name:,prio:0,target:HEAD,ro:0,file:///<snip>

It might also be helpful to see a debug log (-d); perhaps you should send me that as an email, though

I will have to restore this exact snapshot... I will come back to you.

Apart from these details -- is it important for your use case to archive volatile data? These are client sessions, right? I typically remove /var/cache/ and similar stuff - that just bogs down the repository, IMO.

Yeah... I know...

I know that the various repository access methods have different memory usage patterns; you could try to (temporarily) use a different method, or commit only parts: [...]

I tested that while tracking down the culprit, including ignoring the directory and its files. However, we want to keep all snapshots in sync with commits.

phmarek commented 1 year ago

Using apr 1.7 and subversion 1.14 I get ~50MB memory use for 10k files with a file:// URL... you've been at 9GB.

The subversion changelog shows many fixes regarding memory usage since 1.7.

For a nice logical commit history, you might be able to copy your svn trunk to a temporary branch, commit on that one in steps, and then merge back; of course, that'll still leave you with non-incremental commit IDs on your trunk.
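The branch-and-merge idea could look roughly like this; it is only a sketch, with the repository URL, branch name, and working-copy paths as placeholders (none of them appear in the issue):

```shell
# Hypothetical sketch: copy trunk to a temporary branch, commit the bulk
# data there in chunks (via fsvs pointed at the branch), then merge back.
REPO=file:///var/svn/repo
svn copy "$REPO/trunk" "$REPO/branches/bulk-import" -m "temporary branch"
# ...switch the fsvs URL to the branch, commit in chunks, then:
svn checkout "$REPO/trunk" /tmp/trunk-wc
svn merge "^/branches/bulk-import" /tmp/trunk-wc
svn commit /tmp/trunk-wc -m "merge bulk-import back to trunk"
```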

For an unbroken commit ID range, you could try

Sorry, I don't think I can help on this end.