Open devnullrandomzero opened 2 years ago
Possible culprit: A lot of files sharing the same hash.
# ls -l
[...]
-rw-------. 1 apache apache 173 Sep 27 2021 sess_fb9077edd349cdb59a1f5668dea7273d
-rw-------. 1 apache apache 173 Sep 30 2021 sess_fb91e677139a9b5ece4ce5592de1201a
-rw-------. 1 apache apache 173 Sep 28 2021 sess_fb91fe53aa48c884ba12b48718590d52
[...]
# md5sum ...
[...]
4e5edb5c7885314764af8aaef18732c0 sess_fb9077edd349cdb59a1f5668dea7273d
4e5edb5c7885314764af8aaef18732c0 sess_fb91e677139a9b5ece4ce5592de1201a
4e5edb5c7885314764af8aaef18732c0 sess_fb91fe53aa48c884ba12b48718590d52
[...]
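To quantify the duplication, a quick count of how many session files share each MD5 hash could look like this (a sketch using standard coreutils; the path is a placeholder):
# group on the 32-character hash that md5sum prints first, then count files per hash
find /path/to/sessions -maxdepth 1 -name 'sess_*' -print0 \
    | xargs -0 md5sum | sort | uniq -c -w32 | sort -rn | head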
Sorry, that's not enough details.
Which command? (I guess a commit?)
Version of apr and subversion libraries, please.
If it's something to the repository, please show the connection definition (fsvs urls dump).
It might also be helpful to see a debug log (-d); perhaps you should send me that as an email, though.
Apart from these details -- is it important for your usecase to archive volatile data? These are client sessions, right? I typically remove /var/cache/ and similar stuff - that just bogs down the repository, IMO.
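(If dropping the session files from the backup were acceptable, an ignore pattern would be one way to do it; a sketch assuming fsvs's usual shell-pattern syntax, with a placeholder path:)
# exclude the PHP session directory from future commits (hypothetical path)
fsvs ignore './var/lib/php/session/*'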
For the debug log you'll need to stop the process - Ctrl-C after a few tens or hundreds of files should be enough.
Perhaps you could use valgrind or something similar to see where the allocations happen?
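A possible way to do that with valgrind's massif heap profiler (assuming valgrind is installed; the path is a placeholder, and committing only a small subset keeps the run short):
# run a partial commit under the heap profiler
valgrind --tool=massif fsvs commit /path/to/sessions/sess_00*
# afterwards, summarize the allocation peaks (massif writes massif.out.<pid>)
ms_print massif.out.<pid> | less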
I just looked; on commit each file has its own subpool (https://github.com/phmarek/fsvs/blob/master/src/commit.c#L679), so there isn't much I can do. I know that the various repository access methods have different memory usage patterns; you could try to (temporarily) use a different method, or commit only parts:
for a in 0 1 2 3 4 5 6 7 8 9 a b c d e f ; do
    fsvs commit .../sess_$a*    # if your shell (and RAM) allow 20k files at once
done

# or in 256 pieces via a loop (two-digit hex prefix, so the chunks don't overlap)
for a in $(seq 0 255) ; do fsvs commit .../sess_$(printf %02x $a)* ; done
Does that happen on commit at all?
Sorry, that's not enough details.
You are right, I was in a hurry... ;-)
Which command? (I guess a commit?)
commit.
Version of apr and subversion libraries, please
apr-devel-1.4.8-7.el7.x86_64 apr-util-devel-1.5.2-6.el7.x86_64 subversion-devel-1.7.14-16.el7.x86_64
If it's something to the repository, please show the connection definition (fsvs urls dump)
fsvs urls dump
name:,prio:0,target:HEAD,ro:0,file:///<snip>
It might also be helpful to see a debug log (-d); perhaps you should send me that as an email, though
I will have to restore this exact snapshot... I will come back to you.
Apart from these details -- is it important for your usecase to archive volatile data? These are client sessions, right? I typically remove /var/cache/ and similar stuff - that just bogs down the repository, IMO.
Yeah... I know...
I know that the various repository access methods have different memory usage patterns; you could try to (temporarily) use a different method, or commit only parts: [...]
That was tested while tracking down the culprit, including ignoring the directory/files. However, we want to keep all snapshots in sync with commits.
Using apr 1.7 and subversion 1.14 I get ~50MB memory use for 10k files with a file:// URL... you've been at 9GB.
The subversion changelog shows many fixes regarding memory usage since 1.7.
For a nice logical commit history, you might be able to copy your svn trunk to a temporary branch, commit on that one in steps, and then merge back; of course, that'll still leave you with non-incremental commit IDs on your trunk.
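A rough sketch of that branch dance with plain svn commands (repository URL, branch name, and paths are placeholders; the step of pointing fsvs at the branch is only indicated, not spelled out):
REPO=file:///var/svn/backup   # placeholder; use the URL from `fsvs urls dump`

# cheap server-side copy of trunk to a temporary branch
svn copy "$REPO/trunk" "$REPO/branches/session-chunks" \
    -m "temporary branch for committing session files in chunks"

# ... point the fsvs URL at the branch and commit in pieces as shown above ...

# later, merge the branch back into a plain svn checkout of trunk and commit
svn checkout "$REPO/trunk" /tmp/trunk-wc
svn merge "$REPO/branches/session-chunks" /tmp/trunk-wc
svn commit /tmp/trunk-wc -m "merge the chunked session commits back into trunk"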
For an unbroken commit ID range, you could try debootstrap -- see https://doc.fsvs-software.org/doxygen-gif/group__howto__chroot.html.
Or try an svn+ssh://localhost/... URL - this might behave (much) better, as quite some processing happens in another process.
Sorry, I don't think I can help on this end.
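(If the svn+ssh route is worth a try, one conceivable way to swap the URL is the urls dump/load pair, assuming fsvs urls load accepts the same format that dump prints; the repository path below is a placeholder:)
# replace the file:// URL with an svn+ssh:// one pointing at the same repository
echo 'name:,prio:0,target:HEAD,ro:0,svn+ssh://localhost/var/svn/backup' | fsvs urls load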
Reproducible with FSVS 1.2.9 and current revision (1df0f37). CentOS 7.
FSVS memory usage: [output not preserved]
dmesg: [output not preserved]
~500k files.