@alvinstarr The manifest is held in a temporary file located in the archive's metadata directory, for example "/var/lib/wyng/a_b0a0ccba36969efb25c7e08cf060115bc92c8365/Vol_48e331/S_20240212-123456-tmp/manifest.tmp". This should be on regular storage, not RAM (although it's conceivable that /var/lib/ might be set up differently on your system).
The first thing I would suspect is Wyng's deduplicator. If you have it enabled, dedup will use a lot of RAM; the amount depends on how much data is in the archive and in the volume being backed up. Disabling the deduplicator would be the first thing to try.
If deduplication is a hard requirement for your larger volumes, you could try creating a separate archive for those volumes using a larger chunk size setting. The default is 128kB and setting it to 1MB would reduce the metadata size (and dedup RAM use) by about 80%.
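As a rough back-of-the-envelope illustration of that claim (my own arithmetic, not output from Wyng; the ~26TB figure is taken from the original report):

# Per-chunk metadata (manifest lines, dedup index entries) scales inversely
# with chunk size, so 8x larger chunks means roughly 8x fewer entries,
# which is on the order of the ~80% reduction mentioned above.
volume_bytes = 26 * 10**12                     # the ~26TB volume in question

for chunk_bytes in (128 * 1024, 1024 * 1024):
    chunks = volume_bytes // chunk_bytes
    print(f"{chunk_bytes // 1024:>5} kB chunks -> {chunks:,} chunks")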
(A possible option for enhancement would be to automatically detect such large volumes and configure the dedup index to reside entirely on disk. However, there would be a quite noticeable performance penalty.)
There is also the possibility of a Python garbage-collection issue unrelated to dedup. That could be more difficult to diagnose and address.
I should have mentioned that, in addition to the regular manifest.tmp in /var/lib, Wyng's dedup mode will also create a "hashindex.dat" under /tmp (which is usually in RAM). It is an extension of the dedup index held in RAM (Python heap memory).
We have not turned on dedup; my fear was that it would use up too much memory. There are two Python processes running: the main command using 65G and a dest_helper.py using 90G and growing.
I know just enough Python to get myself into trouble, but I would be willing to help find the cause of this problem any way I can.
@alvinstarr OK, thanks. Since I don't have anything nearly that large, your feedback will be important.
I can try monitoring what a few TB of backups does on my end; hopefully I will get some clues that way. It sounds like a basic garbage collection issue, something that is suppressing gc.
First I'll try adding a debug mode that reports resource use every 100MB or so, and we can go from there.
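For reference, a minimal sketch of the kind of periodic resource report meant here (not Wyng's actual debug code; the function name and choice of fields are illustrative):

import gc, resource

def report_resources(bytes_sent: int) -> None:
    """Print peak RSS and garbage-collector stats; call every ~100MB sent."""
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # kB on Linux
    print(f"sent={bytes_sent / 2**20:,.0f}MiB  "
          f"peak_rss={peak_kb / 1024:,.0f}MiB  "
          f"gc_objects={len(gc.get_objects()):,}  gc_counts={gc.get_count()}")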
On your end, I need to know the OS distro, Python version and any Python-specific env settings, as well as the command line used to invoke Wyng (obfuscating vol names etc is fine).
@alvinstarr There is a potential fix now in the 08wip branch. Test it by doing 'wyng send' on large volumes. (This does not yet have a debug mode that shows periodic resource status, but the existing --debug will show resources at the end of each volume send.)
It looks like it may have finished but then puked.
scl enable rh-python38 'wyng --dest=file:/edocs_snapshots/prod/wyng.backup --local=EDOCS_VOLUMEGROUP1/edocs_SNAP_thinpool send PROD'
Wyng 0.8beta release 20231002
Un-encrypted archive 'file:/edocs_snapshots/prod/wyng.backup'
Last updated 2024-02-11 23:01:42.216172 (-05:00)
Preparing snapshots in '/dev/EDOCS_VOLUMEGROUP1/'...
Preparing full scan of 'PROD'
Sending backup session 20240212-070521:
——————————————————————————————————————————
17109209 | 70% | PROD
20607627 | 99%
Warning: tar process timeout.
Traceback (most recent call last):
  File "/usr/local/bin/wyng", line 4667, in <module>
    monitor_send(storage, aset, selected_vols, monitor_only=False)
  File "/usr/local/bin/wyng", line 3467, in monitor_send
    send_volume(storage, vol, curtime, ses_tags, send_all=datavol in send_alls)
  File "/usr/local/bin/wyng", line 3250, in send_volume
    raise RuntimeError("tar transport failure code %d" % retcode)
RuntimeError: tar transport failure code 99
You have new mail in /var/spool/mail/root
[root@edocs4 edocs_snapshots]#
The OS is CentOS Linux release 7.5.1804 (Core). The Python release is rh-python38.x86_64 from SCL; Python is run using 'scl enable python38' followed by the command. This is being done as root.
Also, these were thick volumes that were converted to thin volumes. When we created the thin pool, we set a size of 128K. I am not sure if we can undo that process and create a new thin pool with bigger chunks like 1MB.
@alvinstarr Go ahead and take the latest update from the 08wip branch and do a send with that. It should behave a lot better.
When we created the thin pool we set a size of 128K.
This has no bearing on the archive chunk size, which is independent of the LVM chunk size. The archive chunk size can only be set by 'wyng init' when creating the archive. If the tarfile issue is really the culprit, I don't think the chunk size will have to change in order for it to work (although I'd still recommend eventually moving to a new archive that is set to a larger chunk size).
Also, BTW (unrelated): you may want to check out the --import-other-from option if you are creating thin LVs only for the sake of backing up with Wyng and every send is doing a "full scan". This option can back up the thick LV directly and save you the step of creating a thin LV.
We would hope to do a full copy and then incremental copies based on the tick/tock snapshots. It took almost 7 days to back up the full volume. Looking at the last run, it seems to have failed as it was wrapping up: while copying the .tmp files, the tar timed out. I have a feeling the timeouts may be too short, because the compressed manifest file is 6.5G.
Given how long it took to run the backup, I have half a mind to try to manually complete it and see if I can then do an incremental from there.
Our data set is largely compressed image files, so there is not a whole lot of room for dedup, and I am inclined to try bumping up the chunk-factor. Currently there are something like 224,510,032 chunks, and that many file-create operations starts to take some time. A chunk factor of 6 would be 2MiB, right? Any thought of extending the chunk factor beyond 6? The current backup directory has 937,526 sub-directories in it, so if you do an "ls" you should go for a very long coffee break if you want to see the result. It may be worth thinking of 3 levels of directories for the backups if the volume is really big.
I am really impressed with the work you have done, so none of the above is intended as criticism, but more as observations.
I will start a new backup with your recent changes to the 08wip branch and also bump up the chunk size to try to get better performance. An interesting addition would be something like a FUSE filesystem that could make a backup mountable and inspectable.
@alvinstarr I note the last output log you posted had Wyng release 20231002, not the one with the tarfile fix I posted yesterday. That would mean both the main and helper processes were struggling with swapped-out memory lists from the tarfile module when Wyng was closing the tar stream, which no doubt contributed toward the timeout. The fix would alleviate that, but you could also rig the timeout so it waits longer by changing the seconds value in untar.wait(timeout=60).
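For anyone following along, the wait-with-timeout pattern being referred to looks roughly like this (illustrative only: 'untar' here is a stand-in subprocess, and the 600-second value is just an example of a longer timeout):

import subprocess

# A placeholder tar subprocess standing in for Wyng's tar transport.
untar = subprocess.Popen(["tar", "-cf", "/dev/null", "/etc/hostname"])
try:
    retcode = untar.wait(timeout=600)    # the posted release used timeout=60
except subprocess.TimeoutExpired:
    untar.kill()
    retcode = untar.wait()
if retcode:
    raise RuntimeError("tar transport failure code %d" % retcode)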
Our data set is largely compressed image files, so there is not a whole lot of room for dedup, and I am inclined to try bumping up the chunk-factor.
Yes, chunk factor 6 = 2MB. That would reduce a 6.5GB manifest to roughly 400MB; sounds like maxing that out would be optimal in your use case. Wyng doesn't allow for chunk sizes larger than 2MB, and I'm not sure extending that would noticeably help in your case: it would go from 1/16 of the original metadata & dir size to 1/32, for example, and I'm not sure what you would consider optimal. Obviously, Wyng makes the back-end filesystem do a lot of the work and there needs to be a balance, which is why I allowed for a chunk size factor.
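A quick sanity check of those numbers (my own arithmetic, not output from Wyng):

# Chunk factor 6 = 2MiB chunks vs the 128kB default: 16x fewer chunks, so
# per-chunk metadata such as the manifest shrinks by the same factor.
ratio = (2 * 1024 * 1024) // (128 * 1024)       # 16
manifest_gb = 6.5                               # compressed manifest reported above
print(ratio, round(manifest_gb / ratio * 1000), "MB")   # 16, ~406 MB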
It may be worth thinking of 3 levels of directories for the backups if the volume is really big.
I'm considering it for a future version of the Wyng format.
Manually completing the send would be a bit risky and requires having the following files intact, nested in the correct subdirs:
They would all have to be present in the archive and renamed without the '.tmp'. Then you should do a verify on the volume to make sure the archive structure is intact.
Coping strategy: instead of trying to make the vol fit in the existing archive, you could create a new archive with a larger chunk factor and back up just that one volume to it (for now), then schedule a switch-over date for when you would move the rest of your volumes over to the new archive. (Just a suggestion.)
An interesting addition would be something like a FUSE filesystem that could make a backup mountable and inspectable.
Yes, this is issue #16.
Is there a way to disable compression? I have run a smaller volume with differing chunk sizes, but the backup time remains relatively constant. The time increased a bit when I went to a larger chunk size, and the read bandwidth seemed to drop a bit. I am assuming that is because of the compression step.
No, compression is a fixture in the send/receive processes.
You can get bandwidth similar to uncompressed by using the 'zstd' compressor at a level less than 1.
Beyond that, there are optimization issues to improve send speed, such as #11. OTOH, Wyng is intended to sparse skip over much of the volume space most of the time, and that is currently where it is most optimized.
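To get a feel for what those zstd levels mean, here is a small standalone benchmark sketch (not part of Wyng) using the python-zstandard package; whether the negative "fast" levels are accepted depends on the installed zstandard/zstd versions:

import os, time
import zstandard as zstd

data = os.urandom(16 * 1024 * 1024)     # 16 MiB of incompressible test data

for level in (-5, -1, 1, 3):
    cctx = zstd.ZstdCompressor(level=level)
    t0 = time.perf_counter()
    out = cctx.compress(data)
    dt = time.perf_counter() - t0
    print(f"level {level:>2}: ratio {len(out) / len(data):.3f}, "
          f"{len(data) / dt / 1e6:,.0f} MB/s")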
An interesting observation.
The first 3 blurps of traffic are: 1) with 128K blocks, 2) with 2048K blocks, 3) with 2048K blocks and compression set to zstd:1.
Those tests were run against a 500GB test volume. The 4th block is the 22TB volume; I would expect the B/s to be in the same general ballpark, but there is something going on. I will try to take a look at this when we get a complete backup finished in a few days.
I think what may be going on is the large difference in CPython's garbage collection workload due to dynamic buffering having to juggle larger buffers. (Hmmm. Does a 1MB buffer behave much differently?)
Wyng does not yet use static buffering for transfer operations. And I always suspected that locally-based archives would someday throw performance issues that were masked by net access into high relief (as your benchmark just did).
It would also be interesting to see the difference, for instance, with the helper script removed from the local transfer loop. That, in combination with using static buffers, could make a big difference, IMO. However, the limitations of the zstandard lib I'm currently using preclude static buffering.
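For readers unfamiliar with the term, "static buffering" here means something like the following generic sketch (not Wyng's actual code; the 1MiB chunk size and function name are made up): one buffer is allocated up front and refilled in place, instead of each read allocating a fresh bytes object for the garbage collector to reclaim.

import io

CHUNK = 1 << 20                      # hypothetical 1 MiB transfer chunk
buf = bytearray(CHUNK)               # allocated once, reused on every pass
view = memoryview(buf)

def copy_static(src: io.BufferedIOBase, dst: io.BufferedIOBase) -> int:
    """Copy src to dst while reusing the single preallocated buffer."""
    total = 0
    while True:
        n = src.readinto(view)       # fills the existing buffer in place
        if not n:
            break
        dst.write(view[:n])
        total += n
    return total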
For now, I might want to move this discussion to issue #11 ...
This should be close-able now. Issue 11 is open to handle optimization ideas.
I am trying to back up a 26TB volume. About halfway through the process the OOM killer kicked in and killed the backup. It looks like it consumed 64G on a system with 128G of RAM and a 4G swap.
Is it possible that the manifest is held in memory until the backup is completed?