Open katmoody opened 7 years ago
Discussed in Slack with @willmot and @shadyvb regarding the symlinks effect so tagging them both here for further feedback ;-) And tagging @sambulance so she can weigh in if she has a few minutes!
The only real reference I'm seeing to specific symlinks are within the backupwordpress/admin/schedule-form-excludes.php here: https://github.com/humanmade/backupwordpress/blob/cf5579a0a3b2dcec6828ed5af497db5125ca2e5b/admin/schedule-form-excludes.php
at line 207:
<?php if ( is_link( Path::get_root() ) ) :
esc_html_e( 'Symlink', 'backupwordpress' );
and at line 352:
<?php if ( $file->isLink() ) : ?>
<span title="<?php echo esc_attr( wp_normalize_path( $file->getRealPath() ) ); ?>">
<?php esc_html_e( 'Symlink', 'backupwordpress' ); ?>
</span>
This is a relatively straight-forward edge-case, we should add unit tests for it and fix. Everywhere we're dealing with symlinks we need to make sure we're handling symlinks which point back to either the current folder or a parent folder and thus will cause recursion. I assume things like zip
already handle this.
@willmot - I searched but couldn't find an example similar to how we're using this above - what does esc_html_e( 'Symlink', 'backupwordpress' );
translate to? All examples I saw of Symlinks in php used something more like symlink( name, URL )
... Just wanting to understand how it works if possible.
what does esc_html_e( 'Symlink', 'backupwordpress' );
That code just outputs whether the file is a symlink or a file or a folder as displayed in the excludes file listings:
All examples I saw of Symlinks in PHP used something more like
symlink()
creates a symlink, all those uses are in the unit tests where we're creating symlinks and then testing whether they're handled correctly. We don't currently have a test for a recursive symlink, which is the issue here.
Both recursiveDirectoryIterator
(which loops the filesystem and adds files using ZipArchive
and the unix zip
binary natively support following symlinks so there's no actual "symlink" code, we just need to figure out whether which of those is affected by this issue and then figure out a way to work around it.
I did a bit more testing and this is a pretty gnarly problem.
We could likely work around it in ZipArchive
without too much trouble as we should be able to test each symlink prior to adding to the archive.
The problem is zip
which does follow recursive symlinks like any others. There's an OS limit on how far recursive symlinks will be followed (on Linux it defaults to 40) so if the backup process doesn't timeout before getting to that then the backup would complete (although it would contain 40 copies of the symlinked directory).
The issue with zip
is that there's no way for us to test each directory before adding it, you simply pass a root path to zip
rather than an array of files.
Solution
Our best bet is probably going to be to check for these recursive symlinks when scanning the sites filesize and if we find one, add it as a default exclude. The downside there is it relies on the site size scanner, which doesn't work for everyone.
@katmoody @shadyvb and @willmot going to add to sprint backlog so we can look at unit tests in the next sprint
@willmot - I was doing some searching on how to include unit tests for recursive symlinks and stumbled across this older (2014) writeup ... is there a nugget there at all that you can use?
is there a nugget there at all that you can use?
There isn't, but thanks for trying to research 👍. We're using RecursiveDirectoryIterator already as part of the ZipArchive method and can likely work around this problem there fairly easily. The issue is zip
which works differently.
Hey @willmot - curious if something like this, which will list the recursive link but not follow it to that information, could be used on this?
Found at this site as I was reading up on using Unix Zip command - http://www.computerhope.com/unix/zip.htm
-y, --symlinks
For UNIX and VMS (V8.3 and later), store symbolic links as such in the zip archive, instead of compressing and storing the file referred to by the link. This can avoid multiple copies of files being included in the archive as zip recurses the directory trees and accesses files directly and by links.
In addition to the solutions proposed in https://github.com/humanmade/backupwordpress/issues/1150#issuecomment-268849475 we should also provide a way (via setting a constant HMBKP_FOLLOW_SYMLINKS
to false
) to just disable following symlinks all together, if that Constant was set we'd then:
pass the
-yflag to
zip` as @katmoody recommended in https://github.com/humanmade/backupwordpress/issues/1150#issuecomment-274545320FOLLOW_SYMLINKS
flag to DirectoryIterator
.This looks doable ... and I have a few folks who'd be willing to test it as soon as we have it ready in beta. ;-)
So to summarise the tasks here:
HMBKP_FOLLOW_SYMLINKS
to disable symlink traversing altogether, which then:-y
flag to zip
callsFOLLOW_SYMLINKS
to DirectoryIterator
Description of issue
On one site that was experiencing the spinning wheel of doom when configuring site size, we were able to gain additional troubleshooting help from the tech support for her host, SiteGround.
This led to the discovery that symlinks were causing a loop that maxed the timeout limits on the server.
From Ivan, at Siteground:
Additional Useful Information
They have solid server support with max_execution_limit at 120 (tried pushing it to 300 with no effect) and memory_limit at 768M.
Basically all backup types, even with files excluded, attempt to run forever when clicked to run manually. (and never complete on a schedule).
This was in the support forums: https://wordpress.org/support/topic/backup-not-completing-tried-3-6-3-beta/#post-8568494