Recursive symlinks cause timeouts

katmoody commented 7 years ago

Description of issue

On one site that was experiencing the spinning wheel of doom when configuring site size, we were able to gain additional troubleshooting help from the tech support for her host, SiteGround.

This led to the discovery that symlinks were causing a loop that maxed the timeout limits on the server.

From Ivan, at Siteground:

The backup functionality is breaking even on a default, empty installation if you have a symlink pointing to the current folder e.g:

lrwxrwxrwx 1 user user 1 Dec 15 09:14 symlink -> ./

This leads to a loop, that can be seen by stracing the admin-ajax.php process once you start the backup and eventually the process gets killed because a Timeout limitation, max_execution_time, etc. The backup process is completing for less than a minute on the same default WP installation when the symlinks are gone.

Additional Useful Information

They have solid server support with max_execution_limit at 120 (tried pushing it to 300 with no effect) and memory_limit at 768M.

Basically all backup types, even with files excluded, attempt to run forever when clicked to run manually. (and never complete on a schedule).

This was in the support forums: https://wordpress.org/support/topic/backup-not-completing-tried-3-6-3-beta/#post-8568494

katmoody commented 7 years ago

Discussed in Slack with @willmot and @shadyvb regarding the symlinks effect so tagging them both here for further feedback ;-) And tagging @sambulance so she can weigh in if she has a few minutes!

katmoody commented 7 years ago

The only real reference I'm seeing to specific symlinks are within the backupwordpress/admin/schedule-form-excludes.php here: https://github.com/humanmade/backupwordpress/blob/cf5579a0a3b2dcec6828ed5af497db5125ca2e5b/admin/schedule-form-excludes.php

at line 207:

          <?php if ( is_link( Path::get_root() ) ) :
                    esc_html_e( 'Symlink', 'backupwordpress' );

and at line 352:

          <?php if ( $file->isLink() ) : ?>
                    <span title="<?php echo esc_attr( wp_normalize_path( $file->getRealPath() ) ); ?>">
                              <?php esc_html_e( 'Symlink', 'backupwordpress' ); ?>
            </span>

willmot commented 7 years ago

This is a relatively straight-forward edge-case, we should add unit tests for it and fix. Everywhere we're dealing with symlinks we need to make sure we're handling symlinks which point back to either the current folder or a parent folder and thus will cause recursion. I assume things like zip already handle this.

katmoody commented 7 years ago

@willmot - I searched but couldn't find an example similar to how we're using this above - what does esc_html_e( 'Symlink', 'backupwordpress' ); translate to? All examples I saw of Symlinks in php used something more like symlink( name, URL ) ... Just wanting to understand how it works if possible.

willmot commented 7 years ago

what does esc_html_e( 'Symlink', 'backupwordpress' );

That code just outputs whether the file is a symlink or a file or a folder as displayed in the excludes file listings:

All examples I saw of Symlinks in PHP used something more like

symlink() creates a symlink, all those uses are in the unit tests where we're creating symlinks and then testing whether they're handled correctly. We don't currently have a test for a recursive symlink, which is the issue here.

Both recursiveDirectoryIterator (which loops the filesystem and adds files using ZipArchive and the unix zip binary natively support following symlinks so there's no actual "symlink" code, we just need to figure out whether which of those is affected by this issue and then figure out a way to work around it.

willmot commented 7 years ago

I did a bit more testing and this is a pretty gnarly problem.

We could likely work around it in ZipArchive without too much trouble as we should be able to test each symlink prior to adding to the archive.

The problem is zip which does follow recursive symlinks like any others. There's an OS limit on how far recursive symlinks will be followed (on Linux it defaults to 40) so if the backup process doesn't timeout before getting to that then the backup would complete (although it would contain 40 copies of the symlinked directory).

The issue with zip is that there's no way for us to test each directory before adding it, you simply pass a root path to zip rather than an array of files.

Solution

Our best bet is probably going to be to check for these recursive symlinks when scanning the sites filesize and if we find one, add it as a default exclude. The downside there is it relies on the site size scanner, which doesn't work for everyone.

libby-barker commented 7 years ago

@katmoody @shadyvb and @willmot going to add to sprint backlog so we can look at unit tests in the next sprint

katmoody commented 7 years ago

@willmot - I was doing some searching on how to include unit tests for recursive symlinks and stumbled across this older (2014) writeup ... is there a nugget there at all that you can use?

http://www.lonhosford.com/lonblog/2014/01/26/how-to-traverse-file-directories-recursively-with-php-recursivedirectoryiterator-class/

willmot commented 7 years ago

is there a nugget there at all that you can use?

There isn't, but thanks for trying to research 👍. We're using RecursiveDirectoryIterator already as part of the ZipArchive method and can likely work around this problem there fairly easily. The issue is zip which works differently.

katmoody commented 7 years ago

Hey @willmot - curious if something like this, which will list the recursive link but not follow it to that information, could be used on this?

Found at this site as I was reading up on using Unix Zip command - http://www.computerhope.com/unix/zip.htm

-y, --symlinks
For UNIX and VMS (V8.3 and later), store symbolic links as such in the zip archive, instead of compressing and storing the file referred to by the link. This can avoid multiple copies of files being included in the archive as zip recurses the directory trees and accesses files directly and by links.

willmot commented 7 years ago

In addition to the solutions proposed in https://github.com/humanmade/backupwordpress/issues/1150#issuecomment-268849475 we should also provide a way (via setting a constant HMBKP_FOLLOW_SYMLINKS to false) to just disable following symlinks all together, if that Constant was set we'd then:

pass the-yflag tozip` as @katmoody recommended in https://github.com/humanmade/backupwordpress/issues/1150#issuecomment-274545320
stop passing the FOLLOW_SYMLINKS flag to DirectoryIterator.
ensure symlinks show as excluded in the excludes file browser.

katmoody commented 7 years ago

This looks doable ... and I have a few folks who'd be willing to test it as soon as we have it ready in beta. ;-)

shadyvb commented 7 years ago

So to summarise the tasks here:

[ ] Write unit tests that create recursive symlinks
[ ] Introduce a new config option HMBKP_FOLLOW_SYMLINKS to disable symlink traversing altogether, which then:
[ ] - Pass -y flag to zip calls
[ ] - Stop passing FOLLOW_SYMLINKS to DirectoryIterator
[ ] - Shows symlinks as excluded in the excludes file browser
[ ] Figure out if we can detect recursive symlinks, so we don't need to disable backing them up altogether. Probably by following each, and then make that the default behaviour ( which can then be overridden ).

xibodevelopment / backupwordpress

Recursive symlinks cause timeouts #1150

Description of issue

Additional Useful Information