silverstripe / sspak

Tool for managing bundles of db/assets from Silverstripe environments
http://silverstripe.github.io/sspak/
BSD 3-Clause "New" or "Revised" License

Archive of large .sspak fails on asset extraction #29

Open derRobert opened 9 years ago

derRobert commented 9 years ago

Hi, I have a large website (the assets folder is about 5GB). When I create an sspak file with

sspak save backup.sspak /path/to/website

the archive seems to be created.

When I try to extract the same file afterwards:

sspak extract backup.sspak

The 2 containing archives are extracted: assets.tar.gz database.sql.gz

When I try to unpack assets.tar.gz, the following error occurs:

gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

Any ideas?
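
An "unexpected end of file" from gzip usually means the compressed stream stops before its end-of-stream marker. A minimal sketch of how that can be checked without unpacking anything, using only the Python standard library (the function name `gzip_is_complete` is made up for this example; it is not part of sspak):

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1024 * 1024):
    """Return True if the gzip file at `path` decompresses cleanly to its end."""
    try:
        with gzip.open(path, "rb") as stream:
            # Read and discard the data in chunks so memory use stays flat
            # even for multi-gigabyte archives.
            while stream.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        # EOFError: the stream ended before the end-of-stream marker.
        # OSError (including gzip.BadGzipFile): bad header or checksum mismatch.
        # zlib.error: corrupt compressed data.
        return False
    return True
```

Calling `gzip_is_complete("assets.tar.gz")` right after `sspak extract` would show whether the inner archive is already truncated at that point. On the command line, `gzip -t assets.tar.gz` performs a comparable integrity test.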

michalkleiner commented 8 years ago

What type of filesystem are you using? What is the size of the assets.tar.gz file after extraction from the sspak file?

dhensby commented 8 years ago

I think I've had this problem too - I might try to replicate

brettt89 commented 8 years ago

Have also run into this problem with 5+GB of assets.

I could see that it pulls 5GB of assets into the /tmp directory, but then has a problem putting them into assets.tar.gz. No errors are output during this process.

Debian 6.0.10 PHP 5.3.3-7+squeeze29

brettt89 commented 8 years ago

Have also confirmed this is an issue when using 'saveexisting'

robbieaverill commented 7 years ago

Related #52 and #53

NightJar commented 5 years ago

https://www.php.net/manual/en/phar.fileformat.tar.php

Whilst the ustar format may be more 'modern', it doesn't support adding files over 8GB in size to tar files.

So at time of writing, if you need to work with tar files that contain files over 8GB, you can't use PharData.

I'm sure the first time I read this it said 4GB. Although maybe I'm getting confused with FAT32.
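
As the quoted passage reads, the limit applies to individual files added to the archive. The ustar header stores each file's size as 11 octal digits, which caps a single file at 8^11 - 1 bytes. A rough sketch for checking whether any file in an assets folder exceeds that cap (the function name and the `root` argument are illustrative only):

```python
import os

# The ustar header stores a file's size as 11 octal digits, so the largest
# representable size is 8**11 - 1 bytes (just under 8 GiB).
USTAR_MAX_FILE_SIZE = 8**11 - 1


def files_over_limit(root, limit=USTAR_MAX_FILE_SIZE):
    """Yield (path, size) for regular files under `root` larger than `limit`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            if size > limit:
                yield path, size
```

If `list(files_over_limit("/path/to/website/assets"))` comes back empty, the per-file ustar limit is unlikely to be the cause of a truncated archive.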

michalkleiner commented 5 years ago

Still an issue regardless. Firesphere worked on a Python clone of sspak; not sure where he got to.

axllent commented 4 years ago

Hey guys. We've hit this same issue and it caught us completely off guard. In one case the assets folder was around 6.8GB, and in the other close to 5GB (ext4 and btrfs filesystems). In both cases SSPak did not report any error on creation, but on extraction there was the "unexpected EOF" error (which wasn't initially noticed, nor was the significantly smaller archive size). It was only when assets turned up missing that the alarm bells rang. I was able to reproduce the issue on multiple systems, in multiple environments, and the total backup size is (supposed to be) nowhere near the 8GB limit reported on https://github.com/silverstripe/sspak/issues/53.

As we cannot rely on sspak when it comes to these large archives, nor wish to install Python, I took it upon myself to write a "clone" in Go (SSBak). It is intended to work as a drop-in replacement for sspak, but it's worth noting that it doesn't currently have all the features of SSPak (i.e. no ssh, git-remote / install, or CSV tables) as we don't use or need any of those. It does, however, handle the large file sizes without an issue (in the native .sspak format). It's still "stable alpha", but if you're interested I would love some feedback and testing (issues etc. should go on that repo's issue tracker, not this one, as I don't wish to hijack this thread).
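
Since neither the missing assets nor the smaller archive size were noticed at creation time, one way to catch this early is to compare the number of files on disk with the number of file entries that can be read back from assets.tar.gz. This is only a sketch under stated assumptions: both function names are hypothetical, and it compares counts rather than names because the path layout inside the archive is not described in this thread.

```python
import os
import tarfile
import zlib


def count_files_on_disk(root):
    """Count regular files (not symlinks) below `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += 1
    return total


def count_files_in_archive(archive_path):
    """Return (file_count, read_cleanly) for a tar archive, compressed or not."""
    count = 0
    try:
        with tarfile.open(archive_path, "r:*") as tar:
            for member in tar:
                if member.isfile():
                    count += 1
    except (EOFError, OSError, zlib.error, tarfile.TarError):
        # A truncated or corrupt archive: report what was readable so far.
        return count, False
    return count, True
```

A backup script could refuse to continue when `read_cleanly` is false or when the two counts differ.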

tractorcow commented 4 years ago

I've had this same issue, but on an archive of only 1.5GB.

I've attempted to use ssbak as well, but I still get exactly the same issue:

gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

axllent commented 4 years ago

@tractorcow I suspect the issue here is that the 1.5GB *.sspak archive itself is corrupt, so it will always produce errors on extraction regardless of what tool you use. Have you tried creating a new archive with ssbak instead of sspak?

tractorcow commented 4 years ago

Yes, I have tried sspak, ssbak, and manually creating it with tar, and then again with gnu-tar.

tractorcow commented 4 years ago

I'm now attempting to create the same sspak, but on a Linux VM. :)

axllent commented 4 years ago

I don't understand why there would be any gzip corruption (when created with ssbak), and if there was, it should be reported immediately with an error on creation. So if I understand correctly, regardless of whether the backup is made with sspak or ssbak, neither returns any error on creation, but both return an error on extraction? I assume this is on macOS, and how big is your assets directory in reality?

tractorcow commented 4 years ago

The assets were just under 2 GB, and compressed to about 1.5GB.

I'm about to test my new non-OSX created archive.

tractorcow commented 4 years ago

It turns out the issue was due to the device being out of disk space during extraction! Useful for anyone else who comes across this issue. :)

axllent commented 4 years ago

Thanks for the feedback @tractorcow - that's another automatic check I'll be building into ssbak, as you aren't the first to be caught out by drive space issues ;-) I would have expected some sort of error to be returned by both tools when that happened, though. Obviously not.
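
For anyone scripting around either tool in the meantime, a pre-flight free-space check along these lines may help. It is a rough sketch, not how ssbak implements it: the function name is hypothetical, and the `factor` multiplier is an assumed safety margin to cover the extracted inner archives and the files unpacked from them, not a measured value.

```python
import os
import shutil


def has_room_for(archive_path, dest_dir, factor=2.0):
    """Rough check that dest_dir has at least factor * archive size free.

    Returns (ok, needed_bytes, free_bytes).
    """
    # `factor` is an assumed safety margin, not a value taken from sspak or ssbak.
    needed = int(os.path.getsize(archive_path) * factor)
    free = shutil.disk_usage(dest_dir).free
    return free >= needed, needed, free
```

A wrapper could call `has_room_for("backup.sspak", ".")` and abort with a clear message before extraction starts, instead of leaving a silently truncated assets.tar.gz behind.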