nextcloud / backup

Backup now. Restore later.

Request for occ documentation and fix for OOM on upload to external storage #308

Open · Dave-REBL opened this issue 2 years ago

Dave-REBL commented 2 years ago

Nextcloud 23.0.4 running Backup 1.0.6, Ubuntu 16.04 server, PHP 7.3, MariaDB 10.2. Attempting to run Backup exclusively with the backup occ commands:

sudo -u www-data php /var/www/nextcloud/occ backup:point:create
sudo -u www-data php /var/www/nextcloud/occ backup:point:pack [point id]
sudo -u www-data php /var/www/nextcloud/occ backup:point:upload

Partial success on backup of 190 GB (have yet to try restore) with local external storage configured.

This appears to be the same issue as reported in #154, #158, #205, #237, #244, #268

I see some documentation on occ commands in README.md but there appears to be so much more:

[screenshot: list of available backup occ commands]

Does anyone know if documentation on the full set of occ commands for Backup exists? If not, when will this documentation be available (similar to what's available in the Nextcloud Administration documentation for other occ commands)?

ddt3 commented 2 years ago

I second this request. Even if a command is described, the user is left guessing what the parameters for that command actually mean and how to use them.

nikhub commented 1 year ago

I can confirm the OOM errors in the backup app that are mentioned in the tickets linked in the first post. The upload fails both when running the backup manually via the command line and when it runs automatically via the GUI settings.

Creating and packing backups works fine in both cases (automatically and manually), but when the backup is to be uploaded to the external storage (an S3 server) the upload fails after some chunks (somewhere between 10 and 20). During a manual backup the upload process is either terminated with a "Killed" message or with the PHP out-of-memory exception after the chunk information. Currently 2048 MB are provided in the PHP .ini files, but the error also occurred with 8 GB of available memory.

I am using Nextcloud 24.0.6 with backup app 1.1.3 running on a Debian 11 server, PHP 8.1 and MariaDB 10.5.15.

fvillena commented 1 year ago

Same problem on Backup 1.1.3 when uploading to S3


danepowell commented 1 year ago

Looking through the codebase, I see numerous places where files are accessed directly instead of using streams, and file objects (which are probably > 2 GB in size!) are loaded directly into memory and never unset.

Simply unsetting some of these objects would at least prevent memory use from growing without bounds, keeping it under the chunk size, and using streams properly could reduce it to a more reasonable state (a few hundred MB).
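
To make the distinction concrete, here is a minimal sketch of the two patterns; the helper names are hypothetical stand-ins, not the Backup app's actual code:

```php
<?php
// Hypothetical helpers for illustration only; not the Backup app's real API.

// Pattern described above: the whole chunk (possibly > 2 GB) is pulled into
// a PHP string and stays allocated until it is unset or the caller returns.
function uploadChunkFromMemory(string $path, callable $uploadChunk): void
{
    $data = file_get_contents($path); // entire file held in RAM
    $uploadChunk($data);
    unset($data);                     // without this, the allocation lingers
}

// Stream-based alternative: the consumer reads the file piecewise, so memory
// use stays near the stream buffer size regardless of how large the file is.
function uploadChunkFromStream(string $path, callable $uploadChunkStream): void
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("cannot open $path");
    }
    try {
        $uploadChunkStream($handle);
    } finally {
        fclose($handle); // release the handle as soon as the chunk is done
    }
}
```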

kds69 commented 1 year ago

+1

kds69 commented 1 year ago

Maybe it can help someone: I managed to make backup recovery work despite the OOM issue by taking the mitigations below:

- switched my 2 GB RPi for a 4 GB RPi
- increased the swap file to 2 GB
- cloned the OS from a 16 GB SD card to a 32 GB SD card (and extended the partition with GParted on a Linux VM running on my Windows 10 machine)

evought commented 1 year ago

I am getting the same issues with NextCloud 25.0.8, Backup 1.2.0, sftp external storage.

The automatic uploads fail, so I force an upload of a full restore point from the command line:

php -d memory_limit=512M ./occ backup:point:upload

This produces output initially, gets to the actual upload, then abruptly exits, leaving an OOM in the log. 'du -sh' shows that the entire tree of the full restore point is 892 MB. Forcing an unlock with backup:point:unlock and trying again with a memory limit of 640 MB also fails. I cannot reasonably set the memory limit higher and I have no reason to believe it won't consume infinite memory as it does for others in this thread. Given that this is initial setup of the cloud before more users are brought on, backups will presumably get bigger in any case. There is no rational reason an upload of arbitrarily-large files ought not happen in constant memory-- that's the point of producing multiple data chunks in the first place, isn't it?

In several days of beating on this, I have not been able to successfully transfer a backup off of the server itself. This makes a 'backup' system essentially useless, and without a reasonable backup it doesn't make sense to put effort into generating and storing data, which means this (volunteer) organization cannot do what it needs to do. Is there any plan to find/fix this problem? Is there anything that can be usefully done to push that forward?

evought commented 1 year ago

Adding to my last post, I have continued experimenting with the smaller differential backups. I cannot successfully transfer a differential restore point that only takes up 201 MB on the local file system despite a 512M memory limit. So, this is a restore point which could fit entirely into RAM for the transfer and it still runs out of memory:

Allowed memory size of 536870912 bytes exhausted (tried to allocate 86333112 bytes) at /home/[redacted]/public_html/nextcloud/3rdparty/phpseclib/phpseclib/phpseclib/Net/SSH2.php#4444

It isn't just that the procedure is inefficient or that I don't have enough resources allocated, it just plain don't work ;-)

Another test fails in a different point:

Allowed memory size of 536870912 bytes exhausted (tried to allocate 104857752 bytes) at /home/[redacted]/public_html/nextcloud/3rdparty/phpseclib/phpseclib/phpseclib/Net/SFTP.php#3548

Both failures here are in phpseclib calls. This does not necessarily mean that phpseclib is leaking the memory, however; it could simply be that these lines are coincidentally the first calls after something else has allocated everything available. And I have one from further back where the actual failure occurs in PackService.php#606.
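
Since a fatal OOM gives no backtrace, one generic way to capture a little more context (a debugging sketch, not specific to this app) is to register a shutdown handler at the top of the CLI entry point. It won't say who leaked the memory, but it records the death point and the memory high-water mark in one place:

```php
<?php
// Generic PHP debugging aid; shutdown functions still run after a fatal
// "Allowed memory size exhausted" error, so this reports where it died
// together with the peak memory usage.
register_shutdown_function(function (): void {
    $err = error_get_last();
    if ($err !== null && $err['type'] === E_ERROR) {
        fprintf(
            STDERR,
            "fatal: %s at %s:%d (peak memory %.1f MB)\n",
            $err['message'],
            $err['file'],
            $err['line'],
            memory_get_peak_usage(true) / 1048576
        );
    }
});
```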

So, I come back to: Is there any plan to find/fix this problem? Is there anything that can be usefully done to push that forward?

I have on my task list setting up a NextCloud on a local VM (I was intending to use it to test restores and updates). I will have full control over the VM (as opposed to this hosting server we don't control), so, if I can reproduce this, I can get much more detailed debugging and tracing data. But, if nobody actually cares... ?

Real-Konai commented 1 year ago

Seems that it doesn't free the PHP-Memory after uploading a chunk somehow. I tried with:

php -d memory_limit=1024M ./occ backup:point:upload ... and it breaks after 4 chunks
php -d memory_limit=4096M ./occ backup:point:upload ... and it breaks after 14 chunks
php -d memory_limit=8192M ./occ backup:point:upload ... and it breaks after 30 chunks

Every time it shows PHP Fatal error: Allowed memory size of ... bytes exhausted in .../custom_apps/backup/lib/Service/PackService.php on line 606.

Hope that helps
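
A quick way to confirm that pattern from the outside (a generic sketch; $chunks and uploadChunk() are hypothetical stand-ins for the app's internals) is to log memory after each chunk: if usage climbs by roughly one chunk per iteration instead of staying flat, the chunk data is never released.

```php
<?php
// Illustrative only: prints current and peak memory after each upload.
foreach ($chunks as $i => $chunk) {
    uploadChunk($chunk);
    printf(
        "chunk %d: current %.1f MB, peak %.1f MB\n",
        $i,
        memory_get_usage(true) / 1048576,
        memory_get_peak_usage(true) / 1048576
    );
}
```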

evought commented 1 year ago

Seems that it doesn't free the PHP-Memory after uploading a chunk somehow. ...

Yes, that fits with what I have seen as well, and the behavior had a familiar smell to it: https://stackoverflow.com/a/18092019

It tends to happen when you allocate gobs of memory in PHP inside a loop using local variables. The local variables are not reclaimed until the function exits, so memory grows without bound. Refactoring the code -- as described in the above Stack Overflow answer -- typically solves the problem. Compared to the significant memory allocation/deallocation, the cost of a function call in a loop is negligible (and loads better than halting with OOM!). I'd have to take apart the code to verify this, however.
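
A hedged sketch of that refactoring (hypothetical names, not the app's code): move the per-iteration work into its own short-lived function so its locals go out of scope, and become collectable, on every return.

```php
<?php
// Illustrative only; function names are hypothetical, not the Backup app's API.

// Before: every temporary created in the loop body lives in the enclosing
// function's scope, so anything not explicitly unset (or overwritten)
// survives until uploadAllChunksInline() itself returns.
function uploadAllChunksInline(array $chunkPaths, callable $upload): void
{
    foreach ($chunkPaths as $path) {
        $data = file_get_contents($path);
        $upload($data);
        // unset($data); // easy to forget, and other temporaries add up too
    }
}

// After: the per-chunk work runs in its own short-lived call, so all of its
// locals are released the moment it returns; the loop itself holds nothing.
function uploadOneChunk(string $path, callable $upload): void
{
    $data = file_get_contents($path);
    $upload($data);
}

function uploadAllChunks(array $chunkPaths, callable $upload): void
{
    foreach ($chunkPaths as $path) {
        uploadOneChunk($path, $upload);
    }
}
```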

In my own situation I have already uninstalled the Backup app and written a shell script to replace some of the functionality which then allowed me to upgrade past 25.x. I am not likely to go back unless Backup is updated to later NC releases (and becomes more stable). I may be able to set up a debug environment for 25.x in a VM, however, and poke at this IF I get a round-tuit.

Freundschaft commented 6 months ago

Asking the other way round: did anyone ever get a backup to work without running OOM?

evought commented 6 months ago

As an FYI, I now have an early in-house Perl tool to handle the backup of NextCloud external to the NC system. It interacts with OCC in order to extract needed parameters (and to back up the configuration itself) and to enter/exit maintenance mode at appropriate times. It creates full and incremental backups of the NextCloud directories and full backups of the database. It permits the addition of tags and metadata to the backups. It seems to work on both Alma and Ubuntu. By passing command-line parameters (a bit cumbersome right now), it works in both one-account hosted and multi-account virtual server environments, accounting for some differences in install trees.

Like the NextCloud module, my tool does a two-stage backup to minimize time in maintenance mode. First it creates the tar files and saves metadata in maintenance mode, then it goes back and does slower post-processing. On my test and production systems, this reduces time-in-maintenance mode to half or less of what it would be.

What it doesn't yet do is automated restore. The tool was designed to put everything in easy-to-manipulate, directly searchable tar files and JSON-based metadata, so, for the moment, I do restores manually with tar and mysql. This can be a bit involved, however, and trying to find/restore individual files in incremental backups is very tricky, particularly when one considers matching to database metadata. Using the system tar, however, is quite a bit faster and uses less RAM than the in-process Perl tar API, so I may leave use of the system tar for some cases as a long-term configuration option. The delay with automating restore is the need to use the tar API very carefully for extraction to avoid memory leaks similar to what the current NextCloud support encounters. I ended up putting that off for a little bit until life cooperates with my ability to focus on the problem and build some careful test cases.

I do have the beginnings of an API for extending the backup-related classes to add support for a metadata journal, potential remote storage/caching of metadata, syncing and expiring backups across storage devices, and the potential for GUI-browsing and search of metadata (BYOGUI). I had to allow extension of the backup-related classes for testing regardless (so that they could be mocked out to reduce testing dependence on NextCloud, database, file system, etc) and for security (metadata journal), so the potential extensibility was more or less free functionality. If I am going to allow stubbing out the backup storage device, I may as well make it extensible; if I am going to make a metadata journal for verifying test cases and for security, I may as well allow it to be replayed somewhere else. My intention is to distribute the tool under some appropriate terms and allow folks to use the API to integrate it into their own larger process.

The tool has perldoc, GNU-compliant command-line switches, command-line help, and a growing set of test cases, but I think it needs a bit more cleanup before it can be readily used by others, particularly trimming out some pieces that really didn't work the way I wanted them to. Right now, I am effectively the only one using it and it is built around my needs. I wrote in Perl because PHP isn't really an appropriate system scripting language (whereas Perl and others are designed for OS interaction). I also explicitly wanted a backup solution free from the NextCloud codebase, running outside the server process, protected from NextCloud upgrades changing the backup/restore code, and from NextCloud backup's current serious bugs. I use tar and JSON so that a future-- clean and low-overhead-- NextCloud module could interact with the result without serious trouble (say, providing a GUI for metadata). But first and primarily, I created this tool to keep me from going insane with the current crippling bugs in the backup system.

Anyway, even if I am not yet ready to share this widely, I thought describing what I am doing might give people their own ideas as to what is possible/easy and the bits that are quite tough. I can also share bits of what I am doing to further that discussion.
