mrAceT / nextcloud-S3-local-S3-migration

Script for migrating Nextcloud primary storage from S3 to local to S3 storage
GNU General Public License v3.0

postgres support, dry-run #1

krakazyabra opened this issue 1 year ago

krakazyabra commented 1 year ago

Hi! Thanks for the great work! I previously opened an issue in the original lukasmu/nextcloud-s3-to-disk-migration repo about Postgres support. I'm not a programmer, so I cannot implement it myself, but a lot of users run Postgres with NC. It would be awesome if you added psql support.

My second question is about a dry run: is it possible to run the script without actually copying data, e.g. to test whether everything is OK?

And a third question: is it possible to copy from S3 to local but keep the objects in S3? In case something goes wrong, there should be a way to switch back to S3. Of course, a DB backup is mandatory before starting.

mrAceT commented 1 year ago

Hi @krakazyabra,

1) Postgres: I don't use Postgres, so that would mean I'd need to set up a complete test environment for it.. I must also admit I don't have any experience with Postgres.. so that would become a little challenge ;)

2) "dry run" Only 'test=0' will make changes to your running nextcloud! Only at 'test=0' a real migration will be performed and your Nextcloud will be changed to actually switch between one main storage to the other. All other steps perform tests and copy data to the new location. At "lower test levels" some checks may perform database actions as it will fix faulty entries in the database/storage (<= update, checked it, it's won't change/delete anything in your database only at 'test=0', just like the instruction says ;) So in short, follow the instruction and start at the "highest test level"

3) keep data in S3: Well, it actually does that by default. I don't want to delete anything.. that's up to the owner (check the closing words of that part of the instructions ;). When you use the S3->local script, all (non-faulty) data remains in S3.. so if you run local but decide to go back to S3 (and didn't remove any data), you can return quite easily using the local->S3 script.. Of course starting out (again) at a high test level and working your way 'down to 0'.. that would go quite fast (step 6 would complete rather quickly, since most of the data would already be there).

Each time you run the script it backs up your SQL and config.. all to make the risk of permanently 'bricking' your setup as small as possible.. but as always.. be careful and check after every step you take ;)
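Purely as an illustration of those test levels (the variable name and the exact levels here are assumptions, not the script's actual code; check the comments at the top of the script for the real settings), the switch typically looks like this:

<?php
// Illustrative sketch only.
// Any value > 0 is a test level: checks run and data may be copied,
// but your Nextcloud is NOT switched over to the other storage.
// 0 = THE REAL THING: the actual migration is performed.
$TEST = 2;

if ($TEST == 0) {
    echo "NOTE: THIS IS THE REAL THING!!\n";
}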

mrAceT commented 1 year ago

I use the localtos3 script every now and then to perform what I call a "sanity check"; this is what I get:

 #sudo -u clouduser php81 -d memory_limit=1024M /[folder]/localtos3.php

#########################################################################################
 Migration tool for Nextcloud local to S3 version 0.33

 Reading config...

#########################################################################################
Setting up local migration to S3 (sync)...

first load the nextcloud config...
S3 config found in $PATH_NEXTCLOUD system config.php => $CONFIG_OBJECTSTORE not used! (/[folder]/storage.config.php)
connect to sql-database...
WARNING: no 'local::/[folder]/data' found, therefor no sync local data > S3!

The warning is logical since I'm already on S3 and am only doing the 'sanity check'.

FOUND 'object::store:amazon::[S3account]', OK
The object store id is:1

WARNING: if this is for a full migration remove all data with `storage` = 1 in your `oc_filecache` !!!!

I've seen someone have problems migrating because they messed up the storage IDs, so I'm just warning about that ;)
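If you want to see which storage IDs exist before touching anything, a quick look at the oc_storages and oc_filecache tables helps. A minimal sketch (the connection credentials are placeholders; table and column names are the Nextcloud defaults):

<?php
// Sketch only: list each storage and how many filecache rows point at it,
// so you can verify which numeric ID belongs to 'object::store:amazon::...'
// and which to 'local::/.../data' before cleaning anything up.
$pdo = new PDO('mysql:host=localhost;dbname=nextcloud', 'ncuser', 'ncpass');

$sql = "SELECT s.numeric_id, s.id, COUNT(f.fileid) AS files
        FROM oc_storages s
        LEFT JOIN oc_filecache f ON f.storage = s.numeric_id
        GROUP BY s.numeric_id, s.id";

foreach ($pdo->query($sql) as $row) {
    printf("storage %d  %-45s %d files\n", $row['numeric_id'], $row['id'], $row['files']);
}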

######################################################################################### 0

NOTE: THIS IS THE REAL THING!!

Base init complete, continue?

it waits for user input (like an enter) before continuing..

#########################################################################################
database backup...mysqldump: [Warning] Using a password on the command line interface can be insecure.

(to restore: mysql -u [user] -p [dbname] < backup.sql)

backup config.php...not needed
connect to S3...

#########################################################################################
Setting everything up finished ##########################################################

#########################################################################################
appdata preview size... ($PREVIEW_MAX_AGE = 0 days, stats only)
appdata preview size        :  123.45 Mb        (7654 files)
appdata preview > 1 year old:    0.00 Mb        (0 files)

I read a lot about people complaining that preview images weren't deleted.. I haven't had one preview, ever, older than 1 year..
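The '$PREVIEW_MAX_AGE = 0 days, stats only' line in the output above refers to a setting in the script; as a hedged sketch of that kind of age cutoff (the behaviour at non-zero values is an assumption here, check the script's comments):

<?php
// Sketch only -- $PREVIEW_MAX_AGE appears in the script output above;
// 0 means "report statistics only". A non-zero value is assumed here to
// mean "treat previews older than this many days as removable".
$PREVIEW_MAX_AGE = 365; // e.g. previews older than one year

$cutoff = time() - $PREVIEW_MAX_AGE * 24 * 60 * 60;
echo 'previews older than ' . date('Y-m-d', $cutoff) . " would be cleaned up\n";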

#########################################################################################
read files in S3.......................

the more files, the more dots.. ;)

Objects to process in S3: 56789  DONE
objects removed from  S3: 0     (0 bytes)
objects updated to    S3: 0     (0 bytes)
objects skipped on    S3: 0     (0 bytes)
objects in sync on    S3: 56789 (98.76 Gb)

#########################################################################################
check files in oc_filecache...
Number of objects in oc_filecache: 56789 DONE
Files in oc_filecache added to S3: 0    (0 bytes)
Copying files finished

#########################################################################################
check for canceled uploads in oc_filecache...
=> EXPERIMENTAL, I have not had this problem, so can not test.. => check only!

As the note says.. I have not had this problem, so can not test.. so check only ;)

#########################################################################################
NOTE: you can remove the user folder of files_external  by: rm -rf /[folder]/data/files_external

#

At the end it'll tell you which folders you can remove.. but it's up to you to do that or not ;)

krakazyabra commented 1 year ago

Got it, thanks! I may have broken files in S3 (due to several outages). How does the script handle them? Will it fail, or just report them? Also, can I specify the number of threads? I have >2 PB of data, so it could take ages with 1 thread :)

mrAceT commented 1 year ago

I "only" have a 100+Gb of data.. it runs on a single thread.. when all goes well it actually goes quite fast.. The code already is quite extensive.. and I wanted to keep the code as simple asp possible.. no extra modules.. etc.. This is all about user data.. I wanted the code to be as readable/check-able as possible..

In test mode it'll only warn for every broken file in S3/local and go on to the next.. ONLY in 'live mode' (test=0) will it remove the broken database entries and orphaned S3 files!
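A minimal sketch of that guard pattern, purely illustrative (variable names and the broken-entry detection are placeholders, not the script's actual code):

<?php
// Sketch of the "warn in test mode, act only at test=0" behaviour described above.
$TEST = 2;                       // anything > 0: report only
$pdo = new PDO('mysql:host=localhost;dbname=nextcloud', 'ncuser', 'ncpass');
$brokenEntries = [];             // fileids found to be broken by the checks

foreach ($brokenEntries as $fileid) {
    if ($TEST > 0) {
        echo "WARNING: broken entry $fileid (test mode, nothing changed)\n";
    } else {
        // live mode: actually remove the faulty oc_filecache row
        $pdo->prepare('DELETE FROM oc_filecache WHERE fileid = ?')->execute([$fileid]);
    }
}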

The thing is the data copying.. with that "whopping 2 PB of data", getting it all back from your S3 bucket.. geez.. that'll take a while :P

The great part though.. you can download per user.. and when it hangs/fails somewhere you can simply restart it.. it'll check and simply continue where it stranded (with that amount of data.. chances are there will be a hiccup somewhere sometime ;)

When I migrated from local to S3 with my 100+ GB of data I started it at night and cancelled it in the morning (since it does "clog the channels", even with one thread) because I didn't want my users to experience any lag.. then restarted it the next evening/night.. it'll get there.. ;)

krakazyabra commented 1 year ago

got it, thanks. I have multibucket config, bucket-per-user.

you can download per user

Unfortunately, Nextcloud doesn't support 2 default primary storages (e.g. local and S3 simultaneously).

Anyway, I will try to edit this script for postgres and try it.

mrAceT commented 1 year ago

got it, thanks. I have multibucket config, bucket-per-user.

So you don't have Nextcloud set up with S3 as primary? Or do you actually have an S3 bucket per user, with S3 as primary?

I didn't know that was possible?!

One can add an S3 account to every user.. even with local as primary.. is that the way you work?

Unfortunately, Nextcloud doesn't support 2 default primary storages (e.g. local and S3 simultaneously).

Well.. not by design.. but you could "hack the system".. but I think that would be waiting for an accident to happen..

Anyway, I will try to edit this script for postgres and try it.

I tried to make the queries and the structure as readable and "simple" as possible.. if you have questions, shoot!
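Not part of the script itself, but as a hedged sketch of what such a Postgres edit usually comes down to: switching the database connection to a PostgreSQL driver while keeping the SQL largely intact (PDO is shown here as one option; the script's actual database layer may differ, and the credentials/DSNs are placeholders):

<?php
// Sketch only: the same query against MySQL/MariaDB and PostgreSQL via PDO.

// MySQL / MariaDB
// $pdo = new PDO('mysql:host=localhost;dbname=nextcloud', 'ncuser', 'ncpass');

// PostgreSQL
$pdo = new PDO('pgsql:host=localhost;dbname=nextcloud', 'ncuser', 'ncpass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Standard SQL like this runs unchanged on both; watch out for MySQL-only
// syntax such as backtick-quoted identifiers.
$stmt = $pdo->prepare('SELECT numeric_id, id FROM oc_storages WHERE id LIKE ?');
$stmt->execute(['object::store:%']);
print_r($stmt->fetchAll(PDO::FETCH_ASSOC));

The database backup step would change as well (mysqldump, as in the log above, versus pg_dump for Postgres).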

krakazyabra commented 1 year ago

So you don't have Nextcloud set up with S3 as primary?

S3 is primary, but I'm using the undocumented objectstore_multibucket config option:

  'objectstore_multibucket' =>
  array (
    'class' => '\\OC\\Files\\ObjectStore\\S3',
    'arguments' =>
    array (
      'num_buckets' => 1500000,
      'bucket' => 'nextcloud-',
      'autocreate' => true,
      'key' => 'admin',
      'secret' => 'secret',
      'use_ssl' => true,
      'hostname' => 'awesome-url',
      'port' => 9000,
      'use_path_style' => true,
    ),
  ),

I have MinIO S3 in my local infrastructure.

One can add an S3 account to every user.. even with local as primary.. is that the way you work?

well, it's hard to explain without going into S3 principles, but I have one big scope where the admin user is nextcloud (see the key and secret in the config). This user is allowed to create buckets, and each Nextcloud user has their own bucket.

mrAceT commented 1 year ago

S3 is primary, but I'm using the undocumented objectstore_multibucket config option:

Needed to read into that one a bit ;) You use this because of your large amount of data? I read that 'multi bucket' may be needed for some S3 providers because of limitations..

From what I read, is this still an issue nowadays?

[update] I use 'OVH' and I checked there, it says: Maximum number of objects in a bucket: Unlimited

So I think "I'm good" (?)

krakazyabra commented 1 year ago

Needed to read into that one a bit ;)

https://github.com/nextcloud/server/wiki/How-to-test-S3-primary-storage

You use this because of your large amount of data?

Because we can + by design. As I said, we're using our own local S3 storage (MinIO). You can distribute data between nodes more flexibly, plus it makes it easier to scale each customer's data.

mrAceT commented 1 year ago

A day in which nothing is learned is a day lost; learned something today ;)

In any case, this script does not support multiple primary S3 storages.. so you will need to "hack the script". If you have questions, shoot!
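For comparison, a single-bucket primary object store (the setup this script does target) is configured in config.php roughly like this (values are placeholders; see the Nextcloud admin documentation for the full set of arguments):

  'objectstore' =>
  array (
    'class' => '\\OC\\Files\\ObjectStore\\S3',
    'arguments' =>
    array (
      'bucket' => 'nextcloud',    // one bucket for all users
      'key' => 'admin',
      'secret' => 'secret',
      'hostname' => 'awesome-url',
      'port' => 9000,
      'use_ssl' => true,
      'use_path_style' => true,   // needed for MinIO-style endpoints
      'autocreate' => true,
    ),
  ),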

PS: to get this working I created a small Nextcloud installation with my "live config" and started digging into the database entries to figure out "where is what", then started testing with my migration script.. I think that is the best way to go.. I didn't want to start "hacking around" with 100+ GB of "real data of real users".. I think that is the best route with 2 PB of data too.. ;)

krakazyabra commented 1 year ago

If you have any questions you can find me in telegram @krakazyabra

mrAceT commented 1 year ago

AD: I took a look at my S3->local script.. With you having multiple S3 buckets.. I think you will probably need to comment out a number of warnings that won't apply to you.

PS: sent you a message on telegram