realm / realm-object-server

Tracking of issues related to the Realm Object Server and other general issues not related to the specific SDKs
https://realm.io

Backup not working or misleading documentation #406

Closed programaths closed 5 years ago

programaths commented 5 years ago

Goals

Backup on one server, restore on another using information found there : https://docs.realm.io/platform/self-hosted/manage/enterprise-architecture/backup

Expected Results

The new realm starts and restores the previously backed-up data.

Actual Results

Error with this output:

ros-server_1     | 2018-12-13T14:16:16.788695927Z debug: [sync] Attempting to create user data directory at `/realm-object-server/data/sync/user_data'.
ros-server_1     | 2018-12-13T14:16:16.788731745Z debug: [sync] Directory `/realm-object-server/data/sync/user_data' already exists, continuing.
ros-server_1     | 2018-12-13T14:16:19.349859581Z info: [sync] Migration required
ros-server_1     | 2018-12-13T14:16:19.349891566Z info: [sync] Found 70886 Realm files in /realm-object-server/data/sync/user_data
ros-server_1     | 2018-12-13T14:16:19.393866429Z Error starting Realm Object Server: make_dir() failed: No such file or directory
ros-server_1     | 2018-12-13T14:16:19.393898019Z fatal: [sync] Encountered an error starting up: realm::util::File::AccessError: make_dir() failed: No such file or directory

Steps to Reproduce

NOTE: this uses Docker.
NOTE: the backed-up realm is 45 GB.

Backup a ROS:

BACKUP_NAME=backup-$(date --iso-8601)
docker-compose exec ros-server /realm-object-server/node_modules/.bin/ros backup -f /realm-object-server/data -t /realm-object-server/$BACKUP_NAME
docker-compose exec ros-server tar zcf /$BACKUP_NAME.tar.gz -C /realm-object-server/$BACKUP_NAME .
docker cp christian_ros-server_1:/$BACKUP_NAME.tar.gz $BACKUP_NAME.tar.gz
aws --profile ros-backup s3 cp $BACKUP_NAME.tar.gz s3://ros-backups

Attempt to restore on a new machine

BACKUP_NAME=backup-$(date --iso-8601) <1>
aws --profile ros-backup s3 cp s3://ros-backups/$BACKUP_NAME.tar.gz $BACKUP_NAME.tar.gz
docker run \
--mount source=christian_ros-server-data,target=/ros-data \
--mount type=bind,source=/home/christian/$BACKUP_NAME.tar.gz,destination=/backup.tar.gz,readonly \
--rm -it node:10-slim bash -c 'rm -rf /ros-data/* && tar -xzf /backup.tar.gz  --strip-components 1 -C /ros-data/'

In a nutshell (TL;DR)

It's merely a backup followed by a restore!

Version of Realm and Tooling

fealebenpae commented 5 years ago

Hi @programaths,

Could you please run tar --gzip --list --verbose --file=$BACKUP_NAME.tar.gz and attach the output here?

programaths commented 5 years ago

./sync/user_data/company/e7b197c7-6811-478d-a76f-359bb918821b/__partial/3c1421b69709a16b2d398b2a3d08125c/d182d44f9709aa451fa3c6007dc30add6696747f.realm
./sync/user_data/company/e7b197c7-6811-478d-a76f-359bb918821b.realm
./sync/user_data/company/2a0ac87c-e1a8-4912-9c0d-2748a4aa9e46/
./sync/user_data/company/2a0ac87c-e1a8-4912-9c0d-2748a4aa9e46/__partial/
./sync/user_data/company/2a0ac87c-e1a8-4912-9c0d-2748a4aa9e46/__partial/3c1421b69709a16b2d398b2a3d08125c/
./sync/user_data/company/2a0ac87c-e1a8-4912-9c0d-2748a4aa9e46/__partial/3c1421b69709a16b2d398b2a3d08125c/d182d44f9709aa451fa3c6007dc30add6696747f.realm
./sync/user_data/company/f9f7b11b-1021-469a-81ec-b6f992da0bdc.realm
./sync/user_data/company/21775152-d838-4c44-9b51-fc49ec726b68/
./sync/user_data/company/21775152-d838-4c44-9b51-fc49ec726b68/__partial/
./sync/user_data/company/21775152-d838-4c44-9b51-fc49ec726b68/__partial/3c1421b69709a16b2d398b2a3d08125c/
./sync/user_data/company/21775152-d838-4c44-9b51-fc49ec726b68/__partial/3c1421b69709a16b2d398b2a3d08125c/d182d44f9709aa451fa3c6007dc30add6696747f.realm
./sync/user_data/company/f9f7b11b-1021-469a-81ec-b6f992da0bdc/
./sync/user_data/company/f9f7b11b-1021-469a-81ec-b6f992da0bdc/__partial/
./sync/user_data/company/f9f7b11b-1021-469a-81ec-b6f992da0bdc/__partial/3c1421b69709a16b2d398b2a3d08125c/
./sync/user_data/company/f9f7b11b-1021-469a-81ec-b6f992da0bdc/__partial/3c1421b69709a16b2d398b2a3d08125c/d182d44f9709aa451fa3c6007dc30add6696747f.realm
./sync/user_data/company/eb47eb65-e6e7-4043-8279-533506251a7f.realm
./sync/user_data/company/eb47eb65-e6e7-4043-8279-533506251a7f/
./sync/user_data/company/eb47eb65-e6e7-4043-8279-533506251a7f/__partial/
./sync/user_data/company/eb47eb65-e6e7-4043-8279-533506251a7f/__partial/3c1421b69709a16b2d398b2a3d08125c/
./sync/user_data/company/eb47eb65-e6e7-4043-8279-533506251a7f/__partial/3c1421b69709a16b2d398b2a3d08125c/d182d44f9709aa451fa3c6007dc30add6696747f.realm
./sync/user_data/company/21775152-d838-4c44-9b51-fc49ec726b68.realm
./sync/user_data/global/
./sync/user_data/global/global-data.realm
./sync/user_data/global/global-constants.realm
./sync/user_data/
configuration.realm
./keys/
./keys/auth.key
./keys/admin.json
./keys/auth.pub

Also, you can see the whole process executed here: http://static.showsourcing.com/realm-support-files/ros-backup/index.html

fealebenpae commented 5 years ago

The contents of the archive seem right, but I can't quite decipher your command to extract the archive because of all the Docker and volume noise. Can you confirm that you're extracting the archive inside the restored server's data folder (that is to say, the contents of realm-object-server/data match the contents of the archive you pasted here), and that the user the server is running as has read/write permissions on its data folder and everything in it?
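The first check asked for here, whether the data folder matches the archive, could be scripted roughly as follows. This is only a sketch: `compare_archive_to_dir` is a hypothetical helper name (not part of the ros CLI), and it assumes the archive was created with `tar zcf ... -C <dir> .` as in the backup steps, so entries start with `./`.

```shell
# Sketch: compare the entry names in a .tar.gz with the files on disk.
# compare_archive_to_dir is a hypothetical helper, not a ROS command.
compare_archive_to_dir() {
  local archive="$1" data_dir="$2" tmp status
  tmp="$(mktemp -d)" || return 1
  # Archive entries, with the leading "./" and trailing "/" stripped.
  tar --gzip --list --file="$archive" \
    | sed -e 's|^\./||' -e 's|/$||' -e '/^$/d' | sort -u > "$tmp/archive.txt"
  # Files and directories on disk, normalized the same way.
  (cd "$data_dir" && find . -mindepth 1) \
    | sed -e 's|^\./||' | sort -u > "$tmp/disk.txt"
  # diff prints the differences and returns non-zero if the lists differ.
  diff "$tmp/archive.txt" "$tmp/disk.txt"
  status=$?
  rm -rf "$tmp"
  return "$status"
}
```

It would be called as `compare_archive_to_dir "$BACKUP_NAME.tar.gz" /path/to/data`; an empty output with exit status 0 means the names match. It compares names only, not file contents or permissions.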

programaths commented 5 years ago

Hi, I can confirm both. I also tested locally and it seems that the bug happens when we delete the "realm" folder.

Below is a copy of the mail I sent to support.

I launched locally.

Log when removing the "realm" folder: https://pastebin.com/RJPVFr4H

For the "do not remove realm folder" case, I set up a clean server and started it so that it created its files, then I overwrote them with the backup content. The first few times I try to start the server, I get an error: fatal: BasicServer didn't come up in 30000ms. Bailing. The complete log is here: https://pastebin.com/BFkxZVGF

To be certain this is not due to checks, I put this config in place: verifyRealmsAtStart: false, skipVerifyRealmsAtStart: true. After a few consecutive launches (4 to 6), the error disappears, the server is up and the data is restored. We can also connect. I also changed "startupTimeout" to 30000, 120000 and 6000. Each time I had to launch it more than 3 times to get it up and running.

Also, there was very little data in this test:

christian@christian-Aspire-TC-605:~/Documents/projects/showsourcing-infra/realm-object-server$ ./node_modules/.bin/ros backup -f data -t ros-backup
info: The data directory /home/christian/Documents/projects/showsourcing-infra/realm-object-server/data was backed up in the directory /home/christian/Documents/projects/showsourcing-infra/realm-object-server/ros-backup. 30 Realms were successfully backed up

fealebenpae commented 5 years ago

Now that you mention removing the realms folder, I think I know what the issue might be: removing it causes the make_dir error, but keeping it causes the "divergent histories" error in the logs.

Instead of removing it, could you please try running find . -type f -name "*.realm*" -exec rm -f {} \; from inside the realms folder before you restore from backup? This will preserve the folder structure, but get rid of the realm files. My suspicion is that making the folder hierarchies fails because the data folder is actually the root of a volume you mount and if the server works after restoring with this command then that'll be pretty much confirmed.
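To illustrate what that command does, here is a small sketch run against a throwaway directory tree. The helper name `clear_realm_files` and the demo paths are made up for illustration; only the `find` invocation mirrors the suggestion above (with `+` instead of `\;` so that `rm` is called in batches).

```shell
# Sketch: delete Realm files under a directory while keeping its folder tree.
# clear_realm_files is a hypothetical helper name, not part of the ros CLI.
clear_realm_files() {
  local realms_dir="$1"
  # -type f limits the match to regular files, so directories
  # (and other non-regular entries such as named pipes) are left alone.
  find "$realms_dir" -type f -name '*.realm*' -exec rm -f {} +
}

# Demo on a temporary tree (paths are invented for the example).
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/company/abc/__partial"
touch "$demo_dir/company/abc.realm" "$demo_dir/company/abc.realm.lock"
clear_realm_files "$demo_dir"
# Lists what is left: the directories remain, the Realm files are gone.
find "$demo_dir" -mindepth 1 | sort
```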

programaths commented 5 years ago

That was on a normal computer (no Docker), without any mount. It was executed "locally" to avoid adding the Docker complexity.

Also, I am not at work, so I can't easily do what you said right now.

programaths commented 5 years ago

I executed this on my local machine and it seems to work. Here are the steps:

  1. Launch ROS server
  2. Feed it with data via a client
  3. Do a backup with ros backup -f data -t ros-backup
  4. Wipe the data directory
  5. Launch ROS to create a fresh data directory
  6. Shut down ROS
  7. From the data/realms directory execute find . -type f -name "*.realm*" -exec rm -f | xargs -I file rm -f file
  8. cp -r ros-backup/* data/
  9. Start ROS server
  10. Connect and check restoration

It's not an ideal procedure, though. Also, the command at step 7 preserves named pipes (not sure what that entails).

Very important note

All of this was done locally to avoid Docker problems, as in https://github.com/realm/realm-object-server/issues/406#issuecomment-448623264 so Docker was out of the equation. As for mounts, there are none involved (everything is on a local disk, in a subdirectory of my $HOME).

fealebenpae commented 5 years ago

Alright, thank you - this narrows down where the problem is.

programaths commented 5 years ago

I came back to retry the restore and noticed I made a mistake in the steps (step 7 is corrected below):

  1. Launch ROS server
  2. Feed it with data via a client
  3. Do a backup with ros backup -f data -t ros-backup
  4. Wipe the data directory
  5. Launch ROS to create a fresh data directory
  6. Shut down ROS
  7. From the data/realms directory execute find . -type f -name "*.realm*" | xargs -I file rm -f file
  8. cp -r ros-backup/* data/
  9. Start ROS server
  10. Connect and check restoration
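Steps 7 and 8 of this procedure could be wrapped in a small function, sketched below. The name `restore_ros_backup` and its arguments are assumptions for illustration; it presumes ROS has already been stopped (step 6) and that the Realm files live in a `realms` subdirectory of the data directory, as in the local test described here.

```shell
# Sketch of steps 7 and 8: clear stale Realm files, then copy the backup in.
# restore_ros_backup is a hypothetical helper; stop ROS before calling it.
restore_ros_backup() {
  local data_dir="$1" backup_dir="$2" realms_dir
  # Assumption: Realm files sit under DATA_DIR/realms unless told otherwise.
  realms_dir="$data_dir/${3:-realms}"
  if [ ! -d "$data_dir" ] || [ ! -d "$backup_dir" ]; then
    echo "usage: restore_ros_backup DATA_DIR BACKUP_DIR [REALMS_SUBDIR]" >&2
    return 1
  fi
  # Step 7: remove Realm files but keep the directory structure.
  if [ -d "$realms_dir" ]; then
    find "$realms_dir" -type f -name '*.realm*' -exec rm -f {} +
  fi
  # Step 8: copy the backup contents (including dotfiles) over the data dir.
  cp -r "$backup_dir"/. "$data_dir"/
}
```

With the layout from the steps above it would be called as `restore_ros_backup data ros-backup`, followed by starting the server again.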

morten-krogh commented 5 years ago

Hi @programaths

The server fails in a migration step. However, I suspect it has to do with permissions. That would be our first hypothesis at the moment.

Can I ask you to run a few small experiments?

  1. Compare the case that works with the case that doesn't. Look closely at directory ownership and permissions for both the top-level directory and the directories one level down.

  2. Try starting the server as root or with sudo. There is no danger in this experiment. The server will not do any harm to your system under any circumstances. The server should generally not run as root, but if it works as root, it would confirm that the problem has to do with permissions.
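For the first experiment, a listing like the one sketched below could be captured for both the working and the failing setup and then compared. `show_dir_permissions` is a hypothetical helper name, and `stat -c` assumes GNU coreutils (as on a typical Linux host); other platforms use a different `stat` syntax.

```shell
# Sketch: print mode, owner:group and name for a directory and the
# directories one level below it. show_dir_permissions is hypothetical.
show_dir_permissions() {
  # Run from inside the directory so names are relative and comparable.
  (cd "$1" && find . -maxdepth 1 -type d \
      -exec stat -c '%A %U:%G %n' {} + | sort -k3)
}
```

For example, `show_dir_permissions data > working.txt` on the setup that works and `show_dir_permissions data > failing.txt` on the one that fails, followed by `diff working.txt failing.txt`, would show any ownership or mode differences at those two levels.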

programaths commented 5 years ago

Hi, I am not working with the company and realm anymore.

I will be neither able nor willing to provide feedback anymore.

morten-krogh commented 5 years ago

@programaths No problem. Sorry about the inconvenience.