nodejs / build

Better build and test infra for Node.

Equinix Move: Backup Server #3749

Closed ryanaslett closed 2 months ago

ryanaslett commented 3 months ago

sub issue of #3597

Mnx has created the packages that allow us to have a large disk instance for our backup server.

I've created a new home for backups at mnx, but I currently lack the 'infra' privileges to be able to provision it properly.

mhdawson commented 3 months ago

@ryanaslett I confirmed with @richardlau that you have infra level access, let us know if there is anything else needed to proceed.

ryanaslett commented 3 months ago

Progress report:

I have a question about the static data backups:

On nodejs-www, there are two subdirectories, one with nightly builds and one with v8-canary builds. A script prunes them to save space:

# - For anything over 2 calendar years ago, retain only the build dated first of the month
# - For anything in last 2 calendar years but not the last two months, retain date numbers ending in 1
# - Keep everything from the last two months.
#
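
As a rough illustration of the three rules quoted above, here is a minimal Python sketch of the keep/prune decision for a single build date. It is not the actual prune.sh logic. The function names are hypothetical, and two readings are assumptions: "over 2 calendar years ago" is taken to mean the build year is more than two years before the current year, and "the last two months" is taken to mean the current and the previous calendar month.

```python
from datetime import date


def months_between(older: date, newer: date) -> int:
    """Whole calendar months from `older` to `newer`, ignoring the day."""
    return (newer.year - older.year) * 12 + (newer.month - older.month)


def keep_build(build: date, today: date) -> bool:
    """Return True if a build dated `build` would be retained (hypothetical helper)."""
    # Rule 3: keep everything from the last two months
    # (assumed: current month and previous month).
    if months_between(build, today) < 2:
        return True
    # Rule 1: older than 2 calendar years (assumed: year gap > 2),
    # retain only the build dated the first of the month.
    if today.year - build.year > 2:
        return build.day == 1
    # Rule 2: otherwise retain day numbers ending in 1 (1, 11, 21, 31).
    return build.day % 10 == 1


if __name__ == "__main__":
    today = date(2024, 7, 9)
    for d in (date(2024, 6, 15), date(2023, 3, 11), date(2021, 5, 11)):
        print(d.isoformat(), "keep" if keep_build(d, today) else "prune")
```

Checking the most recent window first means a build from last month is kept regardless of its day number, which matches the order of precedence the comments imply.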

All of the pruned nightly/daily canary v8 builds are on the backup server, going back 8 years.

The /backup directory on the current backup server holds about 5.1 TB of data, of which 3.7 TB is the nightly/v8-canary builds. The pruned versions of those directories on nodejs-www are reduced to 1 TB.

So, I'm wondering what the policy/intent is for keeping daily builds that far back. It's a tremendous amount of data, and now would be a good opportunity to not carry it forward if possible.

richardlau commented 3 months ago

So, I'm wondering what the policy/intent is for keeping daily builds that far back. It's a tremendous amount of data, and now would be a good opportunity to not carry it forward if possible.

The "policy" was that we didn't delete anything. The pruning was introduced to manage the space on nodejs-www (which has less available space compared to backup) -- previous to introducing the pruning we just kept bumping to larger disks.

I know some collaborators have been asking about nightlies (cc @Uzlopak) but I'm not sure they need builds going all the way back to 8 years.

ryanaslett commented 2 months ago

Given that we won't have enough room on the new backup machine to house all of that data, and that the likelihood of that data needing an urgent/immediate restore is presumably low, I propose that we stash that historical data temporarily on DigitalOcean Spaces (their S3 equivalent) until we get confirmation of a retention policy. I presume establishing that policy will take longer than our timeline for getting off of Equinix allows.

mhdawson commented 2 months ago

Are you saying that we pruned what is served through www but on the backup server we never deleted what was pruned from the www server?

If so, I think that since we've never been asked to restore any of the nightlies from the backup server, having the backup server mirror what is available on www would make sense.

@nodejs/build any objections/concerns to that?

mhdawson commented 2 months ago

And on the specific suggestion from @ryanaslett of stashing the data temporarily somewhere else, if needed, until we agree: +1 to that.

ryanaslett commented 2 months ago

Are you saying that we pruned what is served through www but on the backup server we never deleted what was pruned from the www server?

Yes, exactly. The scripts appear to append new data to the backup server, but do not do a full synchronize to delete anything that no longer exists on the www server.
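
To make the append-only behaviour concrete, here is a minimal Python sketch that lists files present under a backup tree but no longer present under the source tree. These are the files a full mirror would delete and an append-only copy leaves behind (with rsync, roughly the difference between running with and without `--delete`). The function names and the directory layout are hypothetical, and this is not how the actual backup scripts work.

```python
from pathlib import Path


def relative_files(root: Path) -> set[str]:
    """All regular files under `root`, as POSIX-style paths relative to it."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def stale_on_backup(source_root: Path, backup_root: Path) -> set[str]:
    """Files that exist on the backup but no longer exist on the source.

    An append-only copy never removes these; a true mirror would.
    """
    return relative_files(backup_root) - relative_files(source_root)
```

Calling `stale_on_backup(Path("/path/to/www/dist"), Path("/path/to/backup/dist"))` (both paths are placeholders) would return the set of pruned builds that only the backup still holds.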

mhdawson commented 2 months ago

@ryanaslett thanks for confirming. Unless anybody objects, I think the right answer is probably to transfer only the data that is on the www server.

ryanaslett commented 2 months ago

The new mnx.io backup server is online, and populated with everything that is on the old backup server, with the exception of the static daily builds that are trimmed by the prune.sh script that runs on nodejs-www.

Those files were sent from the old backup server to a pair of R2 buckets on Cloudflare for the time being (until we get confirmation they can be deleted).

I'd like to get either confirmation or a +1 to now decommission the old backup server and return it to Equinix.

mhdawson commented 2 months ago

@ryanaslett how long has the backup server been online, and are there any log files for the rsyncs that we can sniff test to see data being sync'd?

I trust it's correct, but a few sniff checks here and there would be good as well.

ryanaslett commented 2 months ago

@mhdawson It's been online and running parallel backups for a couple of weeks now. I offset the cron on it by 8 hours to not collide with the existing backup server process.

The static data synced over from nodejs-www appears to be staying in sync. For the nightly builds:

Backup Server:
root@infra-mnx-ubuntu2204-x64-1:/data/backup/static/dist/nodejs/nightly# ls -d1 v*|wc -l
558

Nodejs-www Server:
root@infra-digitalocean-ubuntu1604-x64-1:/home/dist/nodejs/nightly# ls -d1 v*|wc -l
558

The periodic weekly/monthly backups were synced from the old backup server to the new backup server, and the dailies were allowed to run.

Strangely, there's a monthly anomaly on each server:

The new backup server seems to be missing the June monthly backup:

root@infra-mnx-ubuntu2204-x64-1:/data/backup/periodic# ls -la
total 84
drwxr-xr-x 21 root root 4096 Jul  9 08:11 .
drwxr-xr-x  5 root root 4096 Jun 17 18:29 ..
drwxr-xr-x  7 root root 4096 Jul  9 08:11 daily.0
drwxr-xr-x  7 root root 4096 Jul  8 08:04 daily.1
drwxr-xr-x  7 root root 4096 Jul  7 08:03 daily.2
drwxr-xr-x  7 root root 4096 Jul  6 08:01 daily.3
drwxr-xr-x  7 root root 4096 Jul  5 08:07 daily.4
drwxr-xr-x  7 root root 4096 Jul  4 08:07 daily.5
drwxr-xr-x  7 root root 4096 Jul  3 08:01 daily.6
drwxr-xr-x  7 root root 4096 May 18 00:16 monthly.0
drwxr-xr-x  7 root root 4096 Apr 27 00:20 monthly.1
drwxr-xr-x  7 root root 4096 Mar 31 00:00 monthly.2
drwxr-xr-x  7 root root 4096 Mar  3 00:05 monthly.3
drwxr-xr-x  7 root root 4096 Jan 27 23:57 monthly.4
drwxr-xr-x  7 root root 4096 Dec 31  2023 monthly.5
drwxr-xr-x  7 root root 4096 May  1  2016 monthly.6
drwxr-xr-x  7 root root 4096 Apr  3  2016 monthly.7
drwxr-xr-x  7 root root 4096 Jun 29 08:00 weekly.0
drwxr-xr-x  7 root root 4096 Jun 22 07:59 weekly.1
drwxr-xr-x  7 root root 4096 Jun 12 00:12 weekly.2
drwxr-xr-x  7 root root 4096 Jun  2 00:21 weekly.3

And the existing backup server seems to be missing the May backup:

[root@3a355104-c5d6-405f-863b-9ce5948ba77b /backup/periodic]# ls -la
total 381
drwxr-xr-x 21 root root 21 Jul  9 00:29 .
drwxr-xr-x  5 root root  5 Dec  9  2016 ..
drwxr-xr-x  7 root root  7 Jul  9 00:29 daily.0
drwxr-xr-x  7 root root  7 Jul  8 00:13 daily.1
drwxr-xr-x  7 root root  7 Jul  7 00:04 daily.2
drwxr-xr-x  7 root root  7 Jul  6 00:17 daily.3
drwxr-xr-x  7 root root  7 Jul  5 00:32 daily.4
drwxr-xr-x  7 root root  7 Jul  4 00:20 daily.5
drwxr-xr-x  7 root root  7 Jul  3 00:10 daily.6
drwxr-xr-x  7 root root  7 Jun  2 00:21 monthly.0
drwxr-xr-x  7 root root  7 Apr 27 00:20 monthly.1
drwxr-xr-x  7 root root  7 Mar 31 00:00 monthly.2
drwxr-xr-x  7 root root  7 Mar  3 00:05 monthly.3
drwxr-xr-x  7 root root  7 Jan 27 23:57 monthly.4
drwxr-xr-x  7 root root  7 Dec 31  2023 monthly.5
drwxr-xr-x  7 root root  7 May  1  2016 monthly.6
drwxr-xr-x  7 root root  7 Apr  3  2016 monthly.7
drwxr-xr-x  7 root root  7 Jun 30 00:01 weekly.0
drwxr-xr-x  7 root root  7 Jun 23 00:03 weekly.1
drwxr-xr-x  7 root root  7 Jun 16 00:04 weekly.2
drwxr-xr-x  7 root root  7 Jun  8 23:59 weekly.3

I believe that's due to the new backup server being one week behind the cycle and will eventually propagate.

mhdawson commented 2 months ago

@ryanaslett I believe there were also backups of Jenkins being put there as well. Do you have the list of things you set up to backup to the new server?

ryanaslett commented 2 months ago

@mhdawson The periodic data contains all of the jenkins ci and ci-release data.

Everything under the /backup folder on the equinix backup machine is being backed up onto the /data/backup folder on the new mnx.io backup server.

It includes

I have duplicated the cron scripts and updated the backup scripts (https://github.com/nodejs/build/pull/3823) (I hadn't yet created that PR).

I have made a backup of /root from the old server in a subfolder of root on the new server.

I didn't discover anything else on the old backup server outside of the /backup directory in either the scripts, documentation, or in traversing the filesystem.

mhdawson commented 2 months ago

@ryanaslett thanks for the details. I'm +1 on letting the old backup server go. @nodejs/build does anybody have any remaining concerns? If not, a +1 to confirm would also be good.

ryanaslett commented 2 months ago

The old backup server has been removed from Ansible and from the Equinix account. Huzzah!