sillsdev / web-languageforge

Language Forge: Online Collaborative Dictionary Building on the Web and Phone.
https://languageforge.org
MIT License

Node script to copy projects from staging or prod #1816

Closed rmunn closed 4 weeks ago

rmunn commented 1 month ago

Fixes #1542

Description

Node script (on Windows, you'll want to run it with Git Bash) to copy projects from staging or production to your local Docker dev environment.

Usage: First, edit the script and make sure the staging_context and prod_context values match the names you've given to your Kubernetes contexts. (If unsure, run kubectl config get-contexts to see what context names you have on your system).

Then run node backup.mjs MongoID. For example, to copy https://staging.languageforge.org/app/lexicon/5dbf805650b51914727e06c4, copy the Mongo ID out of that URL and run node backup.mjs 5dbf805650b51914727e06c4.

Alternately, you can just paste the URL as the command-line argument, at which point the script will automatically extract the project ID. Be careful to quote the URL, as some characters might have special meaning to the shell. (For example, on Linux, the ! character means "Find a previous command that starts with this text". If you don't put quotes around the URL, you'll get an error from Bash saying bash: !/editor/entry/5dbf806cbea602641cc27e61?sortBy=Default: event not found. And they must be single quotes, because double-quotes don't remove the special meaning of !).
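A minimal sketch of the ID extraction described above, assuming the project ID is the 24-hex-digit segment that follows /app/lexicon/ in the URL. The function name extractProjectId and the exact URL shape in the second call are illustrative assumptions; the script in this PR may do it differently.

```javascript
// Hypothetical helper: accepts either a bare Mongo ID or a Language Forge URL.
function extractProjectId(arg) {
  // A bare Mongo ObjectId is exactly 24 hex digits.
  if (/^[0-9a-f]{24}$/i.test(arg)) return arg;
  // Assumption: project URLs look like .../app/lexicon/<24 hex digits>...
  const match = /\/app\/lexicon\/([0-9a-f]{24})/i.exec(arg);
  if (!match) throw new Error(`Could not find a project ID in: ${arg}`);
  return match[1];
}

console.log(extractProjectId("5dbf805650b51914727e06c4"));
console.log(
  extractProjectId(
    "https://staging.languageforge.org/app/lexicon/5dbf805650b51914727e06c4#!/editor/entry/5dbf806cbea602641cc27e61?sortBy=Default"
  )
);
```

Both calls print the project ID, not the entry ID that appears later in the URL.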

Dependencies

You need kubectl and docker installed. Also, the script first tries to copy assets using kubectl cp, but if that fails, it falls back to rsync. On Windows, you might need to install a Windows build of rsync from the msys project.


github-actions[bot] commented 1 month ago

Unit Test Results

362 tests   362 :white_check_mark:  13s :stopwatch:
 37 suites    0 :zzz:
  1 files     0 :x:

Results for commit 1e3ced21.

:recycle: This comment has been updated with latest results.

rmunn commented 1 month ago

In our discussion from https://github.com/sillsdev/web-languageforge/issues/1542#issuecomment-2121843964 we talked about modifying all the userRef values from the lexicon (entries, comments, and so on). The script does not do that yet. So far I haven't encountered errors due to userRef values pointing to non-existent users, but I haven't tested that extensively. That might end up being unnecessary, but more testing is needed to prove that.

hahn-kev commented 1 month ago

I'm wondering if it would make more sense to write a script like this in JS, which has a pretty high chance of running cross-platform. As it is, Chris has stopped working on Mac, so the majority of the team will be running this on Windows.

Also, bash scripts are difficult to maintain, since most of us don't write bash unless we need to. Additionally, with JS we could just use kubectl to open a port to the db and use a Mongo connection directly, which would make it much simpler to write than a line like this: admin_id=$(docker exec lf-db mongosh -u admin -p pass --authenticationDatabase admin scriptureforge --eval "db.users.findOne({username: 'admin'}, {_id: 1})" | cut -d"'" -f 2) Thoughts?

megahirt commented 1 month ago

@hahn-kev you bring up some good points. I didn't think of writing this in NodeJS. I had been planning on writing it in PHP and making something that ran server side. Since I hadn't done it yet I asked Robin to. We both agreed that avoiding PHP would be simpler, since the approach was primarily shelling out to mongo commands. Once we realized we could run the entire thing remotely and not involve the server, it seemed natural to write it in bash. I plan to ensure it runs on Windows.

If we could port it to JS for free now that would be cool, but as long as it works as advertised, I am fine leaving this as bash. If we have issues with cross platform or maintainability friction because not everyone writes bash, then I'd go with a three-strikes-and-we-port-it philosophy.

megahirt commented 1 month ago

@rmunn I really wanted to test this, but my Windows machine doesn't have kube contexts or a WireGuard tunnel set up :( I need my existing tunnel and contexts, which are on my Mac at home. So maybe tomorrow I will test on Windows.

rmunn commented 1 month ago

I'm willing to rewrite in JS if necessary, though I'd like to see how it performs in a Git Bash environment first, to avoid unnecessary work if it does turn out to be unnecessary.

megahirt commented 1 month ago

> I'm willing to rewrite in JS if necessary, though I'd like to see how it performs in a Git Bash environment first, to avoid unnecessary work if it does turn out to be unnecessary.

I asked ChatGPT to port it over to Node TypeScript and I thought it did a good first pass: https://chat.team-gpt.com/lt-lexical-tools/664585252bf6048a1b9a3f67

rmunn commented 1 month ago

It seems kubectl exec is just not reliable, so the mongodump/mongorestore approach to copying the project database is not going to work. I just added a loop to the Node.js script to keep running mongodump/mongorestore until it succeeds, and my most recent run has now been going for over two hours without mongodump succeeding a single time. And that's on a project with just 30 entries, which takes just a few seconds to mongodump when it succeeds.
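For reference, a retry loop of this kind can be sketched as below. This is not the code from the PR: the helper name retry, the attempt cap, and the delay are assumptions added for illustration (the loop described above retried without a limit), and task stands in for the mongodump/mongorestore step.

```javascript
// Hypothetical helper; `task` stands in for the mongodump/mongorestore step.
// Unlike the unbounded loop described above, this sketch gives up after
// `attempts` tries and rethrows the last error.
async function retry(task, { attempts = 5, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      console.warn(`Attempt ${attempt} of ${attempts} failed: ${err.message}`);
      if (attempt < attempts) {
        // Wait a little before trying again.
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

Capping the attempts means a connection that never stays up long enough produces an error instead of a run that goes on for hours.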

Edit: Switching to a different Internet connection made no difference. kubectl exec connections still got torn down so fast I couldn't rely on them.

I'm going to rewrite the mongodump/mongorestore step to use a MongoClient connection instead. Since I can't use the .copyDatabase() feature that Mongo removed in version 4.2, I'll list the collections on the remote database, grab all the data from one collection at a time, and on the local database, drop the existing collection before doing an insertMany or bulkWrite operation to load the data.
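Roughly, the plan above could look like the following sketch. It assumes remoteDb and localDb are Db objects from the official MongoDB Node.js driver (for example, client.db(name) on a connected MongoClient); the function name copyCollections is hypothetical, and this is not necessarily how the final script does it.

```javascript
// Sketch only: copies every non-system collection from remoteDb to localDb.
// Both arguments are assumed to be Db objects from the MongoDB Node.js driver.
async function copyCollections(remoteDb, localDb) {
  const infos = await remoteDb.listCollections().toArray();
  const names = infos
    .map((info) => info.name)
    .filter((name) => !name.startsWith("system."));
  for (const name of names) {
    // Loads the whole collection into memory; fine for small projects.
    const docs = await remoteDb.collection(name).find().toArray();
    try {
      await localDb.collection(name).drop();
    } catch (err) {
      // Code 26 is NamespaceNotFound: nothing to drop locally, so carry on.
      if (err.code !== 26) throw err;
    }
    if (docs.length > 0) {
      await localDb.collection(name).insertMany(docs);
    } else {
      // insertMany rejects an empty batch, so just create the empty collection.
      await localDb.createCollection(name);
    }
  }
  return names;
}
```

For large collections, iterating the cursor and inserting in batches would use less memory than toArray().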

rmunn commented 1 month ago

NOTE: If you get an error like tar: ./audio: File removed before we read it, then it means you copied a project where one of the two directories (audio and pictures) was a broken symlink. This isn't likely to happen often, so I'm not adding mitigation to the script. The answer is simply to go into deploy/app, delete the broken symlink, and replace it with an empty directory.

If this happens a lot, I'll open a separate issue to track that bugfix.

UPDATE: Nope, this is happening a lot; it's apparently quite common. I'll include a mitigation in this PR rather than a separate issue.

rmunn commented 1 month ago

> Might have something to do with Windows. Changing the occurrences of echo -n to use printf (e.g. printf "no") seems to fix it for me.

Grrr. Windows, what are you doing? That echo was clearly inside quotes, it should not have been processed by your shell! It was supposed to be part of the sh -c input.

It's those little subtle differences that get you when trying to write cross-platform scripts. Another thing I could do here is give up on the -n option and instead strip newlines from the command output before comparing it to "yes" or "no". (The -n option to echo means "no newline"; normally echo will automatically add a newline after the text you give it). I think I'll do that, as -n might be a little too much magic for Windows.
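Something like this is what I have in mind for the comparison; a sketch only, with parseYesNo as a hypothetical helper name rather than anything in the script yet.

```javascript
// Hypothetical helper: interpret the output of a remote `echo yes` / `echo no`.
function parseYesNo(output) {
  // trim() removes the trailing "\n" (or "\r\n") that plain echo adds.
  const answer = String(output).trim();
  if (answer === "yes") return true;
  if (answer === "no") return false;
  throw new Error(`Expected "yes" or "no", got ${JSON.stringify(answer)}`);
}
```

With the newline handled on the Node side, the remote command no longer needs echo -n or printf at all.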

hahn-kev commented 1 month ago

I want to say Windows typically uses double quotes (") instead of single quotes to do something like that.

~I'll try and take a look at this today and get it working on Windows~ Sadly, LF is not working locally for me right now, so I can't try this out.

rmunn commented 1 month ago

> I want to say Windows typically uses double quotes (") instead of single quotes to do something like that.

That makes things difficult, because Linux assigns different meaning to single quotes vs double quotes; for example, single quotes don't do $variable expansion. There are other differences too, so I use single-quotes routinely when doing sh -c 'some long command' because that works far more often. Plus it allows me to put double-quotes inside the command.

I might be able to rewrite the exec calls to use the version of exec where you pass each parameter as a separate string and let Node take care of appropriate quoting on each OS.
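A sketch of that idea, using execFile from Node's child_process module, which takes the arguments as an array and does not go through a local shell by default. The helper names symlinkCheckArgs and checkSymlink and their parameters are hypothetical; the kubectl arguments mirror the symlink check already used in this PR.

```javascript
// Hypothetical helper: builds the argument list for the symlink check.
// `context`, `target` (e.g. "deploy/app" or a pod name) and `name` are
// placeholders supplied by the caller.
function symlinkCheckArgs(context, target, name) {
  return [
    `--context=${context}`,
    "--namespace=languageforge",
    "exec", "-c", "app", target, "--",
    "sh", "-c",
    // Passed to the remote sh as ONE argument, so no local quoting is needed.
    `readlink -eq ${name} >/dev/null && echo yes || echo no`,
  ];
}

async function checkSymlink(context, target, name) {
  // Dynamic import() works from both .mjs and CommonJS files.
  const { execFile } = await import("node:child_process");
  const { promisify } = await import("node:util");
  const { stdout } = await promisify(execFile)(
    "kubectl",
    symlinkCheckArgs(context, target, name)
  );
  return stdout.trim() === "yes";
}
```

Note that name is still interpreted by the remote sh, so paths containing spaces or shell metacharacters would need escaping there.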

megahirt commented 1 month ago

> That makes things difficult, because Linux assigns different meaning to single quotes vs double quotes;

Yup, that's true. Are you saying you cannot make it work with double quotes?

`kubectl --context=${context} --namespace=languageforge exec -c app deploy/app -- sh -c "readlink -eq ${name} >/dev/null && echo yes || echo no"`,

Does it work on Linux like this?

rmunn commented 1 month ago

> `kubectl --context=${context} --namespace=languageforge exec -c app deploy/app -- sh -c "readlink -eq ${name} >/dev/null && echo yes || echo no"`,
>
> Does it work on Linux like this?

Just pushed a commit making it work with double quotes, which I started even before your comment. :-) I would have pushed it an hour ago, but DockerHub was giving me just a trickle of bandwidth so it took nearly an hour to download the Docker images and run local LF to test it.

rmunn commented 1 month ago

Just found a bug: if the lexicon collection exists but is empty, you get MongoInvalidArgumentError: Invalid BulkOperation, Batch cannot be empty.

I'll fix it, but if it takes too long then I won't spend too much time on it — because a project with no lexical entries at all is not one that we're likely to need to copy to local LF in order to troubleshoot. :-)

rmunn commented 1 month ago

The "unexpected EOF" error is because kubectl cp is unreliable, and will continue to be unreliable until the server is running Kubernetes 1.30 or later and you have a kubectl with version 1.30 or later. That's precisely why I designed the script to fall back to rsync.

The bit about deploy/app not being available to your user account is fixable: I already have the pod name from the earlier kubectl cp step, so I should use the pod name instead of deploy/app. So I'll fix that first.

> If it's working now in Linux, then maybe you should just merge it. 🤷

Okay, I'll dismiss your "changes requested" review from earlier so that GitHub will allow me to merge this.

rmunn commented 1 month ago

You might be able to work around the "drive letter interpreted as a pod name" error by using the obscure \\localhost\c$\my_dir format for paths. https://learn.microsoft.com/en-us/dotnet/standard/io/file-path-formats#unc-paths
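If that workaround does turn out to help, the path conversion itself is simple. A sketch, with toLocalhostUnc as a hypothetical helper name; whether kubectl cp actually accepts the resulting path is untested here.

```javascript
// Hypothetical helper: turn "C:\my_dir" into "\\localhost\c$\my_dir".
function toLocalhostUnc(winPath) {
  const match = /^([A-Za-z]):[\\/](.*)$/.exec(winPath);
  if (!match) return winPath; // not a drive-letter path; leave untouched
  const [, drive, rest] = match;
  // Normalise any forward slashes in the remainder to backslashes.
  return `\\\\localhost\\${drive.toLowerCase()}$\\${rest.replace(/\//g, "\\")}`;
}

console.log(toLocalhostUnc("C:\\my_dir")); // \\localhost\c$\my_dir
```

Paths without a drive letter are returned unchanged, so the helper is safe to call on Linux and macOS paths too.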

rmunn commented 1 month ago

Feature request from meeting: we want to auto-cleanup the tar file from the server on script exit, so we don't leave a bunch of asset tarballs lying around until the next container restart.
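One possible shape for this, as a sketch: wrap the main work in try/finally so the cleanup step runs whether the copy succeeds or fails. The helper name withCleanup is hypothetical, and cleanup stands in for whatever command ends up removing the tar file from the server.

```javascript
// Hypothetical helper: run `work`, then always run `cleanup`, even on failure.
async function withCleanup(work, cleanup) {
  try {
    return await work();
  } finally {
    try {
      await cleanup();
    } catch (err) {
      // A failed cleanup should not hide the result (or error) of the main work.
      console.warn(`Cleanup failed: ${err.message}`);
    }
  }
}
```

This does not cover the script being killed by a signal such as Ctrl-C; that case would need a separate signal handler.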

rmunn commented 1 month ago

@myieye -

As we discussed, leaving this bit of the work for you. Commit https://github.com/sillsdev/web-languageforge/pull/1816/commits/45d82949f933077d72ce0bec8e2d61200334fd76 adds a comment in the place where you'd want to make that substitution.

> You might be able to work around the "drive letter interpreted as a pod name" error by using the obscure \\localhost\c$\my_dir format for paths. https://learn.microsoft.com/en-us/dotnet/standard/io/file-path-formats#unc-paths