triplea-game / triplea

TripleA is a turn based strategy game and board game engine, similar to Axis & Allies or Risk.
https://triplea-game.org/
GNU General Public License v3.0

Infrastructure as Code - Ansible control for Forum #6599

Open DanVanAtta opened 4 years ago

DanVanAtta commented 4 years ago

It might be time for us to put the forum software into Ansible so that its configuration is documented.

Motivating reasons:

  1. I don't know how the forum is deployed, what components are part of it, or whether NGINX is fronting it. If I do not know this, others will not either.
  2. The upgrade process is opaque: maintenance operations just happen, without notification or documentation of what was changed and when.
  3. There is no disaster recovery beyond restoring from Linode.
  4. Infrastructure as code is a philosophy that says nobody should touch the running servers. You modify servers by modifying the checked-in code, and that code manages your servers. This keeps a running history and audit trail of the configuration and ensures documentation and transparency.

@RoiEXLab would you be comfortable starting to develop an Ansible role that could set up a forum server? As a first step, being able to control the NodeBB version and/or any NGINX server in front of it would be excellent. Later on, the goal would be to maintain and automate any database tasks.

RoiEXLab commented 4 years ago

It's actually a little tougher than one might think. I'm not happy with the current state, and I think a deployment redesign would actually be a far better choice.

The main problem I encounter every time I update (which is also why I do it so rarely) comes down to the permission system. The basic idea is that every file related to NodeBB is owned by the nodebb user, which is a "virtual user" without a home directory. The problem with that is the npm utility. In theory, the basic update procedure is as follows:

  1. Stop the NodeBB server if it's running (nginx should stay alive in order to display a nicer error page)
  2. Check out the most recent release branch (they create a new branch for every minor release in semver terms, something like v1.13.x), or simply pull if it's just a patch version change.
  3. Run ./nodebb upgrade (I think at some point you need to confirm that you want to update the installed plugins as well). It is basically a superset of ./gradlew build: it forces npm to download the latest dependencies and also updates the database accordingly.
  4. Start NodeBB again (we manage it via systemd btw, so sudo systemctl start nodebb)
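The four steps above can be sketched as one bash function. This is a minimal sketch under stated assumptions, not the procedure actually used on the server: the checkout path `/opt/nodebb` is a hypothetical placeholder (override it with `NODEBB_DIR`), and step 3 is run as the `nodebb` user, which is exactly where the permission problems described in the next comment show up today.

```bash
#!/usr/bin/env bash
# Sketch of the manual NodeBB upgrade (steps 1-4 above).
# Assumptions, not taken from the thread:
#   - NODEBB_DIR is where NodeBB is checked out (/opt/nodebb is a placeholder)
#   - the caller may use sudo; NodeBB files are owned by the "nodebb" user
NODEBB_DIR="${NODEBB_DIR:-/opt/nodebb}"

# upgrade_nodebb BRANCH   e.g. upgrade_nodebb v1.14.x
# The body is a subshell with `set -e`: the first failing step aborts the
# upgrade and NodeBB stays stopped while nginx keeps serving its error page.
upgrade_nodebb() (
  set -e
  branch="${1:-}"
  if [ -z "$branch" ]; then
    echo "usage: upgrade_nodebb BRANCH" >&2
    exit 2
  fi

  # 1. Stop NodeBB (nginx stays up).
  sudo systemctl stop nodebb

  # 2. Switch to the release branch and fast-forward to its latest patch release.
  cd "$NODEBB_DIR"
  sudo -u nodebb git fetch origin
  sudo -u nodebb git checkout "$branch"
  sudo -u nodebb git pull --ff-only origin "$branch"

  # 3. Update dependencies, rebuild and migrate the database.
  #    This is the step that currently fails because of npm's cache location.
  sudo -u nodebb ./nodebb upgrade

  # 4. Start NodeBB again via systemd.
  sudo systemctl start nodebb
)
```

It is meant to be run interactively rather than from cron, since `./nodebb upgrade` may ask for confirmation about plugin updates.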

Now the problems occur in step 3. If I run the upgrade command as my own user, it fails because my user doesn't have any permissions inside the nodebb-owned folders, and any created files would belong to my user instead of nodebb. Using sudo unfortunately doesn't help, because NodeBB doesn't run its build scripts (which build native binaries for JS libraries that require them) with root privileges (for a good reason), so they still fail.

When I run the upgrade as the nodebb user using sudo -u nodebb, npm complains that /home/nodebb/.npm doesn't exist, because it would like to store its dependencies there (similar to how Gradle and Maven do it). I know there is a way to change this path to another folder that actually exists, but I haven't tried that so far. So the way I do it, which works but is super ugly, is to first change ownership of all files to my own user account, then run the command as my account so it picks up the .npm folder in my home directory, and afterwards change ownership back to the nodebb user. As you can probably imagine, I'm not happy with this, but whenever the forum is down because of some random issue, I just try to get it back online ASAP; there's not much time for experiments.

Dockerizing the main NodeBB server could be an option. It would give us a more or less reproducible ecosystem if we mount only the "important files", like uploaded content and config files, to real locations on the hard drive. I'm not sure how installed plugins are actually tracked; it has to be some mix of the database and package.json, since plugins are basically npm dependencies that work with NodeBB. MongoDB (the database NodeBB uses in our case) could continue to run as-is. It's currently controlled via systemd as well, and I haven't had any problems with it since the start of the forum. The same goes for nginx, which uses Let's Encrypt for its automated certificates.
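The comment mentions that npm's cache path can be moved to a folder that actually exists. A minimal sketch of that idea, untested on the forum server: npm reads any `npm_config_<key>` environment variable as configuration, so `npm_config_cache` relocates the cache away from the missing `/home/nodebb/.npm`. Because native module builds may also write under `$HOME`, the sketch additionally points `HOME` at a writable directory; that part is an assumption, and both `/opt/nodebb` and `/var/lib/nodebb-home` are hypothetical paths.

```bash
#!/usr/bin/env bash
# Run NodeBB commands as the nodebb user without the chown workaround.
# Assumptions (hypothetical paths, not from the thread):
#   NODEBB_DIR  - the NodeBB checkout
#   NODEBB_HOME - a writable directory used as HOME and npm cache root
NODEBB_DIR="${NODEBB_DIR:-/opt/nodebb}"
NODEBB_HOME="${NODEBB_HOME:-/var/lib/nodebb-home}"

# One-time setup: create the directory, owned by nodebb and private to it.
prepare_nodebb_home() {
  sudo install -d -o nodebb -m 0700 "$NODEBB_HOME"
}

# nodebb_cmd ARGS...   e.g. nodebb_cmd upgrade
# Runs ./nodebb ARGS as the nodebb user. `env` sets the variables for that
# process only, so files created by npm and the build scripts stay owned by nodebb.
nodebb_cmd() (
  cd "$NODEBB_DIR" || exit 1
  sudo -u nodebb env \
    "HOME=$NODEBB_HOME" \
    "npm_config_cache=$NODEBB_HOME/.npm" \
    ./nodebb "$@"
)
```

With `prepare_nodebb_home` run once, `nodebb_cmd upgrade` would replace the chown round trip in step 3.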

To conclude: I think this is an important change to make, but I'm not sure automated upgrades can reliably work without human observation; I've had too many (mostly minor) problems in the past to be confident. See https://forums.triplea-game.org/topic/1172 and https://forums.triplea-game.org/topic/1965 for some examples of incidents that occurred because the native code wasn't building correctly. It could be that I was just messing up the permissions, but in any case I'm not in the mood to blindly trust it.

So before thinking about creating any role, we need to come up with a reliable upgrade mechanism. If that works without problems (and is also able to upgrade Node.js if necessary), we can and should go for automation.
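One possible building block for such an upgrade mechanism is a preflight check that refuses to upgrade on an old Node.js and takes a MongoDB dump first, so a broken upgrade can be rolled back without a full Linode restore. This is only a sketch: the minimum Node.js major version (`MIN_NODE_MAJOR`), the database name `nodebb`, and the backup directory are placeholders. The real minimum would come from the `engines` field of NodeBB's `package.json`, and `mongodump` may need credentials (for example via `--uri`).

```bash
#!/usr/bin/env bash
# Preflight checks before a NodeBB upgrade (sketch; values are placeholders).
MIN_NODE_MAJOR="${MIN_NODE_MAJOR:-10}"          # placeholder, see package.json "engines"
BACKUP_DIR="${BACKUP_DIR:-/var/backups/nodebb}"  # hypothetical location
MONGO_DB="${MONGO_DB:-nodebb}"                   # hypothetical database name

# node_major VERSION   prints the major number, e.g. "v12.18.3" -> "12"
node_major() {
  local v="${1#v}"
  printf '%s\n' "${v%%.*}"
}

# node_is_at_least MIN [VERSION]   VERSION defaults to `node --version`
node_is_at_least() {
  local current="${2:-$(node --version)}"
  [ "$(node_major "$current")" -ge "$1" ]
}

# backup_mongo   writes a gzipped mongodump archive stamped with the UTC time
backup_mongo() {
  local stamp
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  mkdir -p "$BACKUP_DIR" || return 1
  mongodump --db="$MONGO_DB" --gzip --archive="$BACKUP_DIR/$MONGO_DB-$stamp.archive.gz"
}

# preflight   fails early on an old Node.js, otherwise takes a backup
preflight() {
  if ! node_is_at_least "$MIN_NODE_MAJOR"; then
    echo "Node.js $(node --version) is older than v$MIN_NODE_MAJOR; upgrade Node.js first" >&2
    return 1
  fi
  backup_mongo
}
```

Restoring would then be a matter of `mongorestore --gzip --archive=<file>` against the same archive.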

I hope this cleared things up a bit

RoiEXLab commented 4 years ago

Also have a look at https://docs.nodebb.org/configuring/upgrade/, https://docs.nodebb.org/installing/os/ubuntu/ and https://docs.nodebb.org/configuring/config/ for some insight into a couple of details I might have missed.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. If there is something that can be done to resolve this issue, please add a comment indicating what that would be and this issue will be re-opened. If there are multiple items that can be completed independently, we encourage you to use the "reference in new issue" option next to any outstanding comment so that we may divide and conquer.

RoiEXLab commented 4 years ago

Small update: there was a small incident with the forum today. For some reason, the OS killed MongoDB (SIGKILL), which caused the forum to enter an infinite startup loop. I changed the systemd config to auto-restart the DB in case this happens again, but I think sooner or later we're going to run out of RAM and/or disk space, so we might want to consider upgrading the machine at some point (storage is 40% free right now, but RAM usage is at almost 90% all the time).
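The comment doesn't say exactly how the systemd config was changed. One way such a change could look, as a sketch: a drop-in file that adds `Restart=on-failure`. Per the systemd documentation, `on-failure` also restarts a service that was terminated by an unclean signal such as SIGKILL, which is what the kernel's OOM killer sends. The unit name `mongod` is an assumption (MongoDB's official packages use it; Ubuntu's own package calls it `mongodb`), and the 5-second delay is arbitrary.

```bash
#!/usr/bin/env bash
# Add an auto-restart drop-in for MongoDB's systemd unit (run as root).
MONGO_UNIT="${MONGO_UNIT:-mongod}"                 # assumption, may be "mongodb"
SYSTEMD_DIR="${SYSTEMD_DIR:-/etc/systemd/system}"

# write_restart_dropin   creates <unit>.service.d/restart.conf
write_restart_dropin() {
  local dir="$SYSTEMD_DIR/$MONGO_UNIT.service.d"
  mkdir -p "$dir" || return 1
  # Restart=on-failure covers non-zero exits and unclean signals (incl. SIGKILL);
  # RestartSec waits a moment so a crash loop doesn't spin at full speed.
  cat > "$dir/restart.conf" <<'EOF'
[Service]
Restart=on-failure
RestartSec=5s
EOF
}

# apply_restart_dropin   writes the drop-in and makes systemd re-read unit files
apply_restart_dropin() {
  write_restart_dropin && systemctl daemon-reload
}
```

Afterwards, `systemctl show mongod -p Restart` should report `Restart=on-failure`.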

DanVanAtta commented 4 years ago

@RoiEXLab would deploying nodebb from within a docker container help us at all? Thoughts?

With the maps server needing a login API, this would be a good opportunity to see if we can use the forum for user authentication.

DanVanAtta commented 4 years ago

The forum would essentially become a single sign-on (SSO) provider, something long desired. If we do that, a Docker image for local development would be a necessity. Given the headaches of deploying, that might smooth things out a bit and give us a way to at least test locally before we update production. If that is the case, then we probably only need to figure out how to migrate the datastore files, and we could move to a larger server.

Once dockerized, we may resolve some of the deployment troubles (we could deploy the container via Ansible quite readily), we'd enable local development, and it looks like there are some plugins to enable an authentication API and user registration:

RoiEXLab commented 4 years ago

@DanVanAtta dockerizing could definitely help make the process more predictable. I'm not 100% familiar with Docker's layers, but if done correctly, I think images could be split up so that dependencies like Node.js or MongoDB could be upgraded without breaking anything.

Using Docker images, we could also easily rehearse an upgrade on a dev instance before doing it for real, which could help us solve potential problems before they even occur.

DanVanAtta commented 4 years ago

I wouldn't underestimate Vagrant for testing deployments and other infrastructure components. At the same time, though, if we do SSO we'll need a NodeBB Docker image, so I'm thinking we might as well deploy that container.

AFAIK nodejs is what runs nodebb, right? Seems like it would be baked into the image perhaps. MongoDB on the other hand could be its own container (and would probably need to be for local development as well, which then invites us to use docker-compose as we'll be managing multiple docker instances to get a local dev environment set up).

RoiEXLab commented 4 years ago

AFAIK nodejs is what runs nodebb, right?

Yes!

which then invites us to use docker-compose

Exactly what I was thinking. I think a MongoDB-based image and a Node.js image composed together would work very nicely.
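A sketch of the two-container layout discussed above, as a bash function that writes a docker-compose.yml. Everything in it is an assumption rather than an agreed design: the `mongo:4.4` tag, building NodeBB from a Dockerfile in the NodeBB checkout (`build: .`), the `/usr/src/app` paths inside the NodeBB image, and binding NodeBB's default port 4567 to localhost so the host's nginx can keep fronting it.

```bash
#!/usr/bin/env bash
# write_compose_file DIR   writes DIR/docker-compose.yml for NodeBB + MongoDB.
# All image tags and container paths below are placeholders.
write_compose_file() {
  local dir="${1:-}"
  if [ -z "$dir" ]; then
    echo "usage: write_compose_file DIR" >&2
    return 2
  fi
  mkdir -p "$dir" || return 1
  cat > "$dir/docker-compose.yml" <<'EOF'
version: "3.8"
services:
  mongo:
    image: mongo:4.4                 # placeholder tag
    restart: unless-stopped
    volumes:
      - mongo-data:/data/db          # database files survive container upgrades
  nodebb:
    build: .                         # assumes a Dockerfile in the NodeBB checkout
    restart: unless-stopped
    depends_on:
      - mongo
    ports:
      - "127.0.0.1:4567:4567"        # only reachable through the host's nginx
    volumes:
      # hypothetical paths inside the image: the "important files" from the thread
      - ./config.json:/usr/src/app/config.json
      - nodebb-uploads:/usr/src/app/public/uploads
volumes:
  mongo-data:
  nodebb-uploads:
EOF
}
```

Inside the compose network, NodeBB's config.json would point at the host name `mongo` instead of localhost.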

DanVanAtta commented 4 years ago

Upgrading NodeBB required a number of commands. For the record, to get to 1.14.3, I ran:

sudo npm install husky --save-dev
sudo npm audit fix
sudo ./nodebb upgrade
sudo ./nodebb setup

sudo ./nodebb audit
sudo ./nodebb audit fix --force

sudo ./nodebb setup
sudo ./nodebb upgrade

sudo npm install --save-dev mocha@8.2.0 
sudo npm install nodebb-plugin-dbsearch@4.1.2
sudo npm install --save-dev @commitlint/cli@11.0.0
sudo npm install nodebb-plugin-emoji@2.0.0 eslint@6.8.0 textcomplete@0.14.2
sudo npm upgrade

sudo git checkout v1.14.3
sudo ./nodebb setup
sudo ./nodebb upgrade
sudo npm audit fix
sudo ./nodebb upgrade

The above also fixed a few security vulnerabilities.
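Since the command sequence above was reconstructed after the fact, a small wrapper could record each command as it runs, giving the "running history and audit trail" asked for at the top of the issue even before Ansible is in place. A sketch only; the log path is a hypothetical default, and writing under /var/log needs root.

```bash
#!/usr/bin/env bash
# logged CMD [ARGS...]   runs CMD and appends start/exit records to UPGRADE_LOG.
UPGRADE_LOG="${UPGRADE_LOG:-/var/log/nodebb-upgrade.log}"   # hypothetical path

logged() {
  local status=0
  printf '%s RUN %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$UPGRADE_LOG"
  # `|| status=$?` keeps the exit record even when the caller uses `set -e`.
  "$@" || status=$?
  printf '%s EXIT %d %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$status" "$*" >> "$UPGRADE_LOG"
  return "$status"
}

# Example: the final steps from the comment above, now leaving a trail.
#   logged sudo git checkout v1.14.3
#   logged sudo ./nodebb setup
#   logged sudo ./nodebb upgrade
```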
