p2r3 / epochtal

Portal 2 tournament framework
https://epochtal.p2r3.com/
GNU General Public License v3.0

Containerization support #43

Closed - soni801 closed this 2 months ago

soni801 commented 3 months ago

Idea

Epochtal should natively support (and recommend) containerization as a deployment method. The most straightforward way to do this is to add support for docker (optionally also docker compose).

What should be provided?

Functionality

Necessary code changes before building a docker image

Other considerations

Example

I'm thinking, in a perfect scenario, hosting epochtal should be as simple as pulling a docker image and running it, only providing minimal information to it. As an example, this is roughly what I'd want the example compose configuration to look like:

services:
  epochtal:
    image: ghcr.io/p2r3/epochtal
    environment:
      STEAM_API_KEY: # insert your steam api key here
      DISCORD_API_KEY: # insert your discord api key here
      HOSTNAME: # this will be used for things like steam api return url
    volumes:
      - <host-path-to-epochtal-data>:/data # optional
soni801 commented 3 months ago

This is my idea for a full environment variable configuration (this could be provided as a .env.example file):

# Example environment variables file. Copy this file to ".env" in the working directory
# to use it with your deployment. Optional environment variables are commented out.
# DO NOT share this file with anyone after populating secret fields.

# Required external API keys
DISCORD_API_KEY=
STEAM_API_KEY=

# Optional secret used for signing JWTs. Will be randomly generated if not provided
#JWT_SECRET=

# Optional string used for internal request authentication. Will be randomly generated if not provided
#INTERNAL_SECRET=

# The website address used to access this epochtal instance, excluding "https://".
# For example: "epochtal.p2r3.com"
WEB_URL=

# Discord channels
DISCORD_CHANNEL_ANNOUNCE=
DISCORD_CHANNEL_REPORT=
DISCORD_CHANNEL_UPDATE=

# Optionally specify the SteamIDs of administrators ahead of time, to avoid needing to add administrators manually later
# Comma-separated list of SteamIDs. Example: "123456789, 987654321"
#ADMINISTRATOR_LIST=

# Map curation v1 weights
CURATION_PREVIEWS=0
CURATION_PREVIEWS_EXTRA=0
CURATION_PREVIEWS_VIDEO=0
CURATION_TAGS_COUNT=0
CURATION_TAGS_VISUALS=0
CURATION_HAMMER=0
CURATION_FILENAME=0
CURATION_DESC_NEWLINE=0
CURATION_DESC_FORMATTING=0
CURATION_REVISION=0
CURATION_TEXT_TURRETS=0
CURATION_TEXT_BEEMOD=0
CURATION_TEXT_RECREATION=0
CURATION_TITLE_LENGTH=0
CURATION_TITLE_CASE=0
CURATION_PLAYTIME_GAME=0
CURATION_PLAYTIME_EDITOR=0
CURATION_AUTHOR_WORKSHOP=0

# Map curation v2 weights
CURATION_QUALITY_DEFAULT=0
CURATION_QUALITY_PUNISH=0
CURATION_SCORE_EXPONENT=0
CURATION_GROUPING_DEPTH=0

# Optionally specify a different data storage directory than the default /data
#DATA_DIR=

# Optionally provide your own SSL certificate and private key
# You can generate a self-signed pair using the following command:
# openssl req -x509 -sha256 -nodes -newkey rsa:2048 -days 365 -keyout privkey.pem -out fullchain.pem -outform PEM
# Either provide a directory containing both files, or specify individual file locations. Make sure these files
# are accessible from within the docker container.
#SSL_KEY_DIR=
#SSL_PRIVKEY_PATH=
#SSL_FULLCHAIN_PATH=
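
For reference, consuming this file would be a one-liner: docker's --env-file flag (and compose's env_file key) pass every variable in the file to the container. The image name here assumes the GHCR image I proposed above:

docker run --env-file .env ghcr.io/p2r3/epochtal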

Do you have any feedback on this? Are there any other values we could make configurable?

soni801 commented 3 months ago

@PancakeTAS would you look through all I typed here (yes, I know, wall of text) and see if anything seems bad? I want some views from different people before I start building a Dockerfile, probably tomorrow.

@p2r3 you're also very welcome to check whether the way I try to approach things here fits the spirit of the project.

p2r3 commented 3 months ago

I'm not super happy with curation weights being environment variables - they feel much less like a "secret" in the sense that API keys are, and much more like a configuration option for the system. Just my take.

Also, TLS isn't assumed to be on in Epochtal. I'm pretty sure the first run config even disables it by default, so I don't see the need to set up a certificate for the user as long as we just keep it HTTP by default.

Looks good otherwise.

PancakeTAS commented 3 months ago

I don't quite understand why we should push the image to the GitHub registry. What are you even gonna push? The .js files? No, of course not - how would you do local development? The config files? Probably also not, because the workshop must always stay up to date. The bin/ folder? Sure, but that takes 2 seconds to download and compile anyway; we can let the end user do that or put it into a premain.

My idea of a docker container was just ensuring epochtal runs in a separate environment, so I would simply mount the project root inside the container and have data/secrets/bin managed somewhere else so as not to clutter the actual repo. I wouldn't even build a custom image and would just roll with some default ubuntu one, but do feel free to comment on this as I've used docker once (1 time) before.

About the weights as env variables - Totally agree with p2r3 here, weights should be in a json file, not in environment variables (heck I'd even go as far as to move weights.json into the data/ dir, as in the future, epochtal itself may or may not tamper with these). They simply do not belong there.

TLS? An option to set it up would be nice, but it's not required.

p2r3 commented 3 months ago

I'm guessing there's a divide in intent here, then. The question becomes whether we want containers to be used for deploying "stable" releases, or whether we also want them to be the recommended development environment.

Personally I don't care too much if it has to be a compromise. Does it have to be, though? I don't at all know how docker works, but is it not possible to git pull from the container if you want to be up to date with upstream for development?

soni801 commented 3 months ago

Oh boy... where do I begin. This is gonna be a 🧱

It is true that the curation weights aren't really "secrets". However, I still think they should be provided through the environment. I'll come back to why later.

I agree that we don't need to set up an SSL certificate automatically. However, I think it's good to let the user enable SSL through env variables, for example with an ENABLE_SSL variable. In that case the user would also need to provide an SSL cert, as described in my dotenv file example.

It seems that both of you have too little experience with docker to see where it really shines. I don't mean this in a rude way at all, but I think it makes parts of this idea kinda go over your heads. I'll try to explain it in a short and concise way:

While it is true, as you point out, that docker runs things in a separate environment, it can do so much more than just that if set up correctly. It is possible to do what you're saying: pull a default image, install everything and run it there by itself. However, this is neither the intended nor the recommended use of docker. Where docker really shines is when you build your own image for your application, which does all of this for you. (After all, what are programmers if not lazy?)

To answer Pancake's question: this is what we would push to the GH container registry: a custom docker image. We'd provide the build instructions as code (through a Dockerfile). This is a declarative and really easy way to make sure the image has everything it needs, think of it as an installer shell script. We'd run an action to build this automatically, for example on every release, and push it to some registry (this can also be for example Docker Hub, but GHCR is significantly easier).
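
To make that concrete, here's a rough sketch of what such a Dockerfile could look like. Treat the base image and entrypoint as placeholders, not the actual project layout:

# Rough Dockerfile sketch - base image and entrypoint are placeholders
FROM oven/bun:latest

WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY package.json ./
RUN bun install

# Copy the rest of the source into the image
COPY . .

# Persistent data lives in a volume so it survives container replacement
VOLUME /data

EXPOSE 8080
CMD ["bun", "run", "main.js"]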

When we do this, all the user would have to do is to pull the image:

docker pull ghcr.io/p2r3/epochtal:latest

And then run the image. As easy as that. On first start it already has everything installed and set up, including bspsrc and all those binaries.
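
In its simplest form, running it would be:

docker run -d -p 8080:8080 ghcr.io/p2r3/epochtal:latest

(-d runs the container in the background, -p publishes the port; the port number here is just an example.)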

However, you might have spotted that this could cause issues with the way epochtal is currently configured: the file system. Docker containers by their very nature have a filesystem separate from your local one, which is a good thing, except that now we can't configure anything from outside. As you've pointed out, you could mount a host directory into the container and edit the files from there. However, what if you don't have a practical way to access the host filesystem?

Many people, including me, use purpose-made tools for hosting containers. An example of this is Portainer (the one I use). It provides its management tooling as a web interface, where you can configure many properties but don't have full filesystem access. This gets even more problematic if you're running your container in an orchestrated environment such as Kubernetes: the orchestration engine chooses which cluster node to run the container on every time it starts, and may move it around often, which for obvious reasons makes filesystem-based configuration even less viable.

So, what's the optimal solution? Well, I don't know the optimal solution. But I do know a good solution: environment variables! This way, you can specify configuration values directly in the same command that runs the container. (With the docker command, you use the -e flag to pass an environment variable; other management platforms have even easier ways to do this.) This is considered best practice when it comes to docker, and you'll see this behaviour in practically all popular docker images - the official postgres image, for example, takes its entire configuration through variables like POSTGRES_PASSWORD.
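
In practice, that looks something like this (the keys and hostname are placeholders):

docker run -d -p 8080:8080 \
  -e STEAM_API_KEY=<your-key> \
  -e DISCORD_API_KEY=<your-key> \
  -e WEB_URL=epochtal.example.com \
  ghcr.io/p2r3/epochtal:latest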

As you can see, it's both normal and even recommended to provide the entire configuration through the environment. This means all configurable aspects of the application, not just secrets. I hope this explains better why I'd want everything declared like this.


Back to epochtal, I do think that most configuration options (for example curation weights) should be given sensible default values automatically, to avoid the user needing to mess with them, but the configuration should be easy to change if the user wants to. This way, users can choose between a minimal configuration and a more extensive one. My earlier comment is an example of a very extensive configuration. A minimal configuration would be as simple as this:

DISCORD_API_KEY=
STEAM_API_KEY=
WEB_URL=
DISCORD_CHANNEL_ANNOUNCE=
DISCORD_CHANNEL_REPORT=
DISCORD_CHANNEL_UPDATE=

The key element here is that configuration through environment variables shouldn't be necessary, but should be possible. I hope you can see how this is a better user experience than needing to mess with the files.

To answer the final question: this approach is not about using docker for development. It is about making deployment easier and more reliable (the same container will always behave the same way on any host, eliminating issues caused by the host). As I showed earlier, initial deployment is really easy. Updating the entire deployment to a new version is also ridiculously easy: you just pull the image again and recreate the container. It's that simple.
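
Assuming the container was started with --name epochtal, the whole update would look roughly like this:

docker pull ghcr.io/p2r3/epochtal:latest
docker stop epochtal && docker rm epochtal
docker run -d --name epochtal -p 8080:8080 --env-file .env ghcr.io/p2r3/epochtal:latest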


Finally, I'd just like to demonstrate that this doesn't limit how you can deploy epochtal; it simply adds several more ways you can choose to manage it. I'll list the ones I can think of off the top of my head:

1. Hosting "manually", the same way as you do now

This is just cloning the repo, installing the dependencies, and running the executable. Quite a manual process.

2. Building the docker image yourself and deploying it

This is an approach I didn't mention earlier, frankly because I personally find it to be the worst of both worlds. It consists of cloning the repo, running docker build to execute the container build instructions locally and get a local image, then running this container as described earlier.
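
For completeness, that workflow would be something like:

git clone https://github.com/p2r3/epochtal && cd epochtal
docker build -t epochtal .
docker run -d -p 8080:8080 --env-file .env epochtal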

3. Running through docker with the provided docker image

This is the example I gave when explaining docker. The user can pull the image we've uploaded to a registry, and just run this.

4. Running with docker compose

Docker compose is a different approach to managing docker containers, which many people prefer over "normal" docker. Normally, you specify everything the container needs as arguments to the docker run command (such as env variables, volumes, ports, etc.). This can get difficult to keep track of, with a cluttered shell history. This is where docker compose comes in: a declarative approach to the docker run command. To use docker compose, you provide a docker compose configuration, usually as a docker-compose.yml file:

services:
  epochtal:
    image: ghcr.io/p2r3/epochtal:latest
    ports:
      - 8080:8080
    environment:
      DISCORD_API_KEY:
      STEAM_API_KEY:
      WEB_URL:
      DISCORD_CHANNEL_ANNOUNCE:
      DISCORD_CHANNEL_REPORT:
      DISCORD_CHANNEL_UPDATE:

Then you can run docker compose up -d to start it. As long as you point the image at the latest tag, updating the deployment stays just as simple.
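
Concretely, an update is two commands:

docker compose pull
docker compose up -d

The second command only recreates the container if the image (or configuration) changed, so it's a no-op when there's nothing to update.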

5. Running in a dedicated management platform

Tons of docker management platforms exist, for example Portainer, which I mentioned earlier. Many of these provide a web interface where you can easily enter the image name and provide extensive configuration to your docker engine if you want. You can also easily add environment variables or import them from a .env file. Some even let you just paste your docker compose file! There are many different tools for this, but that's the general idea.

6. Managed/orchestrated engines

Kubernetes and docker swarm are examples of orchestrated engines. In layman's terms, this means the engine runs across multiple nodes and automates resource management across them, assigning containers to specific nodes based on parameters such as available computing power.
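
Just to give a taste of what that looks like, a minimal Kubernetes manifest for epochtal might be something like this (all names and values here are illustrative placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: epochtal
spec:
  replicas: 1
  selector:
    matchLabels:
      app: epochtal
  template:
    metadata:
      labels:
        app: epochtal
    spec:
      containers:
        - name: epochtal
          image: ghcr.io/p2r3/epochtal:latest
          ports:
            - containerPort: 8080
          env:
            - name: WEB_URL
              value: epochtal.example.com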

7. Cloud

Yes, I know "the cloud" is an overused term at this point. However, a significant number of people may choose this approach for its ease of use. Most cloud providers offer easy, native ways to host containers. Again, this means just providing the image name and some environment variables, and there you go: your docker container is now in the cloud. For people who don't have the option of hosting on personal servers, this is often a significantly cheaper choice than getting a cloud VPS.


Brick wall of text over.

TL;DR: Docker good.

soni801 commented 3 months ago

Forgot to mention this and cba editing my previous comment:

If you want, I'd be more than willing to hop in a call and demonstrate how I use docker. I firmly believe in demonstrations of software for mutual learning.

p2r3 commented 2 months ago

Sounds cool 👍

I don't know when I'd be able to hop on a call to see this in action, but I do like the idea.