tylercollier / openresync

Open Real Estate Sync (openresync) is a node application that syncs (replicates) MLS data from one or more sources via the RESO Web API to MySQL or Solr, with an admin UI.
https://openresync.com
MIT License
35 stars, 13 forks

Consider publishing this repo on Docker #11

Closed: ckeeney closed this issue 1 year ago

ckeeney commented 1 year ago

This repo could fairly easily be published on Docker Hub. I published an image here, but it would be far better if the image were built automatically through a GitHub and Docker Hub integration with this repo.

The Dockerfile I used to build the published image is:

# STAGE: INSTALL DEPENDENCIES
FROM node:16-alpine AS deps
# Check https://github.com/nodejs/docker-node/tree/b4117f9333da4138b03a546ec926ef50a31506c3#nodealpine to understand why libc6-compat might be needed.
#RUN apk add --no-cache libc6-compat
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm install

# STAGE: BUILD ARTIFACTS
FROM node:16-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
COPY config/config.example.js ./config/config.js
RUN npm run build

# STAGE: RUN
FROM node:16-alpine AS runner
WORKDIR /app

ENV NODE_ENV=production

#RUN addgroup --system --gid 1001 nodejs
#RUN adduser --system --uid 1001 nodejs

COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY . .

#USER nodejs

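# The JSON-array form needs double quotes; single quotes are not valid JSON,
# so Docker would not parse ['/app/logs', '/app/config'] as a list of two paths.
VOLUME ["/app/logs", "/app/config"]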
EXPOSE 3000

ENV PORT=3000

CMD ["node", "server/index.js"]

A few improvements could be made, but this is good enough for my current use case. In particular, I didn't want to juggle permissions for the volumes between the host that runs the container and the container itself, so the container runs as root.

This image requires a config file to be mounted at /app/config/config.js (that is, a config.js inside the directory mounted at /app/config). My typical usage in docker-compose.yml looks like this:

  openresync:
    image: ckeeney/openresync
    restart: always
    ports:
      - 4000:4000
    environment:
      TRESTLE_CLIENT_ID: SECRET
      TRESTLE_CLIENT_SECRET: SECRET
      DB_CONN_STRING_TRESTLE: mysql://root:root@mysql:3306/re-data
      DB_CONN_STRING_STATS: mysql://root:root@mysql:3306/re-data
    volumes:
      - ./volumes/openresync/config:/app/config
      - ./volumes/openresync/logs:/app/logs
    depends_on:
      - mysql

Down the road, it would also be nice to bundle a config file that reads a bunch of different environment variables, so the image could be run by setting environment variables without writing an entire config file. Supporting things like selecting fields makes this fairly complicated, so it's probably fine not to focus on this feature until later.

ckeeney commented 1 year ago

I'm happy to open a PR for this that adds two files, .dockerignore and Dockerfile, though additional steps would be required to configure the Docker Hub integration.

tylercollier commented 1 year ago

Hi there. Cool idea. I'm a fan of docker, although I haven't published anything yet. I'm not familiar with the AS keyword yet. Can you explain why you broke it into 3 stages? The permissions issue seems important to solve. I wouldn't want to run things as root.

What would you say is the main advantage of doing it via docker?

Is it to encapsulate all the source files, aka sweep them under the rug? Then you might like the rewrite that's underway; one of its main ideas is a CLI that generates a project. Similar to e.g. create-react-app, you'd run something like ores create [dir path], and in there it'd

Or is it to have a separate node environment? That's cool. I've just been using nodenv.

Again, I'm a fan of Docker and I can't really say there's a reason NOT to do this. I'll likely sit on this until the rewrite is complete. But without a strong reason to do it, it'll be lower priority.

> Down the road, it would also be nice to bundle a config file that looks at a bunch of different environment variables so it could be possible to run the image by setting environment variables without the need to write an entire config file. Supporting things like selecting fields makes this fairly complicated so it's probably fine to not focus on this feature until later.

I've been extremely glad I made the configuration JS, as opposed to JSON or env vars. A JS config can still read env vars, but there are many things to configure that require JS functions, so JS is a requirement.

ckeeney commented 1 year ago

Hi Tyler.

Because I didn't say so initially, fantastic project.

> Hi there. Cool idea. I'm a fan of docker, although I haven't published anything yet. I'm not familiar with the AS keyword yet. Can you explain why you broke it into 3 stages? The permissions issue seems important to solve. I wouldn't want to run things as root.

Candidly, I just copied the Dockerfile from one of my other projects and adapted it for this project. It was originally adapted from the Next.js with-docker example.

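Some repositories require additional packages at build time that aren't needed to run the project. Each FROM ... AS <name> line starts a new build stage, a later stage pulls in only what it needs from an earlier one with COPY --from=<name>, and by default only the last stage becomes the final image. Separating the build steps this way keeps production images small, although in my simple example above the runner image is just as big as the others.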

> What would you say is the main advantage of doing it via docker?

I find that running and maintaining things in docker is simpler. To be a user (not a developer) of this project right now, you have to:

  1. ensure you have node installed
  2. write a config file
  3. set environment variables if used in the config file (and they probably should be)
  4. download the source code
  5. install dependencies
  6. build the Vue assets
  7. run server/index.js
  8. repeat steps 4-7 for updates

Publishing this on Docker Hub lets you ship a version of this software that already includes the specific Node version this project is written for, along with the source code, the dependencies, and the prebuilt assets. If someone wants to use ckeeney/openresync, the steps are:

  1. ensure you have docker and docker-compose installed
  2. create config and logs directories to mount
  3. write a config file
  4. write a docker-compose.yml file
  5. docker-compose up

Another benefit of Docker is that if a project needs to move to a Kubernetes cluster for scalability or reliability, you can use the same Docker image in Kubernetes Deployments.

Edit:

I realize I did not address your concerns about the permissions and running as root.

You can see three commented-out lines in my example Dockerfile that add a nodejs user and group and switch to that user. If we uncomment them, the process runs as the nodejs user with UID and GID 1001. However, the directories and files created on the host machine (config, logs, config.js) will not be readable or writable by the nodejs user unless you run chown 1001:1001 ./logs -R and chown 1001:1001 ./config -R on the host.

Unfortunately, when 1001:1001 owns your config.js file, you probably can't edit it as easily on the host. This is why, for my example and while developing config.js, I chose to let the process run as root inside the Docker container.

Once again, great project.

tylercollier commented 1 year ago

> Because I didn't say so initially, fantastic project.
> Once again, great project.

Thanks for the kind words! I'm not sure if you saw but we have a Discord server so feel free to join us in there. I'd be interested to know your story and of course want to hear if it's working out for you. It's fun to hear the war stories of others regarding syncing MLS data.

And thanks for the explanation. You make good points. Of course I want this to be as effortless to use as possible. I was planning to use your idea, but further down the road. However, you mentioned you'd consider doing a PR. In that case, why wait? I think I'd add a Docker section to the README that links to a file at docs/docker.md or similar, where your PR could explain how people can use the image from Docker Hub. You could publish it under your name on Docker Hub for now, and someday if ... checks Docker Hub website ... never mind, creating an organization there is expensive, so I might never create one.

We'll need to sort out the permissions issue. Yeah, this is tough, right? You can't make assumptions about the host, including things like user and group IDs, and you can't set the IDs at image build time to whatever the person running the container wants, because you don't know those IDs in advance.
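One common way around not knowing the IDs at build time is to choose the user at run time instead. Docker Compose's user: key (the counterpart of docker run --user) runs the container process under whatever UID:GID the person deploying it supplies. The sketch below assumes the image can run as an arbitrary non-root user, i.e. it only writes to the mounted logs directory and the files under /app are world-readable; HOST_UID and HOST_GID are hypothetical variable names, not anything openresync defines.

```yaml
services:
  openresync:
    image: ckeeney/openresync
    # Run as the host user's UID:GID rather than root, so config.js and the
    # logs stay owned by (and editable by) that user on the host.
    # HOST_UID / HOST_GID are hypothetical names: set them in a .env file next
    # to docker-compose.yml, e.g. to the output of `id -u` and `id -g`.
    # If they are unset, Compose substitutes the defaults after ":-".
    user: "${HOST_UID:-1000}:${HOST_GID:-1000}"
    volumes:
      - ./volumes/openresync/config:/app/config
      - ./volumes/openresync/logs:/app/logs
```

Because the IDs are supplied at run time, the image doesn't need to bake in a user with a fixed UID such as 1001, and nobody has to chown host directories to match it. The trade-off is that the chosen UID may have no matching user entry inside the container, which some tools handle poorly.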

If your PR could also set up the build process, that'd be icing on the cake. Apologies if that's asking too much. Even just getting it started would be helpful, because I've never done anything like that. Or perhaps it's really something I need to handle on my end by researching GitHub Actions?
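For the auto-build piece, one option (an assumption, not something settled in this thread) is a GitHub Actions workflow built on Docker's official actions: docker/login-action, docker/metadata-action, and docker/build-push-action. The sketch below assumes the default branch is master, release tags matching v*, and two repository secrets named DOCKERHUB_USERNAME and DOCKERHUB_TOKEN (a Docker Hub access token). The file name and secret names are hypothetical; the image name is the ckeeney/openresync image mentioned above.

```yaml
# .github/workflows/docker-publish.yml (hypothetical file name)
name: Publish Docker image

on:
  push:
    branches: [master]   # assumed default branch
    tags: ['v*']         # assumed release tag pattern

jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Buildx is the builder that build-push-action drives.
      - uses: docker/setup-buildx-action@v3

      # Hypothetical secret names; create them in the repo's settings.
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      # Derive image tags and labels from the Git ref that triggered the run.
      - id: meta
        uses: docker/metadata-action@v5
        with:
          images: ckeeney/openresync

      # Build the repo's Dockerfile and push the result to Docker Hub.
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
```

With metadata-action's default settings, the image tag follows the branch or Git tag name that triggered the run, so pushes to the branch and tagged releases end up as differently tagged images.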

tylercollier commented 1 year ago

Thanks for chatting on Discord. Just wanted to say thanks again for this idea. It hit me the other day while I was trying to use the tool bee-queue/arena, and I wished someone had dockerized it. Actually... they had. It's just woefully old and buggy, and the maintainers aren't paying attention to issues or to the pull requests that would fix those problems. So when I do this, I'd like to set it up properly to auto-build (as you mentioned). That said, the limitations in v1 of this project still pain me, so I'm focusing on the rewrite and will feel better about dockerizing that.

ckeeney commented 1 year ago

No worries. I will close this issue for now.

That said, I am running a dockerized version of this application, so if someone ever needs help getting that working, feel free to point them my way on Discord.