jstebenne commented 8 months ago

As I was writing the Kubernetes resource definitions for my homelab deployment, I was a bit surprised by the number of volume mounts I needed to create. I think some directories should be moved and merged to make installation easier. This has an impact on both Docker and Kubernetes deployments.

Current state

Feel free to correct me if the assumptions below are wrong, but here's what I think are the current minimum required volume mounts:

Config: /App/config
Data: /App/data
Documents: /App/wwwroot/documents
Images: /App/wwwroot/images
Keys: /root/.aspnet/DataProtection-Keys

My proposition

Move config and data into a new persistence (?) directory
- /App/persistence/config
- /App/persistence/data
- This will group everything that is "backend" into a single volume mount
Merge or move documents and images into a uploads directory
- I might be missing the reason why those two directories are separate, but this forces me to manage two storage allocations for the same kind of files
- Ideally, I would place the uploads directory in data, but just by looking at the current path, it feels like there is a very good reason why it is there
Move DataProtection-Keys into config
- I've never used aspnet, but the current path makes me believe that it would be a bit of a pain to move it
- This looks like sensitive information, so it is probably better to keep this separate from the rest and mount it as a read-only secret. I'll keep this for a different discussion at a later time

The impact of these changes could be mitigated easily since an administrator only need to adjust the paths inside the container, leaving the paths on the host machine unchanged. The storage paths could also be configured through environment variables with the current paths as default.

Why?

Having fewer volumes to manage makes it easier for both Docker and Kubernetes users. However, since Kubernetes is my current use case, I will only focus on that.

When creating a container with volumes in Kubernetes, there are a few extra steps because of how Kubernetes works: the volume data is sometimes not on the same machine as the container. So, to mount a volume, I need to do this:

In the container spec, add an entry to volumeMounts that requires a name and the path of a directory (cannot be a file) inside the container
Define the volumes. This requires setting a name and what kind of volume claim to use. In my case, a PersistentVolumeClaim and specify its name. This will link a volume mount to a volume claim
Create a new definition for a PersistentVolumeClaim and specify the space to reserve for that claim

This gives me something like this (I committed the rest of the configuration not relating to volumes)

Deployment.yml

apiVersion: apps/v1
kind: Deployment
[...]
spec:
  [...]
  template:
    metadata:
      labels:
        name: lubelog
    spec:
      containers:
        - name: lubelog
          image: hargata/lubelogger:v1.2.6
          [...]
          volumeMounts:
            - name: config
              mountPath: /App/config
            - name: data
              mountPath: /App/data
            - name: documents
              mountPath: /App/wwwroot/documents
            - name: images
              mountPath: /App/wwwroot/images
            - name: keys
              mountPath: /root/.aspnet/DataProtection-Keys
      volumes:
        - name: config
          persistentVolumeClaim:
            claimName: lubelog-config
        - name: data
          persistentVolumeClaim:
            claimName: lubelog-data
        - name: documents
          persistentVolumeClaim:
            claimName: lubelog-documents
        - name: images
          persistentVolumeClaim:
            claimName: lubelog-images
        - name: keys
          persistentVolumeClaim:
            claimName: lubelog-keys

pvc.yml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lubelog-config
  [...]
spec:
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Mi
  accessModes:
    - ReadWriteOnce

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lubelog-data
  namespace: lubelog
spec:
  [... similar as previous ...]

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lubelog-documents
  namespace: lubelog
spec:
  [... similar as previous ...]

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lubelog-images
  namespace: lubelog
spec:
  [... similar as previous ...]

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lubelog-keys
  namespace: lubelog
spec:
  [... similar as previous ...]

Each PersistentVolumeClaim must be managed independently (each has its backup and replication strategy) and will not use space outside of the reserved amount. Luckily, these volumes are currently very small since I am the only user of this instance, and I only have three months of data, but having to track each of those can be a bit of a pain.

TL;DR

Reducing the number of volumes required to have persistent data would reduce the deployment and maintenance complexity. In a perfect world, only one volume mount would be required, but I don't believe that is possible or even a good idea.

Feel free to send me any questions you might have. I'm one of the weirdos that prefer to use Kubernetes for long-lived applications, and I know it is way overkill compared to typical deployments. I omitted some information relating to Kubernetes (limitations, processes, etc.) for brevity.

hargata commented 8 months ago

Duplicate, please see #260 for an example of a simplified docker-compose.

jstebenne commented 8 months ago

This sounds like a bad idea since it would mount application code and libraries into a volume. Volumes should only be used for data that needs to be persisted between container executions.

The first issue that will arise from doing this is that software updates will not work since files from the volume could overwrite files from the Docker image.

jstebenne commented 8 months ago

This can also introduce a new attack vector where a bad actor could edit application libraries to deface the deployed website (one of the best case scenario) or inject something like a bitcoin mining script.

hargata commented 8 months ago

@jstebenne that does kinda make sense, primarily I just want to avoid breaking changes in general especially for users that already have a lot of data, so anything that doesn't have to do with the logic portion of the app is just not prioritized in general.

jstebenne commented 8 months ago

I understand that. I am not asking to make this a priority, but I believe this is a needed improvement to simplify data management. Mounting /App is fine in a development environment, but it is a very bad idea for a production deployment.

I haven't looked at the code base closely enough to know if this is a good suggestion, but having a migration script that checks for required migrations on startup would make it seamless for users. This script would definitely be useful in the future, but for the volumes/directories I mention, it could do one of those things:

Set a configuration indicating where these directories are located
Move the directories, ideally to a directory that is already a volume
Add the file path to the DB entry (pretty sure that would only work for uploads)

hargata / lubelog

Reduce the number of required volume mounts #407

Current state

My proposition

Why?

TL;DR