Proposal: Transition from bind mounts to named volumes

BuongiornoTexas commented 7 months ago

Problem

PWD currently uses bind mounts for persistent storage on the host.

Docker recommends using volumes for persistent storage for a range of reasons (https://docs.docker.com/storage/volumes/). The main benefits for PWD would be:

- Most likely eliminate issues associated with permission mismatches between host and container (affects Synology, Windows, rootless Docker and probably others). This is probably the strongest reason for doing it.
- Performance is better on Mac and Windows platforms, which will likely improve grafana and influxdb responsiveness.
- It's cleaner from an end user perspective (at the cost of some development pain).
- I suspect it will make cross-platform migration easier.

I've done some brief experimentation and found that it is fairly straightforward to migrate the grafana binds to named volumes (it took a couple of hours to put the pieces together, and is a fairly small set of commands), and I'm pretty sure that it would be similar for the other bind mounts.

Enhancement

At least, transition read-write bind mounts to named volumes, as these appear to be the key pain points for permission issues. But if we do adopt this, it's probably worth transitioning all mounts for simplicity of future management.
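To illustrate the kind of transition being proposed, a bind-mounted directory can be copied into a named volume with a throwaway helper container. This is only a minimal sketch under assumptions, not a PWD migration script: the function name `migrate_bind_to_volume`, the `/from` and `/to` mount points and the `busybox` helper image are all illustrative choices.

```shell
#!/bin/bash
# Sketch only: copy a host directory (the old bind mount source) into a named volume.
# migrate_bind_to_volume, /from, /to and the busybox image are illustrative assumptions.

migrate_bind_to_volume() {
    local src_dir="$1"    # host directory currently used as a bind mount
    local volume="$2"     # named volume to populate
    local owner="${3:-}"  # optional uid:gid to apply inside the volume

    if [ ! -d "$src_dir" ]; then
        echo "source directory not found: $src_dir" >&2
        return 1
    fi

    # Make sure the target volume exists before mounting it.
    docker volume create "$volume" >/dev/null || return 1

    # Build the command run inside the helper container.
    local copy_cmd="cp -a /from/. /to/"
    if [ -n "$owner" ]; then
        copy_cmd="$copy_cmd && chown -R $owner /to"
    fi

    # The helper mounts the host directory read-only and the volume read-write.
    docker run --rm \
        -v "$(cd "$src_dir" && pwd):/from:ro" \
        -v "$volume:/to" \
        busybox sh -c "$copy_cmd"
}
```

A call might look like `migrate_bind_to_volume ./grafana some-grafana-volume 472:0`, where the volume name is a placeholder and `472:0` would need to match the uid and gid the container actually runs as. The containers using the directory should be stopped first.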

Additional context

I'm happy to work up a test script to demonstrate the transition for grafana, and if this looks good, move on to PR(s) for the change. In addition to the transition, PRs would also need to address how to ensure conf files are updated within containers (docker cp) and detail a new backup procedure, as the volume data is managed in the docker data space.
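As a rough idea of what the `docker cp` step could look like, here is a minimal sketch that keeps the master config files on the host and pushes copies into a container. The helper name `sync_config` and the container name and paths in the usage comment are hypothetical placeholders, not existing PWD scripts.

```shell
#!/bin/bash
# Sketch only: push host-side master config files into a container.
# sync_config is a hypothetical helper; names in the example are placeholders.

sync_config() {
    local container="$1"
    shift
    # Remaining arguments are "host_path:container_path" pairs.
    local pair src dest
    for pair in "$@"; do
        src="${pair%%:*}"
        dest="${pair#*:}"
        if [ ! -f "$src" ]; then
            echo "skipping missing file: $src" >&2
            continue
        fi
        # docker cp accepts both running and stopped containers.
        docker cp "$src" "$container:$dest" || return 1
    done
}

# Example (placeholder names):
# sync_config my-container ./app.conf:/etc/app/app.conf
```

A service that only reads its configuration at startup would presumably still need a restart after the copy.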

jasonacox commented 7 months ago

Hi @BuongiornoTexas - You are always tweaking!!! Thanks for always challenging the status quo with new insight and sharing your research. Would you be willing to share your powerwall.yml setup or at least the named volume sections?

I am familiar with named volumes and have used them. I have also been bitten by them with things like docker system prune that actually deletes named volumes of stopped containers (yikes!). They are more complex and less approachable by most users (myself included) where bind mounts are a simple two-way directory mapping. Bind mounts can provide more visibility into the data and easier backup and access. Bind mounts also allow you to edit files manually from outside the container. Except for Windows and MacOS, bind has higher I/O performance, though one could argue it isn't as critical. The occasional permission issues we have seen in setup seem low now with our recent updates (some of which, thanks to you). And most important, I'm not enough of an expert on named volumes to know all the potential pitfalls we may face on the various platforms and different installations of the community.

All that to say, I'm not comfortable making this change globally. In this case, the devil we know is manageable (self service support, easier issues, support discussions, etc). However, I'm happy to make this an option for the power users like yourself. We could also include something like powerwall-named.yml for custom installs, or provide the how-to in our discussion forums.

BuongiornoTexas commented 7 months ago

> All that to say, I'm not comfortable making this change globally ... make this an option for the power users like yourself.

TBH, supporting both power users and mainline users feels like a worst of both worlds option. I'd rather continue working with the mainstream package and managing issues myself rather than inflicting the pain of maintaining parallel trees on everyone else.

> Would you be willing to share your powerwall.yml setup or at least the named volume sections?

No problem on this. If you are interested in at least exploring the option a bit further, I'll go one better: with a couple of days' experimentation I can provide a simple script for reversible migrations of both the grafana and telegraf containers (as these have the simplest setups).

That said, if you are set on staying with bind, I'd rather just close this proposal out as a dead end.

> I have also been bitten by them with things like docker system prune that actually deletes named volumes of stopped containers (yikes!).

I've had problems with pruning containers (no data loss, but irritating), but not with volumes so far, and it would be a thing to watch out for. However: a) using prune is very much a power user function, and anyone who does it should be aware of the possible consequences; and b) the current version of docker does not remove volumes automatically, and we could add a keep filter to the docker config to give belt and braces protection (https://docs.docker.com/config/pruning/#prune-volumes). I'd still want to do a bit of testing to be sure, but on first look, I'd say protections are in place.
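One way the keep filter might be applied from the command line is sketched below. It follows the `label!=keep` filter shown in the Docker pruning docs linked above; the function names are hypothetical and this has not been tested against a PWD install.

```shell
#!/bin/bash
# Sketch only: label volumes so that a filtered prune skips them.
# The label key "keep" mirrors the Docker pruning docs; function names are hypothetical.

create_protected_volume() {
    local volume="$1"
    # The label is applied when the volume is created.
    docker volume create --label keep "$volume"
}

prune_unprotected_volumes() {
    # Prunes unused volumes EXCEPT those carrying the "keep" label.
    # Extra arguments (for example --force) are passed through to docker.
    docker volume prune --filter 'label!=keep' "$@"
}

list_protected_volumes() {
    # Show the names of all volumes that carry the "keep" label.
    docker volume ls --filter 'label=keep' --format '{{.Name}}'
}
```

Volumes created by docker compose would need the label set in the compose file's volume definition instead, and the protection only applies when the filter is actually passed to the prune command.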

> They are more complex and less approachable by most users ...

All of these are reasonable criticisms, but I also don't think these issues are as significant as you are suggesting. And I think the long term benefits of (mostly) separating container and host data space outweigh the problems (I guess I'm agreeing with Docker's reasoning on this one). Touching on your points:

- I continue to have permission issues. I stay quiet about them because I'm an edge case, and I hadn't found a good long term solution until coming across named volumes (which look as though they fix the problem almost completely).
- On volumes themselves:
  - influxdb and grafana require read-write volumes that are generally only accessed by web interfaces and curl calls, so two-way data transparency is not an issue in operation. Pulling data as tar files can be done as needed, but should be pretty rare (I've never modified either grafana or influx data via bind).
  - Our configuration files are typically read only, so we can (and should) continue to keep the masters in host space and update copies in the container volumes as required with a simple script using docker cp. For most users this would never be seen, as it would sit in/be called by update/verify.sh, and power users/tinkerers would quickly get used to calling it (possibly even make it part of compose-dash?). More complex than directory mapping, but not very much so. In effect, most files that need editing remain outside the volumes and easily accessible in host space.
- Docker provides documentation for volume backups via tar (https://docs.docker.com/storage/volumes/#back-up-restore-or-migrate-data-volumes), and they are pretty explicit that this is an easy way to support migrations. We could easily set up a script that backs up the key data volumes and config files into either a mega tar archive or a set of tar archives.
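A backup script along these lines could be quite small. The following is a minimal sketch in the spirit of the Docker docs approach (a helper container running tar), not a finished procedure: the function names, the `/data` and `/backup` mount points and the `busybox` image are assumptions.

```shell
#!/bin/bash
# Sketch only: back up and restore a named volume as a tar archive.
# Function names, /data, /backup and the busybox image are assumptions.

backup_volume() {
    local volume="$1"
    local dest_dir="$2"
    mkdir -p "$dest_dir" || return 1
    # Mount the volume read-only and write <volume>.tar into the host directory.
    docker run --rm \
        -v "$volume:/data:ro" \
        -v "$(cd "$dest_dir" && pwd):/backup" \
        busybox tar cf "/backup/$volume.tar" -C /data .
}

restore_volume() {
    local volume="$1"
    local archive="$2"   # path to a tar file produced by backup_volume
    local dir file
    dir="$(cd "$(dirname "$archive")" && pwd)" || return 1
    file="$(basename "$archive")"
    # Mount the archive directory read-only and unpack into the volume.
    docker run --rm \
        -v "$volume:/data" \
        -v "$dir:/backup:ro" \
        busybox tar xf "/backup/$file" -C /data
}
```

It would probably be wise to stop the container that owns the volume before backing up, so the data is not changing mid-archive. Depending on the Docker setup, the archive may end up owned by root on the host.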

jasonacox commented 7 months ago

Thanks @BuongiornoTexas - I always appreciate your thoughtful responses. I'm happy leaving this open for others to review and comment as well.

BuongiornoTexas commented 7 months ago

OK - I'll pull the demo scripts together and we can see if you and others like what you see.

BuongiornoTexas commented 7 months ago

WARNING: the scripts below include use of docker container prune - please don't run these if you don't know what this does, as the consequences could be very not good. It's only here to illustrate a specific issue. Ideally, I'd suggest running this on a test setup only (that said, I've run it repeatedly on my live setup and haven't had any issues).

So real life intervened and delayed things a bit, but here are the demo scripts/files. To run these, drop all three files in a directory parallel/at the same level as Powerwall-Dashboard (e.g. ... /Powerwall-Dashboard/../test). The scripts should be run in the test folder by bash <scriptname>. @jasonacox and anyone else interested in experimenting - I strongly recommend reviewing the scripts before running them - especially if you are using docker for anything other than PWD (in which case, the docker container prune command becomes quite high risk!).

- powerwall.named.yml is a compose file for running grafana and telegraf using named volumes, based on PWD 4.2.0.
- test-named.sh is a script for setting up and running the containers using the named volumes. It does the following things and is chatty about it:
  - stops telegraf and grafana.
  - backs up the bind version of powerwall.yml.
  - creates the -named versions of the containers that use the named volumes, and the named volumes themselves.
  - populates the configuration files for the containers using two different approaches (requires su elevation for the tarfile method).
  - moves the grafana folder to break any possible bind link.
  - brings the updated stack up.
- revert-named.sh is a script for reverting the above changes. It:
  - stops the named containers.
  - reinstates the bind based powerwall.yml and moves the grafana folder back.
  - brings the stack back up.
  - cleans up the named containers using docker container prune and a label filter. Given how indiscriminate the prune command is, I would not use it in any live system, but I've used it here to demonstrate that pruning a container does not clobber any volumes it uses.
  - lists the volumes to show that they haven't been deleted.
  - deletes the volumes (as they are proof of concept at this stage).

test-named.sh

```shell
#!/bin/bash
# NOTE: Need to update powerwall.named.yml to match current repo powerwall.yml as far as possible before issue.

# script assumes we are in a folder parallel to Powerwall-Dashboard
WORKING="$(pwd)"
PWD="${WORKING}/../Powerwall-Dashboard"

# And broadly, we will work in the PWD for convenience.
cd "$PWD"

# belt and braces - create copy of powerwall.yml on FIRST RUN only.
cp -n powerwall.yml "${WORKING}/powerwall.bind.yml"

# Do preliminaries.
# If we wanted to create volumes manually, we could use commands similar to this.
# But I'm letting docker compose do this automatically.
# docker volume create pwd-grafana-rw
# docker volume create pwd-telegraf-ro

# Stop the bind containers.
docker stop grafana
docker stop telegraf

# Grab current uid/gid
PWD_OWNER="$(id -u $USER):$(id -g $USER)"

# Create a tar archive of the grafana configuration.
# Run as su to defang permissions problems. Note that we
# rewrite permissions on the fly to match the default grafana
# container uid and gid. And then we correct uid/gid of grafana file.
# (correcting uid/gid is probably not needed).
# This should only be needed for transitioning from bind and has the
# added bonus that both normal and edge cases have the same handling.
echo "su password required to create tar archive of grafana config."
su -c "tar --owner=472 --group=0 -cvf ${WORKING}/grafana.tar grafana; chown $PWD_OWNER ${WORKING}/grafana.tar"

# Move grafana directory as part of proof of concept
mv grafana grafana.bak

# update powerwall.yml
cp "${WORKING}/powerwall.named.yml" ./powerwall.yml

# create the containers, create the volumes as a byproduct.
# Will also prepopulate telegraf volume with sample files (we will overwrite)
bash compose-dash.sh create grafana-named
bash compose-dash.sh create telegraf-named

# Use busybox as a helper to unpack the tar archive, and yes, use a temporary bind for this.
# This is one of many ways to achieve this.
# (Could do this with docker cp and pipes as well).
# tar runs as root and preserves ownership and permissions from the tarfile.
docker run --rm --volumes-from grafana-named \
    -v "${WORKING}:/backup" \
    busybox sh -c "cd /var/lib/grafana && tar xvf /backup/grafana.tar --strip-components=1"

echo "List of volume /var/lib/grafana should have owners of 472:root"
docker run --rm --volumes-from grafana-named \
    busybox sh -c "ls -l /var/lib/grafana"

# Use docker cp to update the telegraf config.
docker cp telegraf.conf telegraf-named:/etc/telegraf
docker cp telegraf.local telegraf-named:/etc/telegraf/telegraf.d/local.conf

# Bring the stack up.
bash compose-dash.sh up -d
```

revert-named.sh

```shell
#!/bin/bash

# script assumes we are in a folder parallel to Powerwall-Dashboard
WORKING="$(pwd)"
PWD="${WORKING}/../Powerwall-Dashboard"

# And broadly, we will work in the PWD for convenience.
cd "$PWD"

# stopped named volume containers
docker stop grafana-named
docker stop telegraf-named

# reinstate grafana folder
mv grafana.bak grafana

# revert compose file to bind version
cp "${WORKING}/powerwall.bind.yml" ./powerwall.yml

echo " "
echo "Bring the stack back up."
echo " "
bash compose-dash.sh up -d

# Clean up named based containers and volumes
echo "Pruning stopped containers using named volumes to show it doesn't delete volumes."
echo "(command run once for each named container, hit y to OK each)"
echo "(However, given how brutal prune is, probably better to avoid this "
echo "altogether and use commands like remove orphans instead)."
echo " "
echo "WARNING: Before you run this command, probably worth using docker ps to check your stack is up."
echo "(The command should only delete the named containers, but just in case.)"
echo " "
docker container prune --filter "label=com.docker.compose.service=grafana-named"
docker container prune --filter "label=com.docker.compose.service=telegraf-named"

echo " "
echo "Data volumes for telegraf and grafana should still be present."
echo "Listing volumes now"
echo " "
docker volume list

echo " "
read -r -p "Removing test volumes. Hit y to proceed." response
echo " "
if [[ "$response" =~ ^([yY][eE][sS]|[yY])$ ]]
then
    docker volume rm pwd-grafana-rw
    docker volume rm pwd-telegraf-ro
fi
```

powerwall.named.yml

```yaml
volumes:
    pwd-grafana-rw:
        # external is only needed for managing volumes that will only
        # be created externally. Probably not what we want.
        # external: true
        name: pwd-grafana-rw
    pwd-telegraf-ro:
        # Make it read only for the container in the service definition.
        name: pwd-telegraf-ro

services:
    influxdb:
        image: influxdb:1.8
        container_name: influxdb
        hostname: influxdb
        restart: unless-stopped
        volumes:
            - type: bind
              source: ./influxdb.conf
              target: /etc/influxdb/influxdb.conf
              read_only: true
            - type: bind
              source: ./influxdb
              target: /var/lib/influxdb
        ports:
            - "${INFLUXDB_PORTS:-8086:8086}"
        env_file:
            - influxdb.env

    pypowerwall:
        image: jasonacox/pypowerwall:0.8.2t53
        container_name: pypowerwall
        hostname: pypowerwall
        restart: unless-stopped
        volumes:
            - type: bind
              source: .auth
              target: /app/.auth
        user: "${PWD_USER:-1000:1000}"
        ports:
            - "${PYPOWERWALL_PORTS:-8675:8675}"
        environment:
            - PW_AUTH_PATH=.auth
        env_file:
            - pypowerwall.env

    telegraf-named:
        image: telegraf:1.28.2
        container_name: telegraf-named
        hostname: telegraf
        restart: unless-stopped
        # user is not required for named volume.
        # as we can rely on the default user!
        # woohoo no perms problems.
        # user: "${PWD_USER:-1000:1000}"
        command: [
            "telegraf",
            "--config",
            "/etc/telegraf/telegraf.conf",
            "--config-directory",
            "/etc/telegraf/telegraf.d"
        ]
        volumes:
            - type: volume
              source: pwd-telegraf-ro
              target: /etc/telegraf
              # we could make this read only.
              # But is not needed as we keep master configs on host.
              # read_only: true
        depends_on:
            - influxdb
            - pypowerwall

    grafana-named:
        image: grafana/grafana:9.1.2-ubuntu
        container_name: grafana-named
        hostname: grafana
        restart: unless-stopped
        # user is not required for named volume.
        # as we can rely on the default user!
        # woohoo no perms problems.
        # user: "${PWD_USER:-1000:1000}"
        # Used named volume for grafana control data.
        volumes:
            - type: volume
              source: pwd-grafana-rw
              target: /var/lib/grafana
        ports:
            - "${GRAFANA_PORTS:-9000:9000}"
        env_file:
            - grafana.env
        depends_on:
            - influxdb

    weather411:
        image: jasonacox/weather411:0.2.3
        container_name: weather411
        hostname: weather411
        restart: unless-stopped
        user: "${PWD_USER:-1000:1000}"
        volumes:
            - type: bind
              source: ./weather
              target: /var/lib/weather
              read_only: true
        ports:
            - "${WEATHER411_PORTS:-8676:8676}"
        environment:
            - WEATHERCONF=/var/lib/weather/weather411.conf
        depends_on:
            - influxdb
```
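To check which mounts of a container are binds and which are named volumes once the stack is up, something like the following could be used. It is a minimal sketch built on `docker inspect` with a Go template; the helper name `show_mounts` is hypothetical.

```shell
#!/bin/bash
# Sketch only: list how each mount of a container is provided (bind or volume).
# show_mounts is a hypothetical helper name.

show_mounts() {
    local container="$1"
    # Each .Mounts entry has a Type (bind or volume), a Name (the volume name)
    # and a Destination (the path inside the container).
    docker inspect \
        --format '{{range .Mounts}}{{println .Type .Name .Destination}}{{end}}' \
        "$container"
}

# Example: show_mounts grafana-named
```

For the compose file above, the grafana-named container would be expected to report a `volume` mount at /var/lib/grafana, while influxdb would still report `bind` mounts.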

jasonacox / Powerwall-Dashboard

Proposal: Transition from bind mounts to named volumes #463