Closed: jonasbardino closed this issue 4 months ago.
In one of our development systems we've set up a 256MB tmpfs scratch space, defined in `/etc/fstab` as:

```
tmpfs /storage-mem/mig_system_run tmpfs nosuid,nodev,noatime,noexec,uid=1000,gid=1000,mode=0770,size=256m 0 0
```
Then in the active `docker-compose.yml` each container just links that location into place for simple shared use, to significantly speed up operations on the caches:

```yaml
volumes:
  [...]
  # NOTE: mig_system_run is a shared volatile cache using host tmpfs
  - /storage-mem/mig_system_run:/home/mig/state/mig_system_run
```
The 256MB size was chosen a bit arbitrarily as an example. We see actual data sizes of maybe a few tens of megabytes even on production systems, so one can probably pick a size as small as `64m` if memory is very scarce, or leave out the `size` argument completely to let the system use the default percentage.
The `uid` and `gid` values match the default setup, and just need to be adjusted to fit if one uses different values in docker-migrid, so that all container services can read and write there.
We can add a note about the tmpfs setup and corresponding commented-out volume lines in `docker-compose_production.yml` to ease use, but all deployments will need to act to enable it correctly.
Some thoughts on that:

- The mounting of a tmpfs should be done in Ansible (or whatever admins use to deploy the environment of Migrid). Bjarke and I already talked about how that might be possible.
- The directory which holds the cache data on the host should be configurable and documented in docker-migrid. Something like `MIG_SYSTEM_RUN` maybe?
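Until something like that lands in Ansible, the manual equivalent on the host would be roughly the following sketch (root required; mount point and options taken from the fstab example above, so adjust to your setup):

```shell
# One-off tmpfs mount matching the fstab example above; an Ansible mount
# task could apply the same settings declaratively on deployment
sudo mkdir -p /storage-mem/mig_system_run
sudo mount -t tmpfs \
    -o nosuid,nodev,noatime,noexec,uid=1000,gid=1000,mode=0770,size=256m \
    tmpfs /storage-mem/mig_system_run
```

Adding the matching `/etc/fstab` line on top makes the mount survive reboots.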
So the volume mount might look like:

```yaml
volumes:
  [...]
  # NOTE: mig_system_run is a shared volatile cache using host tmpfs
  - ${MIG_SYSTEM_RUN}:/home/mig/state/mig_system_run
```
The default could also be something like `/tmp/mig_system_run` to enable the volatile behaviour by default.
Some distros even mount /tmp as a tmpfs iirc.
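Whether a given distro does mount `/tmp` in memory is easy to check; a quick sketch using plain coreutils (nothing migrid-specific assumed):

```shell
# Print the filesystem type backing /tmp; distros that keep /tmp in
# memory by default will report "tmpfs" here
fstype=$(stat -f -c %T /tmp)
echo "/tmp is backed by: $fstype"
```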
Could you explain a bit more what you mean by the status markers needing to be shared between services? We have the issue that we have multiple physical hosts running different Lustre services. Does the mig_system_run folder need to be shared between them?
Sorry, I never got back to you on this one - and thanks for your valuable input, btw :+1:
Sure, it would be better to have it exposed as a variable and then handled where most appropriate on each site. I only started looking into basically integrating it hard-coded in our docker-compose files before your comment, but got sidetracked by other more urgent matters and vacation so it's still only half done :-/
For a few things like account expiry and suspension to fully kick in, the `mig_system_run/` contents currently need to be shared or synchronized between all migrid containers for one site. Which in practice of course makes it tricky to use a local tmpfs if the individual migrid-X containers are distributed across multiple VMs/hosts. If you have the `mig_system_run` directory on Lustre you would still want to mount the same Lustre location into each container or run some frequent synchronization on top. Hope that answers your question.
Implemented as suggested, with a new `MIG_SYSTEM_RUN` variable to define which directory should be bound to the internal `state/mig_system_run`. It is documented in the variables doc, and instructions for setting up and using a tmpfs mount for it are included with the variable in the provided `.env` files.
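For reference, the compose-style defaulting of the new variable can be mimicked in plain shell. A sketch assuming the `/tmp/mig_system_run` default discussed above (the exact default in the shipped `.env` files may differ):

```shell
# Fall back to the volatile default when MIG_SYSTEM_RUN is unset,
# mirroring a ${MIG_SYSTEM_RUN:-/tmp/mig_system_run} compose expansion
MIG_SYSTEM_RUN="${MIG_SYSTEM_RUN:-/tmp/mig_system_run}"
mkdir -p "$MIG_SYSTEM_RUN"
echo "mig_system_run bound from: $MIG_SYSTEM_RUN"
```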
ATTENTION: integrators should make adjustments accordingly, at least on production sites.
Various internals rely on the `mig_system_run` folder for storing volatile information like caches, session tracking and status markers. On production systems we use a fast scratch space in tmpfs for this purpose to speed up operations on those files. The contents are automatically re-generated on use and do not require persistence across restarts, so in-memory storage fits well. A fast flash-based storage could be another option depending on memory and storage availability. For e.g. the status markers to work between services the storage needs to be shared, however. Otherwise things like account suspension and expiry will not transparently take effect in all containers.
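To illustrate why the sharing matters, here is a toy sketch of the marker pattern (file and user names are invented for illustration, not actual migrid internals):

```shell
RUN_DIR=$(mktemp -d)                 # stands in for the shared mig_system_run
touch "$RUN_DIR/user123.suspended"   # one service records a suspension
# another service, sharing the same mount, sees it on its next check
if [ -e "$RUN_DIR/user123.suspended" ]; then
    echo "user123 suspended everywhere"
fi
```

With a per-host tmpfs that is not shared, only the container that wrote the marker would see it.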
In docker-migrid a similar fast scratch space should be integrated to improve performance. It cannot be completely automated, because it requires the host to provide a suitable location and point the containers to use it.