ucphhpc / docker-migrid

Containerized MiG
GNU General Public License v2.0
3 stars 7 forks source link

made vgrid_files_readonly ro volume #43

Closed Bjarke42 closed 4 months ago

Bjarke42 commented 5 months ago

The way the vgrid_files_readonly was implemented does not honor the way we use docker-compose.override in ansibl-docker-migrid, so now i have changed it so i can override the mountpoint correctly.

benibr commented 5 months ago

The new volume should also be used in the other environments development and development_gpd

benibr commented 5 months ago

looks good to me!

The read_only: true could even be removed I guess

jonasbardino commented 4 months ago

Thanks for the contribution. I wonder if it would make sense to integrate a generic overrides method here in docker-migrid and to all use that ... @Bjarke42 ? I'm asking because we currently use slightly modified docker-compose-X.yml files linked from the docker-migrid-ucphhpc repo for our own minor adjustments for our sites e.g. to allow podman use, and already had vgrid volume adjustments overlapping with this PR there.

Bjarke42 commented 4 months ago

Thanks for the contribution. I wonder if it would make sense to integrate a generic overrides method here in docker-migrid and to all use that ... @Bjarke42 ? I'm asking because we currently use slightly modified docker-compose-X.yml files linked from the docker-migrid-ucphhpc repo for our own minor adjustments for our sites e.g. to allow podman use, and already had vgrid volume adjustments overlapping with this PR there.

this is the docker-compose.override vi currently use: docker-compose.override.au.yml.j2 Its a nice way of not having too many instances of the same thing, but just as you are talking about having one docker-compose. But i do not know your requirements, so if you only change stuff in the already existing parameters, then you could benefint from it alot

benibr commented 4 months ago

For the record, this PR broke the Test CI as you can see here: https://github.com/ucphhpc/docker-migrid/actions/runs/8345752910/job/22841385858?pr=45#step:6:35

I cannot see though why it was not run for this PR. I open a Issue for that.

jonasbardino commented 4 months ago

I can confirm that using the docker-compose_production.yml from the merged revision 9e5d09dfd67749f6470386f615341bad647901be breaks docker-compose up on my dev erda instance with the same error as the CI builder reported in the link @benibr posted above:

Creating migrid ... error

ERROR: for migrid  Cannot create container for service migrid: failed to mount local volume: mount ./state/vgrid_files_writable:/var/lib/docker/volumes/docker-migrid-erda_vgrid_files_readonly/_data, flags: 0x1000, data:  ro: no such file or directory
...

It runs fine using the previous reversion b79a88238ab52d1212487e1a86a4bb134968cf7f of that same compose file.

jonasbardino commented 4 months ago

The initial crash could be removed by removing the space in the volume mount options and switching to the new docker compose as the default compose command rather than the old docker-compose. Even then we kept hitting random mount errors e.g. in the Github Actions CI job for such nested volume remounts. This looks like a potential race in docker itself as @benibr reported upstream in https://github.com/docker/compose/issues/11663 Still, AFAICT there is no point at all in having this vgrid_files_readonly bind mount as a full-blown docker top-level volume as we just need a live read-only copy of the vgrid_files_writable dir. The same applies for the mig_system_run shared cache space referenced in issue #40. I understood from PR 43 that it was changed into top-level volumes for easing docker-compose overrides at AU, which sounds valid enough. So I've reworked them into plain bind mounts but tried to provide similar flexibility with env variables to allow overriding the actual source path and using the long format to hopefully ease any remaining templating needs. The result is now available in the docker-compose_production_bind.yml file and I'd like your feedback @Bjarke42 and Anders. Will that or can that be made to work with your site setup? It finally brings consistency back in the build everywhere else so I'd really like to have it replace docker-compose_production.yml.

benibr commented 3 months ago

@jonasbardino can you please add a PR for this so that we can see the diff and have a discussion there?

Please do not commit testing configs to the master branch. We have so many unused config files in Migrid and it just makes things confusing. Also it's very complicated to make a test deployment when you have to change a file somewhere in the middle of it. Use git branches plz :-)

Bjarke42 commented 3 months ago

@jonasbardino yes, please continue this in a new PR. Its alot easier for us to see the diff and then relate to what you changed.

jonasbardino commented 3 months ago

You're probably right that it would have been better to either make this as a branch in the first place or to initiate a full revert of this PR with a follow-up retry if needed. Either would introduce breakage or extra work somewhere. As you know it developed in several steps during that bug hunt, because Github Actions are tricky to simulate/debug and I was hoping we could quickly resolve the matter and just replace the existing production conf. In hindsight that was maybe too optimistic, since we all have other urgent matters to tend to.

Anyway, I've made a PR #55 to do what I intend to do. Namely, replace the current docker-compose_production.yml with the proposed new docker-compose_production_bind.yml. Please have a look and perhaps review to confirm or raise any remaining issues with it. I hope it makes it easier, although I'm not really sure it is less pain to test from a branch instead... I'll let you decide on that :-)