sciapp / sampledb

Sample and Measurement Metadata Database
https://scientific-it-systems.iffgit.fz-juelich.de/SampleDB/
MIT License
21 stars 11 forks source link

use the official python image as base image + other changes #31

Closed NicolasCARPi closed 2 years ago

NicolasCARPi commented 2 years ago

Hello Florian,

Here is my take at improving slightly the current docker image.

Using the official python image based on debian will make upgrading python version easier. It also results in a smaller image: Before: 1.11 Gb, now: 983 Mb

And more importantly it avoids having to install python and pip!

Other changes listed below:

FlorianRhiem commented 2 years ago

Hey Nico, thank you for the pull request! Plenty of good points, but I've got a few issues with it as well:

NicolasCARPi commented 2 years ago

GCC is needed to build some dependencies on platforms where there are no wheels for those dependencies, e.g. psutil on ARM-based macOS systems.

I see. Then I suggest using a github action to build the image on the different platforms, this will help find issues like that. I can try and add this to the PR as I have this already in elabftw/elabimg and could easily adapt it. Also, just a tought, we could use a multi stage build where the python packages + gcc will be built in an image, and then copied to the final prod image which should ideally not contain any build tools, but that's not really important for now.

I'm not sure I understand your second point.

I'll bring back the root-chown-then-su.

As for the virtual env, from my point of view, we don't need a venv because we control the full system so I don't think it makes much sense to have a venv in a container. It also makes things simpler I believe. I'll add changes to the doc then.

FlorianRhiem commented 2 years ago

GCC is needed to build some dependencies on platforms where there are no wheels for those dependencies, e.g. psutil on ARM-based macOS systems.

I see. Then I suggest using a github action to build the image on the different platforms, this will help find issues like that. I can try and add this to the PR as I have this already in elabftw/elabimg and could easily adapt it. Also, just a tought, we could use a multi stage build where the python packages + gcc will be built in an image, and then copied to the final prod image which should ideally not contain any build tools, but that's not really important for now.

We're currently using GitLab CI on an amd64 Linux system to build the image that is pushed to DockerHub. We could probably extend that to aarch64 macOS, though I don't think the effort would be worth it, as it's only necessary for development. That is, if the Dockerfile doesn't work on macOS that's fine for SampleDB, it would be an annoyance to the reviewers and developers working on macOS :)

So far, I haven't encountered any issues with the image size, but I suppose uninstalling gcc and running an autoremove would also help somewhat without going straight to a more complex build.

I'm not sure I understand your second point.

Nevermind, I think I read that wrong and got confused similar to you with yesterday's issue. Your change looks right to me, and now I'm wondering why the README and docs shouldn't also mount the directory as /var/lib/postgresql/data instead of /var/lib/postgresql/data/pgdata. Perhaps with a different name for the pgdata directory (perhaps just data?) to ensure that no existing pgdata directory is mounted in the wrong path by someone doing copy/paste of the new example. While the current code seems to work, the postgres image docs seem to indicate that it'd be better to mount it as /var/lib/postgresql/data/.

I'll bring back the root-chown-then-su.

As for the virtual env, from my point of view, we don't need a venv because we control the full system so I don't think it makes much sense to have a venv in a container. It also makes things simpler I believe. I'll add changes to the doc then.

Thanks 👍

NicolasCARPi commented 2 years ago

Okay, I pushed a commit. Let me know what you think. The idea for the sdb wrapper came because I was struggling with the su -c argument dance necessary for arguments to be parsed correctly when doing a docker exec something.

In fact, it would be simpler if the entrypoint was ran by sampledb so we wouldn't need to su sampledb, but this could be for another issue/pr.

We're currently using GitLab CI on an amd64

I'd suggest running it where the code lives, here on github, so PR like this could see the failing ARM build and understand the need for gcc immediatly ;) (also, it's free so why not use it!). You can see how it could look here: https://github.com/elabftw/elabimg/blob/hypernext/.github/workflows/push_next_hypernext_image.yaml. This workflow is building two branches on schedule/push and for two arch and then pushing to docker hub (there is also a vuln scan in there).