mfhepp / py4docker

Template for running Python 3.x shell scripts and notebooks in a Docker container for isolation, security, and portability

Separate the mechanisms for (1) scripts, (2) dev environment, and (3) notebooks #15

Open mfhepp opened 10 months ago

mfhepp commented 10 months ago

The project can be used for multiple purposes: deploying isolated scripts, providing a development environment, and running notebooks.

These usages differ in many aspects, e.g. in their dependencies and in how the images are built, tagged, and invoked.

Currently, the images created are partially overlapping, which may cause problems in the long run.

Hence, it seems better to separate the three usage scenarios:

  1. Script deployment takes a script's code and environment file and builds and runs an isolated Docker container with minimal privileges on the host (see the run sketch after this list). This can also be used for building isolated CLI versions of popular Python applications or packages, like copier or Nikola.
  2. Development environment. This would bundle everything needed for typical Python development workflows, including code formatting, linters, etc. The editor would run on the host, and the working directory would be mapped into the container. As one is likely to develop multiple projects, each project should have its own, pinnable environment file (basically a version with both the dev dependencies and the runtime dependencies). The dev dependencies could be the same for the entire user, while the runtime ones will of course differ. Each project will typically have its own Docker image and its own run script or alias and be run from its own directory. For some very simple projects, it may be handy to use a standard image with popular dependencies so that experiments and quick tests do not require a 1 GB image.
  3. Notebooks. There are actually two use cases:
    • One or multiple standard notebook environments (and respective kernels) to be run from anywhere on the machine for quick experiments and demos (like nbh <envname>). The multiple environments can either be built inside the same image or, likely better, be independent ones.
    • A project-specific notebook environment with its own environment specification, e.g. for specific tasks in research projects. Here, the environment file and the startup script will live in that project folder.
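
For scenario 1, a run sketch with minimal privileges could look roughly like the following; the image name my_script and the exact option set are illustrative assumptions, not the project's actual run.sh:

# Hypothetical sketch: run the script image with minimal privileges on the host
# (image name and option set are illustrative assumptions)
docker run --rm \
    --read-only \
    --cap-drop=ALL \
    --security-opt no-new-privileges \
    --network none \
    my_script:latest "$@"    # "$@" forwards CLI arguments when used inside a run script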

One critical issue is that the identification of the proper image is determined by the image tag on that machine, so we must take care that we do not accidentally start the wrong image.
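
As an illustration (the tag name is an assumption): a run script that resolves the image by a fixed tag will start whichever image currently carries that tag on the machine, regardless of the directory it is invoked from:

# Illustration only: starts whatever image is tagged py4docker:latest on this machine,
# which is not necessarily the image belonging to the current project
docker run --rm py4docker:latest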

For script development, we can either use the fully-fledged dev environment or keep the current feature of mounting the src directory into the container.
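
The latter could look roughly like this; the mount paths and image name are assumptions for illustration:

# Hypothetical sketch: bind-mount the project's src directory (read-only), so code
# changes on the host are visible inside the container without rebuilding the image
docker run --rm \
    --mount type=bind,source="$PWD/src",target=/usr/app/src,readonly \
    my_script:latest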

So basically we would have the following commands:

# Build the script / project in the current folder using the environment file found therein with no development dependencies
# TODO: Pin versions or build from a pinned version
build

# Build a dev environment from the standard dev packages and the dependencies found in the current directory
# If there are no additional dependencies, build just the standard image (or simply reuse it?)
build dev

# Build the standard notebook image from the standard notebook packages 
# plus the dependencies found in the current directory
# If there are no additional dependencies, build just the standard image (or simply reuse it?)
build notebook
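
One possible shape for such a build command is sketched below; the Dockerfile names, tags, and defaults are assumptions, not settled design:

#!/usr/bin/env bash
# Hypothetical sketch of a build dispatcher; file names, Dockerfiles, and tag
# logic are assumptions, not settled design.
set -euo pipefail

MODE="${1:-script}"           # script (default), dev, or notebook
TAG="$(basename "$PWD")"      # see the tagging discussion below

case "$MODE" in
    script)
        docker build -t "$TAG" .
        ;;
    dev)
        # standard dev packages plus the dependencies found in the current directory
        docker build -t "${TAG}_dev" -f Dockerfile.dev .
        ;;
    notebook)
        docker build -t "${TAG}_notebook" -f Dockerfile.notebook .
        ;;
    *)
        echo "Usage: build [dev|notebook]" >&2
        exit 1
        ;;
esac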

Now, one key issue is to determine the tag of the image at build and run-time.

Several ideas:

  1. Take the basename of the $PWD. But how to spot collisions? (like src in multiple projects)
  2. Get the name from a file or script inside the $PWD (text, YAML, or simply a filename like IMAGENAME.py4docker); see the sketch after this list.
  3. Use a local script / alias for each project if the global defaults are to be superseded. Both build.sh and run.sh would have to check this; otherwise, they might start completely different images depending on where they are invoked.
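
A sketch combining ideas 1 and 2 could look like this (only the IMAGENAME.py4docker marker file is taken from the list above; everything else is an assumption):

# Hypothetical sketch: derive the image tag, preferring a marker file (idea 2)
# over the directory name (idea 1); all names are assumptions.
TAG="$(basename "$PWD")"            # idea 1: simple, but collision-prone
for marker in *.py4docker; do
    [ -e "$marker" ] || break       # no marker file in this directory
    TAG="${marker%.py4docker}"      # idea 2: the filename carries the image name
    break
done
echo "Using image tag: $TAG"
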
mfhepp commented 7 months ago

It's likely that I will split the project into two separate projects.

The overlap between the two is relatively small, and this will make each project much less complex.