Basic structure for running Python 3.x shell scripts in a Docker container, with several techniques for sandboxing the execution from the host system.
Based on micromamba-docker and Uwe Korn's tips for smaller image sizes.
Code inside the Docker container runs as a non-root user, thanks to the micromamba-docker base image.
Outbound and inbound network access is blocked by default, which reduces the risk of exfiltration of local data or code, and of loading malware components or instructions (e.g. caused by compromised PyPI packages).
File-access is limited to the current working directory and can be disabled entirely.
output is created if it does not exist and mounted for write-access.
mambauser inside the container will use the user ID (UID) and group ID (GID) of the user starting the run_script.sh script, if the UID is >=1000 (i.e. a non-system user on most Linux systems). This will mitigate file permission issues.
Small footprint (ca. 300 MB)
Several techniques for limiting access rights (inspired by the OWASP Docker Security Cheatsheet):
tmpfs so that temporary files can be created.
Development mode, in which the local version of the Python code can be run inside the container
Jupyter Notebook / JupyterLab: You can also run Jupyter Notebook and JupyterLab inside the isolated container.
Reproducible: build.sh writes a YAML specification including versions for all conda and pip components, which can be used to reproduce a Python environment.
The code is meant as a skeleton for your own work. Please do not fork this repository if you are creating your own project. A fork is appreciated for pull-requests related to this template.
git clone https://github.com/mfhepp/py4docker.git
.git; set up your own Git project, if needed.
./build.sh
It should end like so:
#11 exporting to image
#11 exporting layers
#11 exporting layers 0.8s done
#11 writing image sha256:... done
#11 naming to docker.io/library/test_app done
#11 DONE 0.8s
# Run script
./run_script.sh FooBar
The script should run and report its progress, like so
2023-12-01 23:03:58,436 INFO [main.py:28] Script started.
2023-12-01 23:03:58,436 INFO [main.py:29] Hello, !
2023-12-01 23:03:58,436 INFO [main.py:42] Test for read-access to /usr/app/src
2023-12-01 23:03:58,437 INFO [main.py:44] OK: Read access to /usr/app/src, found 1 entries
2023-12-01 23:03:58,437 INFO [main.py:45] Found 1 items in /usr/app/src
2023-12-01 23:03:58,437 INFO [main.py:47] main.py
2023-12-01 23:03:58,437 INFO [main.py:48] Test for write-access to /usr/app/src
2023-12-01 23:03:58,437 INFO [main.py:54] OK: Write access to /usr/app/src is blocked [[Errno 30] Read-only file system: '/usr/app/src/test.txt']
2023-12-01 23:03:58,437 INFO [main.py:42] Test for read-access to /usr/app/data
...
2023-12-01 23:03:58,440 INFO [main.py:55] Testing outbound Internet access
2023-12-01 23:03:58,442 INFO [main.py:64] OK: Network access is blocked [HTTPSConnectionPool(host='www.apple.com', port=443): Max retries exceeded with url: / (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0xffff8bfec830>: Failed to resolve 'www.apple.com' ([Errno -3] Temporary failure in name resolution)"))]
2023-12-01 23:03:58,442 INFO [main.py:65] Testing if user running the script has root access
2023-12-01 23:03:58,442 INFO [main.py:73] OK: Python script seems to have no root privileges. [[Errno 13] Permission denied: '/root/']
2023-12-01 23:03:58,442 INFO [main.py:74] Done.
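The checks reported in this log can be approximated with a few lines of standard-library Python. The following is a minimal sketch, not the actual main.py: the function names are hypothetical, and it uses a plain DNS lookup instead of the HTTP client that appears in the log.

```python
import os
import socket
import tempfile


def can_write(directory: str) -> bool:
    """Return True if a temporary file can be created in `directory`."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:
        # Read-only file system, missing directory, or missing permissions.
        return False


def can_resolve(hostname: str) -> bool:
    """Return True if `hostname` resolves; expected to fail without network access."""
    try:
        socket.getaddrinfo(hostname, 443)
        return True
    except OSError:
        # socket.gaierror (name resolution failure) is a subclass of OSError.
        return False


def is_root() -> bool:
    """Return True if the current process runs with effective UID 0."""
    return os.geteuid() == 0


if __name__ == "__main__":
    print("write access to /usr/app/src:", can_write("/usr/app/src"))
    print("write access to /usr/app/data/output:", can_write("/usr/app/data/output"))
    print("name resolution works:", can_resolve("www.apple.com"))
    print("running as root:", is_root())
```

Inside the container with default settings, one would expect only the output directory check to succeed.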
Now, you can start working on your own code.
In build.sh and run_script.sh, change the string test_app to a name for your application (e.g. my_crawler), like so:
APPLICATION_ID="my_crawler"
env.yaml
run_script.sh to the name of your project (like my_crawler.sh).
Your Python script will see the following directory structure:
/usr/app/src
/usr/app/data
/usr/app/data/output
/usr/app/src: This is the source code and startup directory.
In the regular mode, this is the src folder inside the Docker container, created from the image.
src in the directory that contains the run_script.sh script. Symbolic links will be resolved.
/usr/app/data: This is the host's current working directory, i.e. from where you start the run_script.sh script.
/usr/app/data/output: This is a writeable directory for results, mapped to the output folder within the current working directory on the host.
Important:
run_script.sh script. The rationale is that the code can only see the data from the current (working) directory and only write to a dedicated output subdirectory therein.
~/, then the script can read all files from all subdirectories.
In the development mode, the inner workings are a bit more complicated. Please see the comments in the run_script.sh file for details.
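To illustrate the directory layout, here is a minimal sketch of a script that reads from the data directory and writes a result to the output directory. The function name summarize and the file name summary.txt are hypothetical; only the two container paths come from the description above.

```python
from pathlib import Path

# Paths as seen from inside the container.
DATA_DIR = Path("/usr/app/data")   # host's current working directory (read-only)
OUTPUT_DIR = DATA_DIR / "output"   # the only writeable location


def summarize(data_dir: Path = DATA_DIR, output_dir: Path = OUTPUT_DIR) -> Path:
    """Write the sorted names of all files in data_dir to output_dir/summary.txt."""
    names = sorted(p.name for p in data_dir.iterdir() if p.is_file())
    target = output_dir / "summary.txt"
    target.write_text("\n".join(names) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    print("Wrote", summarize())
```

Writing anywhere other than the output directory should fail with a read-only file system error, so it is a good habit to build all result paths from OUTPUT_DIR.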
build.sh
Before you can run your own code, you need to build a Docker image with build.sh:
Usage: ./build.sh [OPTIONS] [<env_name>.yaml]
Option(s):
-d: development mode (create <username>/test_app:dev)
-f: force fresh build, ignoring cached build stages (will e.g. update Python packages)
-n: Jupyter Notebook mode (create <username>/notebook or <username>/notebook:<env_name>)
Note: The notebook mode is not yet fully functional.
You can pass the name of another YAML environment file as CLI argument (the file extension .yaml is added automatically). The name of the YAML file will be added to the Docker image tag, like so:
# Use foo.yaml and create the image
# <username>/test_app:foo
./build.sh foo
# Use foo.yaml in development mode and create the image
# <username>/test_app:foo-dev
./build.sh -d foo
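The naming rule for image tags can be summarized in a few lines of Python. This is only a sketch of the rule as described in this README, not code taken from build.sh; the function image_tag and its parameters are hypothetical, and the combination of notebook and development mode is not documented here, so the sketch should not be relied on for that case.

```python
from typing import Optional


def image_tag(
    username: str,
    app: str = "test_app",
    env_name: Optional[str] = None,
    dev: bool = False,
    notebook: bool = False,
) -> str:
    """Return the image name that build.sh is described to produce."""
    repository = f"{username}/{'notebook' if notebook else app}"
    # The tag consists of the environment name and/or "dev", joined by a hyphen.
    parts = [part for part in (env_name, "dev" if dev else None) if part]
    return f"{repository}:{'-'.join(parts)}" if parts else repository


if __name__ == "__main__":
    print(image_tag("alice"))                          # ./build.sh
    print(image_tag("alice", dev=True))                # ./build.sh -d
    print(image_tag("alice", env_name="foo"))          # ./build.sh foo
    print(image_tag("alice", env_name="foo", dev=True))  # ./build.sh -d foo
```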
Go to your project directory and execute:
./build.sh -d
This builds a development image, named <username>/test_app:dev (or whatever you chose for test_app; the tag :dev is added automatically).
When done, you can build a production image with
./build.sh
This builds an image for production, named <username>/test_app (or whatever you chose).
The motivation for two images is that you will keep an image of your last working version available while you are developing (e.g. on feature branches).
Also, in the development image, the local code is mapped to /usr/app/src and always in sync with your version on the host machine.
Due to Docker caching mechanisms, new versions of Python packages or security updates to the Debian system will only be installed if you tell Docker to ignore the cached previous stages when building the image (or if you change env.yaml). This can be done with the -f (for force) option:
# Development image
./build.sh -d -f
# Production image
./build.sh -f
Note that this may change the installed versions of Python packages, because a regular build does not pin them.
You can build a Docker image from the *.yaml.lock files, which contain the pinned versions of all conda and pip dependencies, with the option -l, like so:
./build.sh -l
./build.sh -nl dataviz
run_script.sh
This script starts the code in main.py inside a Docker container.
Usage: ./run_script.sh [OPTIONS] [APP_ARGS]
Options:
-d: (D)evelopment mode (mount local volume, as read-only)
-D: Expert (D)evelopment mode with WRITE ACCESS to src/
-i: (i)nteractive mode (keep terminal open and start with bash)
-n: Allow outbound (N)etwork access to host network
--help: Show help
All other arguments and options will be passed to your main.py application.
It supports two modes:
In development mode, the local version of your src folder is mounted within the Docker container, and the development image is used.
In other words, if you change your code, the new code will be executed via run_script.sh.
./run_script.sh -d
Warning: Try to avoid using this mode from within the src directory, as malicious code could change your executable components.
In production mode, your src folder contains what has been copied to the Docker image at build time and remains unchanged and read-only.
./run_script.sh
In both of the main modes, you can tell run_script.sh to provide an interactive terminal session to the respective container instead of running the main.py script.
# Development Mode
./run_script.sh -d -i
# Production Mode
./run_script.sh -i
You can execute any Linux commands in there, e.g.
ls
In order to run your script in the interactive mode, just type
python ./main.py
Note that you can only write to the output folder, while the rest of the system is read-only:
# This will work
cd /usr/app/data/output
echo This is a test > test.txt
# This won't
cd /usr/app/data
echo This is a test > test.txt
You can grant your script access to the host's network with
# Development Mode
./run_script.sh -d -n
# Production Mode
./run_script.sh -n
While this is necessary for many types of applications (like Web crawlers), it introduces a much larger risk for malicious code, in particular the transmission of secrets stolen from your machine or other data to a remote server.
Note: It is possible that access to the Internet will not work if you are running the Docker daemon in rootless mode.
You will only see output from the pre-configured logger, not from print() statements.
For outputs, add statements like
logging.info("That is what I have to say.")
as needed.
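For reference, a logger that produces lines in the format of the sample output could be configured roughly as follows. This is a sketch: the format string is inferred from the sample log lines in this README, and the function configure_logging is hypothetical rather than the setup used in main.py.

```python
import logging
import sys

# Inferred from lines such as:
# 2023-12-01 23:03:58,436 INFO [main.py:28] Script started.
LOG_FORMAT = "%(asctime)s %(levelname)s [%(filename)s:%(lineno)d] %(message)s"


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Configure the root logger to write to stderr and return a module logger."""
    # force=True replaces handlers installed by earlier configuration (Python 3.8+).
    logging.basicConfig(stream=sys.stderr, level=level, format=LOG_FORMAT, force=True)
    return logging.getLogger(__name__)


if __name__ == "__main__":
    log = configure_logging()
    log.info("That is what I have to say.")
    log.debug("Not shown at the default INFO level.")
```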
If you want to log the output of the container (stdout and stderr) to both a file and the console, use
./run_script.sh [OPTIONS] [APP_ARGS] 2>&1 | tee -a logfile.log
If you just want to redirect it to the logfile, use
./run_script.sh [OPTIONS] [APP_ARGS] >> logfile.log 2>&1
run_script.sh
It is recommended that you create a simplified version of the run_script.sh script for deployment with all of the options hard-wired for security reasons.
If you want to be able to run the script just by a single command, like my_script FooBar, then add the following lines to your .bash_profile file, like so:
# ~/foo/bar/py4docker/ is the absolute path to the project in this example
alias my_script="bash ~/foo/bar/py4docker/run_script.sh"
It is strongly recommended to use an absolute path in the alias (otherwise, one random version of multiple copies of run_script.sh with different functionality might be executed depending on your $PATH and from where you run the command).
Warning: An alias will allow you to run the script from any folder on your system, and that folder will be available for read-access to the script as /usr/app/data.
You can build isolated containers with Jupyter Notebook and JupyterLab.
Note: This functionality is likely to become a separate project, see Issue 15
notebook.yaml
# This will build <username>/notebook:latest
./build.sh -n
# This will build <username>/notebook:dataviz from dataviz.yaml
./build.sh -n dataviz
# This will build <username>/notebook:openai from openai.yaml
./build.sh -n openai
Copy notebook.yaml to a new YAML file (e.g. foo.yaml) and add modules as needed.
# This will build <username>/notebook:foo from foo.yaml
./build.sh -n foo
nbh (for 'notebook here')
Add the following lines to your .bash_profile file, like so:
# ~/foo/bar/py4docker/ is the absolute path to the project in this example
alias nbh="bash ~/foo/bar/py4docker/run_notebook.sh"
Warning:
The notebook containers need write-access and a network connection and are hence not as well isolated as in the Python script mode.
The current working directory will be mapped to /usr/app/src inside the container.
For a list of available notebook images (=environments), you can use the alias nbh
nbh --list
or
./run_notebook.sh --list
notebook.yaml
# This will start <username>/notebook:latest
nbh
# This will start <username>/notebook:dataviz
nbh dataviz
# This will start <username>/notebook:openai
nbh openai
# This will start <username>/notebook:foo built from foo.yaml
nbh foo
/mnt/data
You can map any other directory from your system as a read-only bind volume to /mnt/data inside the Docker container like so:
# /home/foo/bar will be accessible as /mnt/data inside the container:
./run_notebook.sh --data-dir /home/foo/bar
/mnt/secrets/
You can map one or more local files containing access tokens as read-only bind mounts to /mnt/secrets/ inside the Docker container like so:
./run_notebook.sh --add-secret ~/Documents/.access_tokens/TESTTOKEN1 FOO \
--add-secret ~/Documents/.access_tokens/TESTTOKEN2 BAR
You will then be able to access them inside the notebook like so:
# Inside a notebook cell, run Bash commands with a ! directive;
!cat /mnt/secrets/FOO
!cat /mnt/secrets/BAR
# Contents of the two files TESTTOKEN1 and TESTTOKEN2
SUPERSECRET_TOKEN1
API_TOKEN_FOR_ACME
A Python example is in examples/secrets_test.ipynb.
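As a rough illustration, reading such a token from Python could look like the following. This is a sketch and not the contents of examples/secrets_test.ipynb; the helper read_secret is hypothetical, and FOO is the secret name from the example above.

```python
from pathlib import Path

# Mount point for secrets inside the notebook container.
SECRETS_DIR = Path("/mnt/secrets")


def read_secret(name: str, secrets_dir: Path = SECRETS_DIR) -> str:
    """Return the content of the secret file `name`, without surrounding whitespace."""
    return (secrets_dir / name).read_text(encoding="utf-8").strip()


if __name__ == "__main__":
    token = read_secret("FOO")
    # Avoid printing the token itself; report only that it was loaded.
    print(f"Loaded secret FOO ({len(token)} characters)")
```

Keeping the token in a variable rather than echoing it in a cell output reduces the chance that it ends up in a saved notebook file.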
Warnings:
/usr/app/data (e.g. as /usr/app/data/.access_tokens/)!!!
~/.access_tokens, but rather ~/Documents/.access_tokens, ~/Documents/.access_tokens, or any place in the predefined subfolders below the user directory, because
/Users/yourusername/.access_tokens!!!
The current working directory will be available as /usr/app/data from within the container. By default, it is read-only (except in the Jupyter Notebook mode). If you want to make this writeable, change the line
--mount type=bind,source=$REAL_PWD,target=/usr/app/data,readonly \
in run_script.sh to
--mount type=bind,source=$REAL_PWD,target=/usr/app/data \
You can also mount additional local paths using the same syntax.
If you want to grant your code write-access to the src folder in development mode permanently, you can use the option -D, like so:
./run_script.sh -D
A common use-case is running code-formatters on the source-code. The Black Code Formatter is included in the default conda/mamba environment. So you can use black in the interactive development mode with write-access, like so:
./run_script.sh -D -i
$ black main.py
All done! ✨ 🍰 ✨
1 file left unchanged.
Be warned: Make sure you understand the security implications!
Note: The following problem is not relevant if you are using Docker Desktop on OSX or (not tested) Docker Desktop on a Linux machine. It only applies to plain Docker installations, e.g. on a production server.
In order to be able to write to the output directory within the current working directory on the host machine on a plain Docker installation on Linux, the user inside the container must have the same UID and GID as the host user.
Also, you may run into problems accessing the files in the output folder from either the container or on the host machine if the user ID used inside the container differs from your user ID on the host system.
In run_script.sh, we are setting the internal user's UID and GID to that of the user starting the run_script.sh script, as long as the UID is >= 1000. This should mitigate or solve the issue.
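The decision rule can be sketched in Python as follows. This is not the code from run_script.sh: the function user_args is hypothetical, and passing the IDs via Docker's --user option is an assumption about how such a mapping could be done, not a statement about how the script does it.

```python
import os

# UIDs below this value are treated as system users on most Linux systems.
MIN_REGULAR_UID = 1000


def user_args(uid: int, gid: int) -> list[str]:
    """Return extra `docker run` arguments that map the host user into the container.

    Assumption: the mapping is expressed with `--user UID:GID`.
    For root and system users, no IDs are passed.
    """
    if uid >= MIN_REGULAR_UID:
        return ["--user", f"{uid}:{gid}"]
    return []


if __name__ == "__main__":
    print(user_args(os.getuid(), os.getgid()))
```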
If you run the script as a root user on the host machine, the user UID and GID are not passed for security reasons. You have to configure Docker for rootless mode, which is a good practice anyway.
docker context use rootless
sudo group or has root privileges. Create a dedicated standard user to run the container.
By default, the script inside the container has no Internet access, which makes it more challenging for malicious code to transmit harvested information etc.
Besides using the -n option with run_script.sh, you can grant Internet access as a default by removing the line
--net none \
from run_script.sh.
More advanced settings are possible, e.g. adding a proxy or firewall inside the container that permits access only to a known set of IP addresses or domains and / or logs the outbound traffic.
For updating the Python packages, you should rebuild the respective image with -f (for 'force'):
# Script
./build.sh -f
# Script development image
./build.sh -f -d
# Default notebook image
./build.sh -fn
# Notebook image from dataviz.yaml
./build.sh -fn dataviz
# Notebook image from openai.yaml
./build.sh -fn openai
micromamba
v, like 2.0.2.
git checkout -b update_micromamba_x.y.z
ARG MICROMAMBA_VERSION="2.0.2"
seccomp-default.json from https://raw.githubusercontent.com/moby/moby/refs/heads/master/profiles/seccomp/default.json.
./build.sh -fd and test it with ./run_script.sh -d. (@TODO: Better integration test).
notebook environment:
./build.sh -fn
./run_notebook.sh
notebook.yaml.lock
./build.sh -fn {mini | dataviz | openai}
./run_notebook.sh {mini | dataviz | openai}
{mini | dataviz | openai}.yaml.lock
./build.sh -f.
See commits on GitHub.