theskumar / python-dotenv

Reads key-value pairs from a .env file and can set them as environment variables. It helps in developing applications following the 12-factor principles.
https://saurabh-kumar.com/python-dotenv/
BSD 3-Clause "New" or "Revised" License
7.7k stars 431 forks source link

Enhance `dotenv run`: Switch to `execvpe` for better resource management and signal handling #522

Closed eekstunt closed 4 months ago

eekstunt commented 4 months ago

The current implementation of dotenv run CLI uses subprocess.Popen, which spawns a child process to execute the specified command.

p = Popen(command,
              universal_newlines=True,
              bufsize=0,
              shell=False,
              env=cmd_env)

After spawning the child process, it exits with the same exit code returned by the child process.

ret = run_command(commandline, dotenv_as_dict)
exit(ret)

We can enhance dotenv run usage dramatically while preserving exactly the same behaviour

By switching to os.execvpe instead of subprocess.Popen, we can replace the parent dotenv process with the new process specified by the user. This results in only one active process—the program the user intended to run.

Benefits:

  1. No hanging parent process dotenv run acts as a launcher, so after executing dotenv run redis-server, only the Redis server process remains. The dotenv process, along with its Python interpreter, is completely replaced. This prevents the dotenv process from consuming RAM and other resources, which would otherwise persist until the Redis server exits.

  2. Proper signal handling When using subprocess.Popen, the parent process (e.g., dotenv) remains responsible for handling and forwarding signals, which can lead to issues if the command doesn’t receive them directly. For instance, in Docker, if Redis was started without exec, it may not get important signals like SIGTERM when the container stops, potentially resulting in improper shutdowns or zombie processes. Using os.execvpe ensures that the command receives signals directly, improving reliability and making dotenv more suitable for production environments and improving reliability for DevOps engineers managing containerized applications.

All current logic will be preserved because dotenv run does not do anything special except propagate the child process exit code.

I don't see any downsides of replacing current implementation with exec. Let's exec it :)

If you have any questions or concerns, I would be glad to discuss them.

eekstunt commented 4 months ago

Created PR #523. All tests are passing

earlgreyness commented 4 months ago

It might be worth mentioning the /usr/bin/env program here, which has historically served a very similar purpose as dotenv run. Its main intended usage is launching an arbitrary program modifying its environment before the launch (hence the name), for instance:

env EDITOR=nano git config --global user.email "you@example.com"

You might argue why not use something like TZ=Asia/Jakarta journalctl -f, but this particular syntax is only available inside a shell environment (Bourne shell, bash, zsh, etc) and cannot be used as a separate command directly. This will fail:

subprocess.run(["TZ=Asia/Jakarta", "journalctl", "-n10"])

since there is no TZ=Asia/Jakarta executable inside PATH. And that is precisely the kind of situation where env is actually useful (no access to shell, whether by choice or by necessity).

More practical modern example of course includes using /usr/bin/env as part of the shebang directive inside shell/python scripts to locate the first mentioned program instance inside PATH:

#!/usr/bin/env python3
import sys
print(sys.__version__)

This pattern is so prevalent that there are probably many millions of scripts using it being launched every day.

So the most important thing to point out here is that env uses exec to launch commands since it would be wasteful to have a running env process for every shell script being forked by it instead.

https://github.com/coreutils/coreutils/blob/72588b291594c762133090ef24fcad8894fc91f6/src/env.c#L921

Since dotenv (in my opinion) falls into exactly the same category as env, being a system administration/devops small but very useful utility program, it would only be logical to restore the semantics parity for dotenv run command with env implementation and use exec pattern instead of fork to spawn processes.

By the way, this shebang variation may be the best combination of both tools I can think of:

#!/usr/bin/env -S dotenv run python3
import os
print(os.environ)

It automatically locates the Python virtual environment AND loads environment variables from .env file before interpreting the actual script with the correct intended Python version. The -S option is required in this case.