nanobox-io / nanobox-engine-python

Engine for running Python apps on Nanobox
https://nanobox.io
MIT License

Proposal for automatically installing OS packages based on Python requirements #41

Open jjkester opened 6 years ago

jjkester commented 6 years ago

At the moment, the requirements.txt file is checked for certain package names. This is not ideal, because some packages are only installed implicitly, as a dependency of another package. Not everyone runs pip freeze > requirements.txt every time; instead, many people specify only their direct dependencies, with version bounds, and let pip resolve the rest.

I have thought of a better approach that handles both scenarios outlined above. It basically consists of three steps:

  1. Let pip figure out the dependencies. Part of this process is downloading them (this is the only reliable way to know a package's dependencies).
  2. Analyze the downloaded packages and install the required OS packages. Whether the mapping from Python packages to OS packages should be hard-coded or provided some other way is still to be decided.
  3. Install the downloaded Python packages from the cache.
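The three steps above could be wired together roughly as follows. This is a minimal sketch, not part of the engine: the function names, the `find_os_packages` callback (step 2's lookup, which the proposal leaves open) and the `os_install_cmd` (whatever installs OS packages on the platform) are hypothetical. The command runner is injectable so the flow can be tried without touching the system. It also adds `--no-index` to step 3, which is a choice made here so pip only uses the cache.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], str]
PIP = [sys.executable, "-m", "pip"]


def run_command(cmd: Sequence[str]) -> str:
    """Run cmd, raise on failure, and return its combined stdout and stderr."""
    result = subprocess.run(list(cmd), check=True, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.stdout


def install_with_os_deps(requirements: str,
                         find_os_packages: Callable[[str], List[str]],
                         os_install_cmd: Sequence[str],
                         cache_dir: str = "/tmp/python-packages/",
                         run: Runner = run_command) -> List[str]:
    """Download, install OS dependencies, then install Python packages from the cache."""
    # Step 1: pip resolves the full dependency tree and downloads it to cache_dir.
    output = run([*PIP, "download", "-r", requirements, "-d", cache_dir])
    # Step 2: hypothetical lookup of OS packages based on pip's output.
    os_packages = find_os_packages(output)
    if os_packages:
        run([*os_install_cmd, *os_packages])
    # Step 3: install from the local cache only; --no-index stops pip re-downloading.
    run([*PIP, "install", "--no-index", "-f", cache_dir, "-r", requirements])
    return os_packages
```

Passing a fake `run` that records commands is a cheap way to check the order of operations before running it for real.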

1. Downloading the Python packages

For this step, pip download -r requirements.txt -d /tmp/python-packages/ should come in handy. The destination directory has to be passed with -d; otherwise the packages are saved to the current directory. The command's output can be parsed, because it appears to follow a consistent format for every package, shown below:

Collecting {{package_name}}
  Downloading {{file_name}}
    (progress bar)
  Saved {{destination_directory}}/{{file_name}}
Collecting {{package_name}} (from {{dependent_package_name}})
  Downloading {{file_name}}
    (progress bar)
  Saved {{destination_directory}}/{{file_name}}

2. Analyzing the downloaded packages

The exact package names can be extracted from the output format above. Given these names, the required OS packages can be looked up in some registry (to be specified) and then installed.
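Since the lookup system is still to be specified, here is one possible shape for it: a plain dictionary keyed by normalized Python package name. The registry contents are purely illustrative placeholders; real OS package names depend on the platform's package manager. Names are normalized as described in PEP 503, so `Pillow`, `pillow` and `PILLOW` hit the same entry.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical registry; OS package names are placeholders, not verified names.
OS_DEPENDENCIES: Dict[str, List[str]] = {
    "psycopg2": ["libpq"],
    "lxml": ["libxml2", "libxslt"],
    "pillow": ["libjpeg", "zlib"],
}


def normalize(name: str) -> str:
    """Normalize a distribution name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def os_packages_for(python_packages: Iterable[str],
                    registry: Dict[str, List[str]] = OS_DEPENDENCIES) -> List[str]:
    """Return the sorted, de-duplicated OS packages needed by the given Python packages."""
    wanted = set()
    for name in python_packages:
        wanted.update(registry.get(normalize(name), []))
    return sorted(wanted)
```

Keeping the registry as data rather than code would make it easy to move into a separate file later, if that is how the open question gets decided.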

3. Installing the Python packages

At this point all OS dependencies should be in place (at least for the packages that are known), so it is safe to install the Python packages with pip install -r requirements.txt -f /tmp/python-packages/. This makes pip use the already downloaded packages instead of downloading them again; adding --no-index guarantees that pip only looks in that directory and never contacts the package index.
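A small helper for this step could look like the following sketch. The function name and the `dry_run` flag are hypothetical; `--no-index` and `--find-links` (the long form of `-f`) are standard pip options. With `--no-index`, a package missing from the cache makes the install fail instead of silently fetching it.

```python
import subprocess
import sys
from typing import List


def install_from_cache(requirements: str, cache_dir: str,
                       dry_run: bool = False) -> List[str]:
    """Install requirements using only the packages already downloaded to cache_dir."""
    cmd = [
        sys.executable, "-m", "pip", "install",
        "--no-index",                # never contact the package index
        "--find-links", cache_dir,   # long form of -f
        "-r", requirements,
    ]
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd
```

Calling it with `dry_run=True` returns the command without running it, which is handy for logging what the engine is about to do.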

benspaulding commented 6 years ago

If this is something Nanobox wants to pursue: Pipenv can both import and export a requirements file, which might simplify getting the full list of requirements.

> ls
requirements.txt

> cat requirements.txt
django
gevent
gunicorn
psycopg2

> pipenv lock --requirements | grep -oP "^\S+==\S+(?=\s--hash=.*)"
(... pipenv stderr clipped for brevity ...)
greenlet==0.4.12
pytz==2017.3
gevent==1.2.2
gunicorn==19.7.1
psycopg2==2.7.3.2
django==1.11.6
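If the engine does this filtering in Python instead of with grep, the same lookahead pattern carries over directly to Python's `re` module. This sketch assumes the hashed `name==version --hash=...` lines shown above; lines without a hash (or with environment markers) are skipped, just as the grep filter skips them. The function name is hypothetical.

```python
import re
from typing import Dict

# Same pattern as the grep -oP filter above, applied to every line.
PINNED = re.compile(r"^(\S+)==(\S+)(?=\s--hash=)", re.MULTILINE)


def parse_pinned(lock_output: str) -> Dict[str, str]:
    """Map each package name to its pinned version."""
    return {name: version for name, version in PINNED.findall(lock_output)}
```

The resulting names could then feed the same OS-package lookup as the pip download approach.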