Currently, every dependency in `requirements.txt` is installed in every container, even though not every container needs all of them. `pyspark` in particular is a large library (~250 MB) that is used only in the `loader` container.
Managing requirements through `setup.py`, instead of running `pip install -r requirements.txt` in each container, would let each container install only the requirements it needs. (This would mainly benefit development, though it would also save space in prod.)
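One possible shape for this, as a minimal sketch rather than a finished design, is to declare the shared dependencies in `install_requires` and put container-specific ones in setuptools' `extras_require`. The package name `myproject`, the version, and the `requests` entry are hypothetical placeholders; only `pyspark` and the `loader` container come from the description above.

```python
# setup.py (sketch; the package name, version, and core list are hypothetical)

# Dependencies that every container needs.
# "requests" is only a placeholder for the real shared requirements.
CORE_REQUIREMENTS = ["requests"]

# Optional dependency groups, keyed by the container that needs them.
# Only the loader container uses pyspark, so it lives in its own extra.
EXTRAS = {
    "loader": ["pyspark"],
}


def main():
    """Hand the dependency lists to setuptools."""
    from setuptools import find_packages, setup

    setup(
        name="myproject",  # hypothetical package name
        version="0.1.0",  # hypothetical version
        packages=find_packages(),
        install_requires=CORE_REQUIREMENTS,
        extras_require=EXTRAS,
    )


# A real setup.py would end with a call to main(); it is left out here so
# the sketch can be imported and inspected without starting a build.
```

Under these assumptions, most containers would run `pip install .`, while the loader container would run `pip install ".[loader]"` to pull in `pyspark` as well.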