puckel / docker-airflow

Docker Apache Airflow
Apache License 2.0
3.77k stars 542 forks

Only install required pip packages #310

Open ghost opened 5 years ago

ghost commented 5 years ago

The image should install only the minimum set of Python pip packages needed to get running. I don't want to install packages I don't need, as they add to the image size and build time (which can be considerable on my Raspberry Pi) and potentially introduce more problems.

The current release of the Dockerfile supports adding extra pip packages using variables. Docker is also designed for one image to be built on another (i.e. FROM airflow, then RUN pip install whatever).

So I propose that the following packages be removed: hive, jdbc, and mysql.
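
The layering approach described above could look roughly like the following. This is only a sketch, not the project's actual Dockerfile: the image tag, the `airflow` user name, and the choice of the `mysql` extra are assumptions made for illustration.

```dockerfile
# Sketch only: extend a slim base image with just the extra one deployment needs.
# Assumption: the tag below; pin whichever version you actually use.
FROM puckel/docker-airflow:latest

# Assumption: the base image switches to a non-root user, so become root to install.
USER root

# Install only the optional extra this deployment needs (mysql is one of the
# packages proposed for removal above). In practice, pin the same Airflow
# version as the base image so pip does not upgrade it, and note that some
# extras also need system libraries that a slim base may not ship.
RUN pip install --no-cache-dir "apache-airflow[mysql]"

# Assumption: "airflow" is the runtime user in the base image.
USER airflow
```

With this pattern, users who need hive, jdbc, or mysql add them in their own image, and everyone else gets the smaller base.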

Kiddinglife commented 5 years ago

As far as I know, most people don't care about the size, because we are living in the cloud.

cubranic commented 5 years ago

I don't know where you live, @Kiddinglife, but Docker's own best-practices document says:

"Inadvertently including files that are not necessary for building an image results in a larger build context and larger image size. This can increase the time to build the image, time to pull and push it, and the container runtime size."

So most authors do aim to keep the image size down and leave out unnecessary packages. You could argue about whether the packages @lmcclell proposes to drop are necessary, but it's not fair to say image size is somehow irrelevant.