Open ghost opened 5 years ago
As far as I know, most people do not care about the size, because we are living in the cloud.
I don't know where you live, @Kiddinglife, but Docker's own best-practices document says:
Inadvertently including files that are not necessary for building an image results in a larger build context and larger image size. This can increase the time to build the image, time to pull and push it, and the container runtime size.
So most authors do aim to keep the image size down and leave out unnecessary packages. You could argue whether the packages @lmcclell proposes to drop are necessary or not, but it's not fair to say image size is somehow irrelevant.
The image should install only the minimum Python pip packages required to get running. I don't want to install packages that I don't need, as they add to the image size and build time (which can be long on my Raspberry Pi) and potentially introduce more problems.
The current release of the Dockerfile supports adding extra pip packages using variables. Docker is also designed for one image to be built on top of another (i.e. `FROM airflow` followed by `RUN pip install whatever`).
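
To make the layering idea concrete, here is a rough sketch of what a downstream Dockerfile could look like for someone who still needs one of the dropped packages. The base image name `airflow` is taken from the shorthand above and is only a placeholder for the real image tag; `apache-airflow[mysql]` is an assumed example of an extra to add back, not something this image is known to require.

```dockerfile
# Sketch only: "airflow" is a placeholder for the tag of the slimmed-down base image.
FROM airflow

# Add back just the extras this deployment needs.
# "apache-airflow[mysql]" is an assumed example; substitute your own packages.
# Some extras may also need system libraries installed first, and the base image
# may run as a non-root user, so this line might need adjusting.
RUN pip install --no-cache-dir "apache-airflow[mysql]"
```

With something like this, users who need hive, jdbc, or mysql pay the size and build-time cost themselves, while everyone else gets the smaller base image.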
So I propose that the following packages be removed: hive, jdbc, and mysql.