I support this view...
In the past, it was also pointed out to me that we should not even add large Python requirements such as numpy or scipy.
Here is a top 50 of the largest installed Python packages; this excludes the large Debian packages that are installed elsewhere.
root@ec5a707d8ce2:/usr/local/lib/python3.6/site-packages# du -s * | sort -nr | head -n 50
370548 tensorflow
59616 numpy
28640 pycountry
28184 grpc
28092 botocore
27804 twisted
27112 babel
25660 hass_frontend
20704 Cryptodome
20692 Crypto
18540 hass_frontend_es5
17316 uvloop
16116 sphinx
13860 lxml
13492 twilio
11672 youtube_dl
11156 dlib.cpython-36m-x86_64-linux-gnu.so
9980 Cython
9300 sphinx_rtd_theme
8960 sqlalchemy
8284 h5py
7976 psycopg2
7792 pip
7136 netaddr
6884 PIL
6880 google
6680 cryptography
6264 gcloud
6224 pygments
6124 tensorboard
5528 aiohttp
5152 slixmpp
4952 sleekxmpp
4868 pyowm
4828 libopenzwave.cpython-36m-x86_64-linux-gnu.so
3980 python_openzwave
3860 docutils
3500 passlib
3200 future
3184 pysnmp
2992 graphql
2780 ephem
2772 pytz
2660 __pycache__
2556 telegram
2500 asyncssh
2236 rx
2156 git
2120 tests
2080 selenium
It has been added to Hass.io as well: https://github.com/home-assistant/hassio-homeassistant/pull/15/files
Especially on Raspberry Pi's... I think this is a really bad idea.
Well, we removed it four days ago and added this into an Add-on.
This is a problem we have to tackle with Docker, and there is no clean way around it. The add-on solution with a local deps folder is not practical and is currently only a workaround.
I think the end solution is what @frenck does with the community add-ons: extending packages on startup.
Anyway, any solution will in the end live on Hass.io and not be a pure Docker-only solution, because it needs extra logic and a system that can manage this for users.
I can see how we might need a policy to prevent excess bloat. At the same time, storage is relatively cheap.
Home Assistant, when installed via the venv, actually does not install all dependencies out of the gate. Both Hass.io and the Docker image do, however. I am sure this was done because pre-installing and bundling with the Docker image takes a lot less time for the user than trying to have the component install on demand.
I can see both sides of this argument. Wherever this ends up, I think simplicity is more important than size for the average user. If a power user wants a slimmed-down version, it is pretty easy to build a container that doesn't pre-install any of the dependencies.
I agree with @pvizeli that moving it out of the base image on Hass.io is fine, but a shared deps folder is a work-around. You can't do that with the official docker image though. There is no shared "add-on", so you would have to go into the shell of the official image to install additional deps every time you pulled a new version, which again is prioritizing image size over ease-of-use.
@frenck - Why do you think having it on a Pi is a bad idea? The framework is being used in a ton of projects for the Pi, including self driving RC cars.
@hunterjm I think what @frenck meant is the general practice of adding large dependencies to the docker image, not tensorflow specifically.
In general, I see the Home Assistant image as the one for "power users" who install the image themselves, so they know how Docker works and are able to add additional containers if needed. The Hass.io image makes it all simple and allows adding additional add-ons.
Well, in the case of TensorFlow, this can be an external container (add-on or just plain Docker), which probably solves a large part of the problem.
But there is a general issue to discuss/solve here @michaelarnauts, which is not just Tensorflow. With the current development speed of Home Assistant, the images will keep growing. This requires a different approach in order to handle this growth. Tensorflow is just the one triggering it now.
@frenck - I agree. The balance to strike is size vs simplicity. The add-on will work for Hass.io since Home Assistant uses a local deps folder that the add-on can write to. Something similar can be done with the base docker image by having a secondary image to pull that mounts the same deps directory to install the wheel into, but that is nowhere close to user friendly.
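To make that secondary-image idea concrete, here is a minimal sketch, assuming the shared deps folder lives under the host's config directory (the host path, the target path, and the package are purely illustrative):

# Run a throwaway container from the same image and install a wheel into
# the shared deps folder; the long-running Home Assistant container can
# then pick it up from the mounted config directory on its next restart.
docker run --rm \
  -v /home/user/ha-config/deps:/tmp/deps \
  homeassistant/home-assistant \
  pip3 install --target /tmp/deps tensorflow

This works because pip's --target option simply drops the package into a plain directory, but as said above it is not something an average user will want to repeat after every image pull.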
To keep the images smaller, the practice of auto-installing requirements_all.txt can be stopped as well. If the docker images do not automatically install requirements_all.txt and let the components hot-load the requirements, large libraries like tensorflow can be added back to the REQUIREMENTS list, and the add-ons can be used to satisfy the requirement for distributions that do not have a pre-compiled wheel. At that point, since the base image is on Ubuntu, a wheel would not be required for the HA docker image.
I do not think the version of tensorflow in the official docker image even supports the raspberry pi anyway.
I just tried it on a pre-Sandy Bridge i5 and it tells me that TensorFlow was compiled to use AVX, but those instructions aren't available on this particular CPU.
I don't have it on a Pi right now, but I do know the Pi does not support AVX either....
edit: I should make a separate issue for this, but I'll leave this here since it may impact near-term decisions on this.
@hunterjm that is not fully correct. Installing everything inside requirements_all needs 150MB. Having all the compilers preinstalled along with the libraries needs 250MB.
So I see this solution for Hass.io, and in the future we could also use the Hass.io containers as the official ones.
Instead of a blanket removal of 'bloat' binaries/dependencies, why not use the power of Docker Hub to allow the user to choose what they want?
Builds can be configured to publish to tags on Docker Hub. Why not leave the binaries/dependencies in the 'latest' Docker build so that the average user gets a fully working HA, and also offer a 'lite' tag with stock Home Assistant, without the bloat binaries/dependencies, for more advanced users?
Because you need a build farm to deploy all these architectures in a useful time. But I'm open to contributions for a build system that can run distributed and provide this functionality.
The next problem: you want component X and component Y, but they are now split over two Docker images. Or you want VLC and now need to use the extended 1GB Docker image for only a 40MB binary.
The only way to be flexible is to have a slim image and allow installing additional apks/wheels at container build time.
Dockerhub does exactly that. It builds the images for you. CPU time is sponsored by Docker I guess.
But I prefer the suggestion of @pvizeli , a slim image, that can install dependencies on demand.
You could use a volume to store those downloaded extras outside the container (on the host), so you don't need to download them again when you update or recreate your container.
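For reference, a minimal sketch of what that looks like with the official image, assuming Home Assistant keeps its on-demand deps inside the mounted config directory (host path and container name are illustrative):

# Keep /config (and the deps folder inside it) on the host, so on-demand
# installs survive pulling a new image or recreating the container.
docker run -d --name home-assistant \
  --net=host \
  -v /home/user/ha-config:/config \
  homeassistant/home-assistant

Only the image layers are replaced on update; whatever was downloaded into the config volume stays in place.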
Just a few comments from me:
Yes, the Docker build system is only for amd64 (i386). It's possible to cross-build with qemu, but building a full armhf image that way takes 12-18 hours. In the free plan, you can also only run one build at a time. We could invest money in a Docker enterprise account, but that does not solve the issue with armhf/aarch64 and future architectures.
Next, we build a separate image for every IoT device, based on the release image, which is in turn based on the base image. That allows us to cache several layers. So we build 16 images for every release.
As for allowing customization on the container layer, I see no issue with Docker layering; the container layer was created to do things like that. So if we can have a base that supports 95%-98% of components at 200MB, that should be fine. It also helps if we can cache the first 150MB so the user only needs to download around 50-60MB per release update.
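As a rough sketch of that container-layer customization from the user's side (the base tag and the extra package are illustrative, and this assumes the slim base image ships pip):

# Build a personal image on top of the slim base, adding only what is
# actually needed; the shared base layers stay cached between releases.
cat > Dockerfile <<'EOF'
FROM homeassistant/home-assistant:latest
RUN pip3 install --no-cache-dir tensorflow
EOF
docker build -t home-assistant-custom .

Only the added layer is specific to the user; a release update re-downloads just the changed base layers plus this small extra layer.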
An advantage of a non-Docker addon system is that it should be easier for users to substitute their own packages, depending on how such a system was implemented, of course.
For example, a user could use a custom build of tensorflow that supports their CPU. Currently, tensorflow is installed in the Dockerfile via a pip install tensorflow. If this wasn't Docker, a user could pip install tensorflow_gpu-1.8.0-cp36-cp36m-win_amd64.whl or whatever the hypothetical addon system required.
@balloob since you've approved the tensorflow inclusion, what is your view on this?
I guess this is the exact reason why hass.io exists, so people can install that add-on software easily, and why people should want to use it instead of a normal installation of hass.
I approved TensorFlow just to temporarily unblock the release. I do hope that we can get to a better solution. As you can already see from the closed PRs lining up that reference this issue, this is something where we should decide on a course and stick to it.
I also think that we should limit this discussion to Home Assistant and not Hass.io.
Maybe we should provide two images, home-assistant and home-assistant-build, the second one containing the build tools so people can build their own stuff. Any package that is over 1MB will have to be installed on demand. Some of those might require the home-assistant-build image.
And yes, maybe some things will end up not working on either of our images, and I think that is fine. If we try to integrate with every service and device in the world out there, it cannot be expected from us to create a container that is compatible with everything.
I do not have much knowledge of what HA needs in terms of Python modules, but I saw that the Dockerfile uses Python 3.6:
FROM python:3.6
Isn't it possible to use either:
FROM python:3.6-slim
or the more secure:
FROM python:3.7.2-slim
Or even a lot smaller (which I prefer!)
FROM python:3.7.2-alpine3.8
That shrinks the Docker image down a lot.
FWIW:
I normally use alpine:3.8 directly for my Python projects, but in HA's case I know of a few modules it would actually break. E.g. there is an ongoing issue with the version of OpenSSL in Alpine being too old to support X25519, which is needed by things using the latest release of asyncssh (so it breaks asuswrt). It also means homekit_controller doesn't work.
I guess the build times would also be slower as it wouldn't be able to use wheels.
Also, I've had cases where musl (Alpine) was a lot slower than libc (the normal version). We also depend on some Debian packages that might not be in the Alpine repos.
In this case, I would prefer to stick to the non-alpine version, but the python-slim variant could be interesting. Although this issue is really to discuss how we want to handle large dependencies for specific integrations or features.
musl with GCC 8 is not as slow as it was with GCC 4 (old Alpine).
We stick with Alpine for Hass.io because it supports a lot more CPU architectures, and on some hardware musl performs better than glibc. There is no real issue with musl and speed.
Some experience from me... I experimentally built an image with HA on Alpine (1.07GB vs. 2.24GB). It worked without problems with the components I use 👍 Until the day it seemed to get stuck in an endless loop sending iOS push messages (it had worked for a few days without issues before). I do not know if this was caused by Alpine in some way, but since then I have switched back to the Debian image...
@pvizeli okay, that's nice! My experience was from over a year ago, so it might have changed since then. And since hass.io containers are alpine, that means that it should work fine for most users (I assume hass.io has a large userbase compared to the normal docker images?)
Hi all,
Was there ever any consensus on what to do about a TensorFlow add-on? To me, the obvious solution for hass.io is to run TensorFlow in a Docker container and just use a gRPC stub (or perhaps a REST interface) in the hass.io container to access it. The stub would be very small, so it wouldn't cause size concerns.
Theoretically, different Docker containers could be created for GPUs etc. (indeed, the official Docker containers already come in GPU and non-GPU variants).
I'd really like to see tensorflow in Hass.io, so would like to see some progress on this. Happy to help if I can.
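For what it's worth, here is a minimal sketch of that split using TensorFlow Serving's stock image and its REST API (the model name, port, mount path, and request payload are illustrative; a gRPC stub would follow the same pattern on port 8500):

# Run the model in its own container...
docker run -d -p 8501:8501 \
  -v /models/my_detector:/models/my_detector \
  -e MODEL_NAME=my_detector \
  tensorflow/serving

# ...and query it over HTTP from Home Assistant, which then only needs a
# tiny client instead of the full tensorflow wheel.
curl -X POST http://localhost:8501/v1/models/my_detector:predict \
  -d '{"instances": [[1.0, 2.0, 5.0]]}'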
I thought of the same thing, as I have a NAS I can run TensorFlow on.
@scstraus I think if you can make a PR to enable gRPC for the TensorFlow component, maybe as an option in case it has overhead, then that would be a great start!
I certainly wouldn't mind doing that, but I don't think I'd be able to write the integration properly for Home Assistant, as I don't understand the internals well enough (I have looked at writing components in the past, and while actually getting some code to do what I want wasn't difficult, getting it to fit into the hass architecture was a bit out of my league). Would it do to open a PR with just some sample code?
@scstraus since we already have tensorflow integrated, you wouldn't need to "fit into the hass architecture" since that's already done. Take a look at https://github.com/home-assistant/home-assistant/blob/master/homeassistant/components/image_processing/tensorflow.py, look for the tensorflow references, and try to replace them with the corresponding gRPC variation; that would be a great start.
I suggest you open a new PR for this and we can work through it there. Even if it doesn't make it in on the first go, it would be an interesting experiment IMO.
Good point. It doesn't look as hard as I was expecting, no PyPi component which helps, one less thing to learn. I have a clean hass.io install on Debian server that I can use. I'll start setting up the dockers for a test environment and see how far I can get in creating a gRPC version. From what I can understand from here, I shouldn't open a PR until I have something working, so it seems I should hold off on that for now? Maybe I should work from a fork until then?
To do a PR you need a fork anyway :) If you get something moving, post a URL and I'll happily help/try out what I can.
I've started a thread to discuss my progress (or lack thereof). If anyone feels like following along and giving (much appreciated) advice in the hopes of dockerizing a tensorflow component, head over here:
This architecture issue is rather old and still open :)
In the meantime, we have migrated to a single Docker structure for all installation types that use Docker (including the previously called Hass.io).
Images are now based on Alpine, and have been reduced in size. More optimizations for that are on the roadmap but are out of scope of the original issue presented here.
I therefore consider this issue solved, for now, especially considering it was based on an image that we don't have anymore 😉
What about a HACS-style approach? Load components like HACS does.
I would like to open the discussion on the docker image, since I'm a bit worried about the growing size of the docker image, and the policy we have for adding things to the docker image.
Recently, tensorflow was added to the docker image (https://github.com/home-assistant/home-assistant/pull/18191 and https://github.com/home-assistant/home-assistant/pull/17795#issuecomment-435331879), causing it to grow by 400MB compared to 0.81.
Tensorflow folder in the docker image:
Dockerhub compresses the layers, but it's still a 100MB extra download. (https://hub.docker.com/r/homeassistant/home-assistant/tags/)
Now, in my opinion, the official Home Assistant docker image should contain everything needed to make Home Assistant work (read: Python libraries), but not all the binaries needed for all the components. That can add up to several gigabytes. I'm thinking about opencv, tensorflow, and ssocr here.
Since hass.io is our "customer-friendly" way of running home assistant, I think those dependencies should be handled as hass.io addons, with a different docker image.
This way, the official docker image can be kept as small as possible (we are still aiming at rpi users), and users needing object recognition and other fancy new stuff can just run the hass.io addon.