Closed: slifty closed this issue 4 years ago
My first idea would be to Dockerize appliances and run them as containers, but you'd need orchestration for that, and down that path lies Kubernetes, aka madness. So, my status on this is very much 🤔
I have a feeling that at some point we may have to go down that path; especially if appliances get more complex and start needing their own wild and crazy dependencies (which they will almost instantaneously need).
Oh snap, appliances are going to need their own dependencies!
@reefdog and I had a conversation to talk through this stuff. I'm going to attempt to summarize some of it:
Wow I did a bad job summarizing...
Appliances are going to regularly require OS-level dependencies, so Chris's point about Docker might actually be critical to address very soon. For instance, the caption extraction appliance is going to rely on `ccextractor`.
~~We would probably want to modify `IAppliance` to provide an `install` method that would allow an appliance author to specify installation commands.~~ EDIT: Or is that not realistic / should we just rely on `doctor` as described below.
`IAppliance` would not need to change for docker -- rather, we would create a docker-appliance repository that takes an appliance package, runs its install, and then modify `tv-kitchen` to work via Docker instead of running `const Appliance = import(dynamic-package-here)` and `new Appliance`.

`yarn add @tv-kitchen/appliance-foo @tv-kitchen/appliance-bar` based on dynamic examination of the config + some magical dependency-testing seems fine/good.

Just some thoughts after our Slack chats.
Also, `IAppliance` might also need `doctor`.
> @reefdog points out that `IAppliance` might also need `doctor`

omg.
But yes, for real: it would probably be a Very Good Idea for each appliance to define a `doctor` method that tests and confirms that all of its external dependencies (other packages, OS binaries, etc.) are installed and accessible. Not sure when this should be run. (During appliance setup? During parent app boot? As an explicit diagnostic?)
OK so some thinking.
I think we should toss out the idea of hot-installing appliances during runtime, especially considering setup for an appliance will require installation of tools (e.g. ffmpeg, tesseract, ccextractor, imagemagick, whatever).

I am thinking we should allow appliances to be loaded in two modes (docker or directly), and the mode should be specified in the `config.json` or whatever. For the short term we won't implement this, and will only support direct. Later we will move to the madness Chris describes and update the `CountertopWorkers` to interact with appliances in either mode.
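For illustration, such a config might look something like this. The file shape, field names, and package names here are all invented, not a settled schema:

```json
{
  "appliances": [
    { "package": "@tv-kitchen/appliance-foo", "mode": "direct" },
    { "package": "@tv-kitchen/appliance-bar", "mode": "docker" }
  ]
}
```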
Appliances can't provide `install` since we don't know what environment an appliance is running in. `doctor` will be the key to providing information about what dependencies an appliance relies on and whether those dependencies are met. Once we Dockerize the appliances, appliances can also include a Dockerfile. (Note: beamcoder seemed to install ffmpeg, right? That made me anxious, but it was a thing I believe... I'd rather provide instructions to the dev on how to install than try to install directly.)
We still have a choice around `package.json` that I don't know what to do about just yet.
Does an appliance need to be on the same machine? What if it can just be webhooked somehow? (Maybe we can have some appliance proxy docker thingie, etc. that can connect a remote appliance.)
@Laurian good question; the non-docker mode is assuming the same machine, but I think the mid term vision should support remote. Would this be a natural part of the Docker / Kubernetes question Chris raised?
Having slept on it, here is what I'd like to propose:

1. Update `IAppliance` to require a `doctor` method in Appliances, which verifies the presence of external dependencies.
2. Have the countertop coordinator check for the existence of Appliance packages AND run the appliance `doctor` when loading a topology / recipe, outputting instructions to the user about whatever needs installing (or whatever `doctor` errors exist).
3. Rely on the dev to install necessary appliance packages and their related requirements.

NOTE: This means that the core tv-kitchen repository is not going to be our recommended entry point, and that we are going to suggest that people `import tvk from 'tv-kitchen/core'` in implementation repositories.
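The coordinator's load-time check could be sketched like this. All names here (`auditTopology`, the topology shape, the `doctor()` convention) are assumptions made up for illustration:

```javascript
// Hypothetical sketch: when loading a topology, confirm each appliance
// package is installed and, if it exposes a doctor(), that its external
// dependencies are met; collect human-readable instructions otherwise.
const auditTopology = (topology) => {
  const problems = [];
  topology.appliances.forEach((packageName) => {
    let Appliance;
    try {
      Appliance = require(packageName);
    } catch (err) {
      problems.push(`Package not installed: try \`yarn add ${packageName}\``);
      return;
    }
    // Assumed convention: doctor() returns a list of unmet dependencies.
    const unmet = typeof Appliance.doctor === 'function' ? Appliance.doctor() : [];
    unmet.forEach((dep) => problems.push(`${packageName} is missing: ${dep}`));
  });
  return problems;
};
```

An empty result means the recipe is safe to run; anything else gets printed to the dev instead of attempting any installation, matching the "rely on the dev to install" proposal above.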
As a result of this, we would also eventually:

1. Rename the `tv-kitchen` repository to `core`.
2. Publish the `core` repository as a package called `@tvkitchen/core`.
3. Create a `dockerized-core` repository which would serve as a dockerized way to run tv kitchen.
4. Create a `cli` repository which would provide a command line interface for running tv kitchen.

These last items could be done later; for now we could just continue with the `tv-kitchen` entrypoint for development and just be careful not to commit appliance package dependencies.
I think this is mostly a good medium term plan. Basically you're proposing to do the API part first and then build a CLI on it, which is a good approach.
The CLI and Docker parts seem a little hand-wavey, but may be okay for now, though there's the danger of locking ourselves into a path-dependent future where technical and partner debt forecloses designs based on those entry points. That's why I started work on Docker stuff now, so that it's something we factor in early. It's sometimes just good to cut to the chase.
I think the idea about Dockerizing OS-level dependencies was a good one and one we want to revisit.
What might a Docker version of this design look like? Each component: core (or CLI), appliances, etc. would have its own image. You'd use Compose or a Kubernetes Helm Chart to spin those all up and environment variables to configure them to talk to each other. Perhaps the Compose or Chart would be a template or part of one.
Just going straight Docker may be easier to run than a bare project with a bunch of unmanaged dependencies that recommends Docker to run one of those dependencies. But I am not a Docker partisan, it is not the only way.
One additional thought that I've had for a long time is that much of image processing is command line and much of data science is in Python. Opening the door to a polyglot approach will extend the utility. Theoretically you could do that with Python talking to Kafka, but I wonder if having a Docker appliance that hooks up named input/output pipes in a container isn't the way to go.
I'm going to leave this issue open because it's still open -- but I'm spinning out a new issue for defining the API!
Discussion
What do you want to talk about?
Appliances!
Appliances are package-based, but we're just using npm instead of our own package manager because we are not insane (I swear).
The question is: what should the officially documented mechanism be for loading appliances?
Some considerations as we think about the right balance:
We (probably) don't want appliance modules to be reflected in `package.json`, since they're secondary packages and not necessarily dependencies.
We may eventually have a GUI / API based configuration.
Probably other things too lol idk.
Some approaches:
`npm install` as necessary (note: yarn doesn't allow you to install a package without affecting `package.json`)

Relevant Resources / Research
Relevant to issue #23