Closed: slifty closed this issue 4 years ago
My first idea would be to Dockerize appliances and run them as containers, but you'd need orchestration for that, and down that path lies Kubernetes, aka madness. So, my status on this is very much 🤔
I have a feeling that at some point we may have to go down that path; especially if appliances get more complex and start needing their own wild and crazy dependencies (which they will almost instantaneously need).
Oh snap, appliances are going to need their own dependencies!
@reefdog and I had a conversation to talk through this stuff. I'm going to attempt to summarize some of it:
Wow I did a bad job summarizing...
Appliances are going to regularly require OS-level dependencies, so Chris's point about Docker might actually be critical to address very soon. For instance, the caption extraction appliance is going to rely on `ccextractor`.
~~We would probably want to modify `IAppliance` to provide an `install` method that would allow an appliance author to specify installation commands.~~ EDIT: Or is that not realistic / should we just rely on `doctor` as described below.
`IAppliance` would not need to change for docker -- rather, we would create a docker-appliance repository that takes an appliance package, runs its install, and then modify `tv-kitchen` to work via Docker instead of running `const Appliance = import(dynamic-package-here)` and `new Appliance`.

`yarn add @tv-kitchen/appliance-foo @tv-kitchen/appliance-bar` based on dynamic examination of the config + some magical dependency-testing seems fine/good.

Just some thoughts after our Slack chats.
Also, `IAppliance` might also need `doctor`.
> @reefdog points out that `IAppliance` might also need `doctor`

omg.
But yes, for real: it would probably be a Very Good Idea for each appliance to define a `doctor` method that tests and confirms that all of its external dependencies (other packages, OS binaries, etc.) are installed and accessible. Not sure when this should be run. (During appliance setup? During parent app boot? As an explicit diagnostic?)
OK so some thinking.
I think we should toss out the idea of hot-installing appliances during runtime, especially considering setup for an appliance will require installation of tools (e.g. ffmpeg, tesseract, ccextractor, imagemagick, whatever).

I am thinking we should allow appliances to be loaded in two modes (docker or directly), and the mode should be specified in the `config.json` or whatever. For the short term we won't implement this, and will only support direct. Later we will move to the madness Chris describes and update the `CountertopWorkers` to interact with appliances in either mode.
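For illustration, such a config might look something like this. The file shape, field names, and package names here are all invented, not a settled schema:

```json
{
  "appliances": [
    { "package": "@tv-kitchen/appliance-foo", "mode": "direct" },
    { "package": "@tv-kitchen/appliance-bar", "mode": "docker" }
  ]
}
```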
Appliances can't provide `install` since we don't know what environment an appliance is running in. `doctor` will be the key to providing information about what dependencies an appliance relies on and whether those dependencies are met. Once we Dockerize the appliances, appliances can also include a Dockerfile. (Note: beamcoder seemed to install ffmpeg, right? That made me anxious, but it was a thing I believe... I'd rather provide instructions to the dev on how to install than try to install directly.)
We still have a choice around `package.json` that I don't know what to do about just yet.
Does an appliance need to be on the same machine? What if it can just be webhooked somehow? (Maybe we can have some appliance proxy docker thingie, etc. that can connect a remote appliance.)
@Laurian good question; the non-docker mode is assuming the same machine, but I think the mid term vision should support remote. Would this be a natural part of the Docker / Kubernetes question Chris raised?
Having slept on it, here is what I'd like to propose:

1. Update `IAppliance` to require a `doctor` method in Appliances, which verifies the presence of external dependencies.
2. Have the countertop coordinator check for the existence of Appliance packages AND run the appliance `doctor` when loading a topology / recipe, outputting instructions to the user about whatever needs installing (or whatever `doctor` errors exist).
3. Rely on the dev to install necessary appliance packages and their related requirements.

NOTE: This means that the core tv-kitchen repository is not going to be our recommended entry point, and that we are going to suggest that people `import tvk from 'tv-kitchen/core'` in implementation repositories.
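The coordinator's load-time check could be sketched like this. All names here (`auditTopology`, the topology shape, the `doctor()` convention) are assumptions made up for illustration:

```javascript
// Hypothetical sketch: when loading a topology, confirm each appliance
// package is installed and, if it exposes a doctor(), that its external
// dependencies are met; collect human-readable instructions otherwise.
const auditTopology = (topology) => {
  const problems = [];
  topology.appliances.forEach((packageName) => {
    let Appliance;
    try {
      Appliance = require(packageName);
    } catch (err) {
      problems.push(`Package not installed: try \`yarn add ${packageName}\``);
      return;
    }
    // Assumed convention: doctor() returns a list of unmet dependencies.
    const unmet = typeof Appliance.doctor === 'function' ? Appliance.doctor() : [];
    unmet.forEach((dep) => problems.push(`${packageName} is missing: ${dep}`));
  });
  return problems;
};
```

An empty result means the recipe is safe to run; anything else gets printed to the dev instead of attempting any installation, matching the "rely on the dev to install" proposal above.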
As a result of this, we would also eventually:

1. Rename the `tv-kitchen` repository to `core`.
2. Publish the `core` repository as a package called `@tvkitchen/core`.
3. Create a `dockerized-core` repository which would serve as a dockerized way to run tv kitchen.
4. Create a `cli` repository which would provide a command line interface for running tv kitchen.

These last items could be done later; for now we could just continue with the `tv-kitchen` entrypoint for development and just be careful not to commit appliance package dependencies.
I think this is mostly a good medium term plan. Basically you're proposing to do the API part first and then build a CLI on it, which is a good approach.
The CLI and Docker parts seem a little hand-wavey, but may be okay for now, though there's the danger of locking ourselves into a path-dependent future where technical and partner debt forecloses designs based on those entry points. That's why I started work on Docker stuff now, so that it's something we factor in early. It's sometimes just good to cut to the chase.
I think the idea about Dockerizing OS-level dependencies was a good one and one we want to revisit.
What might a Docker version of this design look like? Each component: core (or CLI), appliances, etc. would have its own image. You'd use Compose or a Kubernetes Helm Chart to spin those all up and environment variables to configure them to talk to each other. Perhaps the Compose or Chart would be a template or part of one.
Just going straight Docker may be easier to run than a bare project with a bunch of unmanaged dependencies that recommends Docker to run one of those dependencies. But I am not a Docker partisan, it is not the only way.
One additional thought that I've had for a long time is that much of image processing is command line and much of data science is in Python. Opening the door to a polyglot approach will extend the utility. Theoretically you could do that with Python talking to Kafka, but I wonder if having a Docker appliance that hooks up named input/output pipes in a container isn't the way to go.
I'm going to leave this issue open because it's still open -- but I'm spinning out a new issue for defining the API!
Discussion
What do you want to talk about?
Appliances!
Appliances are package-based, but we're just using npm instead of our own package manager because we are not insane (I swear).
The question is: what should the officially documented mechanism be for loading appliances?
Some considerations as we think about the right balance:
We (probably) don't want appliance modules to be reflected in `package.json`, since they're secondary packages and not necessarily dependencies.
We may eventually have a GUI / API based configuration.
Probably other things too lol idk.
Some approaches:
`npm install` as necessary (note: yarn doesn't allow you to install a package without affecting `package.json`)

Relevant Resources / Research
Relevant to issue #23