livepeer / grants

⚠️ DEPRECATED ⚠️ Please visit the new homepage at https://grants.livepeer.org

Wish List: Infrastructure management tools #3

Open dob opened 4 years ago

dob commented 4 years ago

The Problem

Orchestrators/Transcoders currently get the base-layer functionality they need to run an Orchestrator/Transcoder setup from the Livepeer node, but this falls far short of an easy-to-manage toolset that gives them visibility into, and management capability over, their cluster.

Potential Solutions

Open source tools that let infrastructure operators easily view the status of their various Orchestrator and Transcoder processes, add and remove transcoders from their cluster, adjust pricing on the fly, and get alerts when things go down or when required protocol interactions are pending would be hugely helpful to those running infrastructure on the Livepeer network.

This probably combines writing some wrapper code around the raw functionality of the Livepeer node software with creating an interface for visibility into and management of the infrastructure.

Challenges

All the APIs exist to do this, either in the Livepeer world through the node APIs and smart contract interactions, or through publicly available infrastructure management tools like Docker, Kubernetes, etc. But the devops work of piecing them together, scripting them, and creating interfaces on top can be a challenge.

Summary

If people propose building open source tools that are valuable to the infrastructure operator community here, this is an area that we would love to support with grants.

kebab-mai-haddi commented 3 years ago

Hi @dob! Thank you for articulating the issue so well.

I am interested in this project and I think I can develop something in this area. Before moving further, I want to get a feel for the whole scenario and see what I can do.

I have been in DevOps for years, and health-check infrastructure is my major interest. I have worked on a few grants before as well.

But I am new to LPT. I just ran a local orchestrator and a transcoder and did some basic things like checking their status. I want to reproduce the complete scenario: clusters of orchestrators and transcoders running, where I can check the status of various processes, add and remove transcoders, (MAYBE) adjust prices on the fly, set up health alerts for the cluster, etc.

Is there any testing cluster that I can use?

dob commented 3 years ago

I'm guessing there's not a test cluster set up for this, but I bet there's some info in the way of blueprints you could follow if you were setting up your own cluster. Paging @iameli who may be able to point you in the right direction.

iameli commented 3 years ago

Hi @kebab-mai-haddi! Awesome that you're interested. A lot has changed infrastructurally for Livepeer in the year+ since this proposal was written. Here are two big pieces of infrastructure that could potentially be interesting to you:

  1. We're looking to release an up-to-date version of our Kubernetes Helm chart soon - I can probably do that now, actually, though it's not terribly well-documented. This is the main thing that backs most of the Livepeer.com infrastructure right now, and it's capable of running on-chain broadcasters, orchestrators, and even the API node backing the REST API for defining streams and whatnot. It's a logical starting point for any kind of "scaled Livepeer deployment tools" project. But most of its cool features are tailored toward those running scaled broadcasters rather than orchestrators.

  2. There's also the monitoring supercontainer, which is a combined Prometheus/Grafana Docker container capable of monitoring and delivering statistics. Very useful for running Os and checking on ticket redemptions and that sort of thing.
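As a rough illustration of how an operator's own tooling might consume what a Prometheus endpoint exposes, here is a minimal Python sketch that parses the Prometheus text exposition format and flags samples above a threshold. The metric name `pending_ticket_redemptions` and the `node` labels are hypothetical placeholders for illustration; they are not taken from the Livepeer node or the monitoring container.

```python
import re

# One sample line of the Prometheus text exposition format looks like:
#   metric_name{label="value",...} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


def alerts(samples, metric, threshold):
    """Return the label sets whose value for `metric` exceeds `threshold`."""
    return [labels for name, labels, value in samples
            if name == metric and value > threshold]


# Hypothetical scrape output, for illustration only.
EXAMPLE = """\
# HELP pending_ticket_redemptions Hypothetical metric, for illustration only.
# TYPE pending_ticket_redemptions gauge
pending_ticket_redemptions{node="orch-1"} 2
pending_ticket_redemptions{node="orch-2"} 17
"""

if __name__ == "__main__":
    for labels in alerts(parse_metrics(EXAMPLE), "pending_ticket_redemptions", 10):
        print("ALERT:", labels)
```

In practice, alerting rules would more likely live in Prometheus or Grafana itself; the sketch only shows that the scraped data is easy to consume from custom scripts as well.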

I think both of these are pieces of the puzzle but they don't necessarily answer all of @dob's original post:

Open source tools that let infrastructure operators easily view the status of their various Orchestrator and Transcoder processes, let them add and remove transcoders from their cluster, let them adjust pricing on the fly, let them get alerts when things go down or there are required protocol interactions pending, etc

So I think there's still definitely interest for that sort of thing, especially in the "setting prices on lots of Os at once" area. I'd be curious to get your take on that.
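To make the "setting prices on lots of Os at once" idea concrete, here is a minimal Python sketch of a fan-out helper that applies one price to many nodes and reports which updates failed. It assumes nothing about the actual Livepeer node API: `set_price` is a hypothetical callable the operator would supply (for example, a wrapper around whatever mechanism their nodes expose), and the node names and price value are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def set_price_everywhere(nodes, price, set_price, max_workers=8):
    """Apply `price` to every node and report per-node success or failure.

    `set_price(node, price)` is a hypothetical callable supplied by the
    operator; it is expected to raise an exception when the update fails.
    """
    def attempt(node):
        try:
            set_price(node, price)
            return node, None
        except Exception as exc:  # keep going so one bad node does not block the rest
            return node, str(exc)

    # pool.map preserves the input order of `nodes` in its results.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(attempt, nodes))

    updated = [node for node, err in results if err is None]
    failed = {node: err for node, err in results if err is not None}
    return updated, failed


if __name__ == "__main__":
    def fake_set_price(node, price):  # stand-in for a real call to a node
        if node == "orch-3":
            raise ConnectionError("node unreachable")

    ok, bad = set_price_everywhere(["orch-1", "orch-2", "orch-3"], 1200, fake_set_price)
    print("updated:", ok)
    print("failed:", bad)
```

Returning the failures instead of raising on the first one is a deliberate choice here: with many Os, an operator usually wants to know exactly which nodes still carry the old price.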

We don't presently have a test cluster set up for this, but depending on what'd be necessary for your particular grant proposal we can discuss how we could help in that area. (A proper Livepeer test cluster is an interesting idea, though — I'm a big fan of the CNCF's Community Infrastructure Lab.)

kebab-mai-haddi commented 3 years ago

Thank you for your responses! @iameli, I am interested in all three of the project ideas that you mentioned. I will go through them and get back to you ASAP!

github-actions[bot] commented 11 months ago

This issue has been marked as stale with no activity. It will close in 7 days.

Strykar commented 8 months ago

Hi @dob, is this path too radical for Livepeer to support via a grant for infrastructure management?

Create video as a separate work type inside backend.ai, with Livepeer as a web-configurable workload within it: https://backend.ai/ https://github.com/lablup/backend.ai/

If you think it may be worthwhile to explore, should I reach out to their developers?

AuthorityNull commented 8 months ago

I was chatting with Strykar about backend.ai and I'd love to hear feedback from more O's and the core team on whether or not it's worth pursuing.

dob commented 8 months ago

@Strykar Could you elaborate a little bit more on the workflow that you envision for a backend.ai user? What type of task would they be looking to perform, what would be their interface to performing it, and how would the Livepeer network plug in? Thanks!

Strykar commented 8 months ago

Sure @dob, here's a possible list of features an Orchestrator web interface for Livepeer may be expected to have over time:

If we use an existing OSS AI/ML GPU hyperscaler orchestration project like Backend.ai, we would not have to reinvent the wheel or maintain any of the above features ourselves.

This will require two prior Public Goods grants -

  1. Enable an API for all Orchestrator functions exposed via livepeer / livepeer-gpu and their configs
  2. Enable livepeer to live-reload its own configuration on SIGHUP (kill -HUP) without dropping streams
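For readers unfamiliar with the pattern in item 2, here is a minimal Python sketch of reloading configuration on SIGHUP while the process keeps running. It is only an illustration of the general technique, not livepeer code: the `ReloadableConfig` class, the JSON file format, and the `price` key are all hypothetical, and SIGHUP is available on POSIX systems only.

```python
import json
import os
import signal
import time


class ReloadableConfig:
    """Holds configuration that is re-read from disk when SIGHUP arrives."""

    def __init__(self, path):
        self._path = path
        self._values = {}
        self.last_error = None
        self.reload()

    def reload(self, signum=None, frame=None):
        # Parse the whole file before swapping, so a broken file never
        # replaces a working configuration and in-flight work is unaffected.
        try:
            with open(self._path) as handle:
                new_values = json.load(handle)
        except (OSError, ValueError) as exc:
            self.last_error = str(exc)  # keep serving the old values
            return
        self._values = new_values  # single reference swap
        self.last_error = None

    def get(self, key, default=None):
        return self._values.get(key, default)

    def install(self):
        # Must be called from the main thread; SIGHUP does not exist on Windows.
        signal.signal(signal.SIGHUP, self.reload)


if __name__ == "__main__":
    config = ReloadableConfig("config.json")  # hypothetical file name
    config.install()
    print(f"edit config.json, then run: kill -HUP {os.getpid()}")
    while True:
        print("current price:", config.get("price"))
        time.sleep(5)
```

The key design point is that the long-running loop is never restarted: the handler only swaps the configuration object, which is what allows existing work (streams, in the Livepeer case) to continue.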

This grant then could:

  1. Add livepeer / livepeer-gpu as new container images that can be spun up in a pre-configured environment and system administered via Backend.ai's existing framework.
  2. Create a livepeer Orchestrator-specific Grafana dashboard with multiple inputs from Dune or other O's

This relatively small effort would enable all of the features listed above with zero maintenance burden on the grant.

All the features above are now available to any Orchestrator irrespective of size, in something they can install on Docker Desktop on their home PC or bare metal in data centers.

Possible future benefits:

These short videos give a sense of the features this grant would enable - https://www.backend.ai/product/webui https://www.backend.ai/product/control-panel https://www.backend.ai/product/dashboard

Or do you feel an Orchestrator management project is better off developed as a custom one-off centered on Livepeer?

dob commented 8 months ago

It's a little abstract to me, due to not being in the weeds of node operation. I think the meaningful signal here would be if O's actually wanted and saw the benefit in this. Any O's care to chime in with some specific examples of how this would concretely help your day-to-day?