livepeer / bounties

Livepeer Software Bounties Portal.
https://livepeer.org

Grafana Dashboard for Gateway Metrics Bounty [$500] #49

Open JJassonn69 opened 1 month ago

JJassonn69 commented 1 month ago

Overview

To enhance observability and performance monitoring of the AI Subnet, we are implementing a Grafana Dashboard for Gateway Metrics. This dashboard offers critical insights into the AI-job broadcasting operations performed by the Gateway, which relays AI-job requests to orchestrators across the network. Livepeer Cloud currently provides a free-to-use gateway, enabling users to test the network's capabilities. Metrics from this pull request are already being collected by the public gateway and displayed on a Grafana dashboard, accessible to the entire community. 🚀

Because the public gateway is the largest gateway on the network, this dashboard also gives orchestrators an approximate view of total network traffic. Additionally, Livepeer Cloud is developing a more comprehensive system-wide metrics portal, which can be explored further here.

We are calling on the community to help implement this dashboard, which increases the visibility of network activity. It is needed for monitoring and optimizing AI-job requests and orchestrator performance, so that the AI Subnet operates efficiently and effectively. 🔥


Required Skillset


Bounty Requirements

  1. Implementation: Develop a user-friendly Grafana Dashboard that is easy to set up and tailored for Gateways within the network. Then create a pull request against the Livepeer grafana dashboards repository.

  2. Functionality: The dashboard should include the following metrics:

    • ai_models_requested: Number of requests per model per pipeline.
    • ai_request_latency_score: Latency score per pipeline to assess orchestrator performance. This metric should indicate the average time taken for a request (e.g., a 1024x1024 image with 25 time steps).
    • ai_request_price: The price per unit charged by orchestrators for processing jobs.
    • ai_request_errors: The number of errors per pipeline, providing insight into the reliability of the pipelines.
    • ticket_value_sent: The value of the AI tickets sent to the Orchestrators.
    • tickets_sent: The total number of tickets sent to the Orchestrators.

These dashboards will provide comprehensive insights into the public gateway requests on the AI network, enabling better optimization and resource allocation.
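As a rough illustration of how the metrics listed above could map to panels, here is a minimal Python sketch that generates a simplified subset of a Grafana dashboard JSON document with one time-series panel per metric. The metric names are taken from the list (the ticket metrics are written in lowercase here, which is an assumption); the PromQL expressions, the `pipeline` and `model_name` label names, the datasource placeholder, and the `build_dashboard` helper are all hypothetical. Check the exact exported names and labels against your gateway's metrics output before relying on any of them.

```python
import json

# Metric names follow the bounty list. The queries and label names are
# illustrative assumptions, not confirmed against the gateway's real output.
PANELS = [
    ("Requests per model", "sum by (pipeline, model_name) (ai_models_requested)"),
    ("Latency score", "avg by (pipeline) (ai_request_latency_score)"),
    ("Price per unit", "avg by (pipeline) (ai_request_price)"),
    ("Errors per pipeline", "sum by (pipeline) (ai_request_errors)"),
    ("Ticket value sent", "sum(ticket_value_sent)"),
    ("Tickets sent", "sum(tickets_sent)"),
]


def build_dashboard(title, panels, datasource_uid="${DS_PROMETHEUS}"):
    """Return a dict holding a simplified Grafana dashboard definition.

    Panels are laid out two per row on Grafana's 24-column grid.
    """
    grafana_panels = []
    for index, (panel_title, expr) in enumerate(panels):
        grafana_panels.append({
            "id": index + 1,
            "type": "timeseries",
            "title": panel_title,
            # Placeholder datasource reference (assumption).
            "datasource": {"type": "prometheus", "uid": datasource_uid},
            "gridPos": {"h": 8, "w": 12, "x": (index % 2) * 12, "y": (index // 2) * 8},
            "targets": [{"expr": expr, "refId": "A"}],
        })
    return {"title": title, "panels": grafana_panels}


if __name__ == "__main__":
    print(json.dumps(build_dashboard("AI Gateway Metrics", PANELS), indent=2))
```

Generating the JSON from a small table like this keeps panel definitions uniform and makes it easy to add or rename a metric; a hand-edited dashboard exported from the Grafana UI would work just as well.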


Scope Exclusions


Implementation Tips

To understand how to work with Gateway metrics, you can refer to a recent pull request that deals with related functionality:

Pull Request #3087
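Before building panels, it can help to confirm which of the required metrics a gateway actually exports. The sketch below parses Prometheus text-exposition output and reports the required metrics that are missing. The sample payload, the `livepeer_` prefix, and the label values are made up for illustration, and the helper names are hypothetical; in practice the text would come from the gateway's own metrics endpoint.

```python
# Hypothetical sample of Prometheus text-exposition output (not real gateway data).
SAMPLE = """\
# HELP livepeer_ai_models_requested Number of AI models requested
# TYPE livepeer_ai_models_requested gauge
livepeer_ai_models_requested{pipeline="text-to-image",model_name="example/model"} 12
livepeer_ai_request_errors{pipeline="text-to-image"} 1
"""


def exported_metric_names(exposition_text):
    """Collect metric names from sample lines, skipping comments and blanks."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # A sample line is either `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        names.add(name)
    return names


def missing_metrics(exposition_text, required):
    """Return required metrics not found, tolerating an exporter prefix."""
    exported = exported_metric_names(exposition_text)
    return sorted(
        metric
        for metric in required
        if not any(name == metric or name.endswith("_" + metric) for name in exported)
    )


if __name__ == "__main__":
    required = ["ai_models_requested", "ai_request_errors", "ai_request_price"]
    print(missing_metrics(SAMPLE, required))
```

Matching by suffix is a deliberate simplification so that the check still works if the exporter adds a namespace prefix; it does not account for histogram-style suffixes such as `_bucket` or `_sum`.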

Additionally, make sure to:


How to Apply

  1. Express Your Interest: Comment on this issue to indicate your interest and explain why you're the ideal candidate for the task.
  2. Wait for Review: Our team will review expressions of interest and select the best candidate.
  3. Get Assigned: If selected, we'll assign the GitHub issue to you.
  4. Start Working: Dive into your task! If you need assistance or guidance, comment on the issue or join the discussions in the #developer-lounge channel on our Discord server.
  5. Submit Your Work: Create a pull request in the relevant repository and request a review.
  6. Notify Us: Comment on this GitHub issue when your pull request is ready for review.
  7. Receive Your Bounty: We'll arrange the bounty payment once your pull request is approved.
  8. Gain Recognition: Your valuable contributions will be showcased in our project's changelog.

Thank you for your interest in contributing to our project! 💛

[!WARNING] Please wait for the issue to be assigned to you before starting work. To prevent duplication of effort, submissions for unassigned issues will not be accepted.

rickstaa commented 1 month ago

This was implemented by @stronk-dev in https://github.com/livepeer/grafana-dashboards/pull/1 🙏🏻.

stronk-dev commented 1 month ago

> This was implemented by @stronk-dev in livepeer/grafana-dashboards#1 🙏🏻.

Not quite - the dashboard I published is for Orchestrator node operators. For gateways we'll want to omit machine info and probably stick to more detailed panels for pipelines/models. The two dashboards will certainly share some panels, and of course I'll be happy to take this one on.