livepeer / stream-tester

Stream tester is a tool to measure performance and stability of Livepeer transcoding network

Orchestrator Performance Leaderboard integration #56

Closed kyriediculous closed 3 years ago

kyriediculous commented 3 years ago

Performance Leaderboard MVP

Specification

Stream-Tester

Stream-tester will be spun up in server mode for each region we want to run performance benchmarking from. This way we can utilize its REST interface to send in benchmarking requests for several orchestrators using cron-jobs.

POST /start_streams

Current request object

{
    "host": "localhost",
    "media_host": "",
    "file_name": "official_test_source_2s_keys_24pfs.mp4",
    "rtmp": 1935,
    "media": 8935,
    "repeat": 1,
    "simultaneous": 1,
    "profiles_num": 2,
    "do_not_clear_stats": false,
    "measure_latency": true,
    "http_ingest": false
}

The request will be expanded to take in the following (optional) fields:

| Parameter | Type | Description |
| --- | --- | --- |
| orchestrators | []string | List of orchestrators to use for selection. This list will be passed to the broadcaster's -orchAddr flag on startup, e.g. ["34.13.56.251:8935", "12.55.39.145:8935"] |
| metrics | []string | List of PromQL queries used to collect data from the broadcaster node, e.g. ["livepeer_success_rate offset 1m", "avg_over_time(livepeer_upload_time[1m])"] |

* The PromQL queries provided in the metrics field could be hardcoded as well for an MVP
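For example, an expanded request with the host omitted (so that stream-tester spins up its own broadcaster, see below) might look like this; the orchestrator addresses and queries are illustrative:

{
    "file_name": "official_test_source_2s_keys_24pfs.mp4",
    "rtmp": 1935,
    "media": 8935,
    "repeat": 1,
    "simultaneous": 1,
    "profiles_num": 2,
    "measure_latency": true,
    "http_ingest": false,
    "orchestrators": ["34.13.56.251:8935", "12.55.39.145:8935"],
    "metrics": ["livepeer_success_rate offset 1m", "avg_over_time(livepeer_upload_time[1m])"]
}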

When this POST request returns a 2xx code, the metrics collection can take place.

If the orchestrators field is provided and the host field omitted, stream-tester will spin up a new broadcaster instance using the orchestrators for the -orchAddr flag and -metrics=true. The process will be torn down after the stream completes and metrics are gathered.

GET /stats

Currently the /stats endpoint takes optional parameters. We can add additional optional parameters, orchestrator and since (see the storage section), that allow filtering stats by orchestrator and since a specific timestamp.
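For example (parameter value formats illustrative):

GET /stats?orchestrator=<orchAddr>&since=<timestamp>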

Region

Stream-tester will have a new optional flag on startup, -region, which takes a string with the airport code (cf. Livepeer.com API deployments) as its argument. This way the stream-tester can self-identify as running in a particular region.
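For example, a European instance could be started roughly as follows; the -server/-serverAddr flags stand in for stream-tester's existing server-mode flags and the region code is illustrative:

./streamtester -server -serverAddr 0.0.0.0:7934 -region FRA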

Current Regions:

Storage

The stream-tester will be expanded with an interface that allows it to write metrics gathered from a stream to a storage layer.

Currently the stream-tester keeps stats in memory per manifestID, but it doesn't follow a clear interface to do so. We can create a common storage layer interface and allow the user to specify the storage driver on node startup.

-db <host>

Interface

func (db *DB) StoreStats(metrics *Metrics) error
func (db *DB) GetStats(orchestrator ethcommon.Address, since time.Time) ([]*Metrics, error)
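A minimal sketch of what such a storage interface could look like in Go, assuming a Metrics type mirroring the fields listed under 'Metrics gathering' below (the Storage name is illustrative, not existing stream-tester code):

import (
    "time"

    ethcommon "github.com/ethereum/go-ethereum/common"
)

// Storage abstracts the driver selected via -db; any implementation
// (in-memory, Postgres, MongoDB, ...) satisfies the same two methods.
type Storage interface {
    StoreStats(metrics *Metrics) error
    GetStats(orchestrator ethcommon.Address, since time.Time) ([]*Metrics, error)
}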

Metrics gathering

While stream-tester gathers per-stream metrics by itself, we can also have it query the broadcaster node it uses for Prometheus metrics.

One caveat is that the metrics endpoint is exposed on the same port as the CLI webserver, so some CORS restrictions would need to be imposed on all CLI write endpoints. Alternatively, the stream-tester would have to run on the same machine as the broadcasters that are used.
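A hedged sketch of such a restriction, assuming plain net/http on the CLI webserver (the handler name is illustrative): reject state-changing requests whose Origin does not match the host serving the CLI.

import (
    "net/http"
    "net/url"
)

// restrictWrites blocks cross-origin non-GET requests so browsers pointed at
// other origins cannot hit the CLI write endpoints that share the metrics port.
func restrictWrites(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if origin := r.Header.Get("Origin"); r.Method != http.MethodGet && origin != "" {
            if u, err := url.Parse(origin); err != nil || u.Host != r.Host {
                http.Error(w, "cross-origin writes not allowed", http.StatusForbidden)
                return
            }
        }
        next.ServeHTTP(w, r)
    })
}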

For each metric specified in the metrics field of the request body for /start_streams we will then use the broadcaster:7935/metrics endpoint to execute the provided PromQL queries.

After collecting the metrics, the data alongside the region (provided on stream-tester startup) will be sent to the storage API.

| Field | Type | Description |
| --- | --- | --- |
| Timestamp | int64 | Benchmark start time |
| Region | string | Region from which the benchmark originates |
| Orchestrator | string | Ethereum address of the tested orchestrator |
| Segments sent | int32 | Total number of segments sent |
| Segments downloaded | int32 | Number of segments downloaded from the orchestrator |
| Success rate | float32 | Stream success rate, calculated as segments_downloaded / segments_sent |
| Average segment duration | int64 | Average duration of a segment for the stream |
| Average upload time | int64 | Average upload time for segments from the broadcaster to the orchestrator |
| Average upload score | float64 | Average upload score, calculated as upload_time / segment_duration |
| Average download time | int64 | Average download time for segments from the orchestrator to the broadcaster |
| Average download score | float64 | Average download score, calculated as download_time / segment_duration |
| Average transcode time | int64 | Average time per segment it takes the orchestrator to transcode |
| Average transcode score | float64 | Average transcoding score, calculated as transcode_time / segment_duration |
| Errors | []string | A list of errors encountered |
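In Go, the stored record could look roughly like this (a sketch; the struct and field names are illustrative, mirroring the table above rather than existing stream-tester types):

type Metrics struct {
    Timestamp          int64    // benchmark start time (Unix)
    Region             string   // region the benchmark originates from
    Orchestrator       string   // Ethereum address of the tested orchestrator
    SegmentsSent       int32    // total number of segments sent
    SegmentsDownloaded int32    // segments downloaded from the orchestrator
    SuccessRate        float32  // segments_downloaded / segments_sent
    AvgSegDuration     int64    // average segment duration
    AvgUploadTime      int64    // average B -> O upload time per segment
    AvgUploadScore     float64  // upload_time / segment_duration
    AvgDownloadTime    int64    // average O -> B download time per segment
    AvgDownloadScore   float64  // download_time / segment_duration
    AvgTranscodeTime   int64    // average transcode time per segment
    AvgTranscodeScore  float64  // transcode_time / segment_duration
    Errors             []string // errors encountered during the stream
}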

Leaderboard Backend

Having the client fetch all metrics for every region from all the separate stream-tester nodes is not a very scalable solution.

If we use a cloud-based database for a production environment then having a server that can be an intermediate query layer is far more scalable and puts less strain on the clients (user's web browser).

Alternatively metrics can be queried in a serverless function. For example MongoDB Atlas plays really well with AWS Lambda, where a Mongo connection exists for the duration of the Lambda function: https://docs.atlas.mongodb.com/best-practices-connecting-to-aws-lambda/.

The REST interface could be fairly simple. One endpoint to get aggregated stats and one endpoint returning the raw logs.

GET /aggregated_stats?orchestrator=<orchAddr>&region=<Airport_code>&since=<timestamp>

example response

{
  "<orchAddr>": {
    "MDW": {
      "score": 5.5,
      "success_rate": 91.5
    },
    "FRA": {
      "score": 2.5,
      "success_rate": 100
    },
    "SIN": {
      "score": 6.6,
      "success_rate": 93
    }
  },
  "<orchAddr2>": {
    ...
  },
  ...
}

GET /raw_stats?orchestrator=<orchAddr>&region=<Airport_code>&since=<timestamp>

If no orchestrator parameter is provided, the request will return 400 Bad Request.

example response

For each region return an array of the metrics from the 'metrics gathering' section as a "raw dump"

{
  "FRA": [
    {
      "timestamp": number,
      "segments_sent": number,
      "segments_received": number,
      "success_rate": number,
      "avg_seg_duration": number,
      "avg_upload_time": number,
      "avg_upload_score": number,
      "avg_download_time": number,
      "avg_download_score": number,
      "avg_transcode_time": number,
      "avg_transcode_score": number,
      "errors": Array
    }
  ],
  "MDW": [...],
  "SIN": [...]
}
darkdarkdragon commented 3 years ago

@kyriediculous You can't 'query' the broadcaster on the /metrics endpoint - that endpoint is just a plain-text snapshot of internal application counters at a given moment. To run queries like avg_over_time you will also need to spin up a Prometheus instance that scrapes data from the broadcaster instance - then you can run queries against Prometheus.
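For instance, once a Prometheus instance is scraping the broadcaster, the stream-tester (or a separate collector) could run the configured queries through the standard Prometheus Go client; a sketch, assuming prometheus/client_golang and an illustrative Prometheus address:

import (
    "context"
    "fmt"
    "time"

    "github.com/prometheus/client_golang/api"
    promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

// queryBroadcasterMetric runs one PromQL query (e.g. "avg_over_time(livepeer_upload_time[1m])")
// against a Prometheus server that scrapes the broadcaster's /metrics endpoint.
func queryBroadcasterMetric(promAddr, query string) error {
    client, err := api.NewClient(api.Config{Address: promAddr}) // e.g. "http://localhost:9090"
    if err != nil {
        return err
    }
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    result, warnings, err := promv1.NewAPI(client).Query(ctx, query, time.Now())
    if err != nil {
        return err
    }
    if len(warnings) > 0 {
        fmt.Println("prometheus warnings:", warnings)
    }
    fmt.Println(query, "=>", result)
    return nil
}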

kyriediculous commented 3 years ago

I guess we can programmatically spin up a docker container for both the broadcaster and a Prometheus instance for said broadcaster?

darkdarkdragon commented 3 years ago

Given that all of this will be running inside k8s, it could be easier to spin up the stream tester, Prometheus and the broadcaster inside one pod - so they will always be running. Then point the broadcaster's orchWebhookUrl to the stream tester and have the stream tester respond with the needed orchestrators. That way there will be no need to spin anything up or down programmatically.

kyriediculous commented 3 years ago

Thanks Ivan, I'm not very familiar with the webhooks in go-livepeer and have never used them. I'll look into it, because that seems like a better route if we don't have to spin pods up every time.

I suppose we can then use tags to filter the metrics we require (using NodeID or ManifestID?)


Also, one note: the current specification treats the stream-tester as a general-purpose tool and leaves fetching all the orchestrators to benchmark as an external operation, if that's how a user or service wishes to use the stream tester.

i.e. We fetch the orchestrators on a different service and then provide the orchestrator we want to benchmark for in the POST request to the stream tester.

Alternatively, we don't provide such an orchestrators field in the request object. Instead we specify something like "leaderboard-benchmark", which would instruct stream-tester to fetch all orchestrators and run a test stream for each of them using the orchWebhookUrl. This feels very application specific, but would then only require a single cron-job per region.

kyriediculous commented 3 years ago

I think the orchWebhookUrl could definitely be useful

One downside seems to be the 1 minute refresh interval which is currently hardcoded. We have the option to use a fork I guess or the alternative is just to use a 1 minute interval between test runs.

I guess the workflow would look something like this:

Prerequisite: stream-tester exposes an endpoint to get orchestrators, /orchestrators, which will be provided to the broadcaster's -orchWebhookUrl flag (a handler sketch follows the list below)

livepeer -broadcaster ... -orchWebhookUrl http://stream-tester/orchestrators
  1. Stream-tester receives a POST request on /start_streams with the orchestrators field populated
  2. Stream-tester sets the orchestrators from step (1) as the response of the /orchestrators endpoint from the prerequisite
  3. Upon receiving the stream, the broadcaster requests the orchestrators via the webhook URL to refresh its current working set
  4. We wait for the stream to complete transcoding as well as the refresh timeout (minus the time transcoding took), then send the response, which also signals that the caller of the endpoint can now send a new request that utilises a different O.
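A sketch of the /orchestrators webhook handler in step (2), assuming go-livepeer's webhook expects a JSON array of objects with an "address" field (worth confirming against the broadcaster version in use); the names here are illustrative:

import (
    "encoding/json"
    "net/http"
    "sync"
)

type webhookOrch struct {
    Address string `json:"address"` // orchestrator URI, e.g. "https://34.13.56.251:8935"
}

var (
    mu           sync.RWMutex
    currentOrchs []string // set from the orchestrators field of /start_streams
)

// orchestratorsHandler serves the working set the broadcaster should use,
// fetched by the broadcaster via -orchWebhookUrl.
func orchestratorsHandler(w http.ResponseWriter, r *http.Request) {
    mu.RLock()
    defer mu.RUnlock()
    resp := make([]webhookOrch, 0, len(currentOrchs))
    for _, o := range currentOrchs {
        resp = append(resp, webhookOrch{Address: "https://" + o})
    }
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(resp)
}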
yondonfu commented 3 years ago

IIUC the following modifications for stream-tester are being proposed:

What if the scope of modifications for the stream-tester was reduced to just include the configurable orch webhook server with the rest of the functionality mentioned above implemented in a separate app ("orch-tester"?) in the cmd package of this repo? This app would support the following:

[1] Coupled with the 1 minute wait before starting a test, this might sidestep the fact that latency/duration related metrics are not labeled with a manifestID or the orchestrator used right now. Those labels could be added, but there are negative Prometheus performance implications associated with high cardinality labels.

We should then be able to automate running the app at a regular interval as a k8s CronJob with the stream-tester, broadcaster and monitoring instances already running in the same k8s cluster as well.

I suggest the above because I wonder if there is an opportunity to compose stream-tester together with a broadcaster and a monitoring instance and push most of the additional functionality which seems specific for this leaderboard metrics collection use case into an additional app in the cmd package.

yondonfu commented 3 years ago

Re: Leaderboard Backend

If we use a cloud-based database for a production environment then having a server that can be an intermediate query layer is far more scalable and puts less strain on the clients (user's web browser). Alternatively metrics can be queried in a serverless function.

Sounds like a good idea. Would each region make an API call to a serverless function after each test in order to store the results of the test in the cloud DB?

The API interface that would be exposed to a client and the aggregated stats looks good. Just to clarify - the score and success_rate fields in the response for /aggregated_stats would be the average for the time period (as indicated by the since field)?

kyriediculous commented 3 years ago

Would each region make an API call to a serverless function after each test in order to store the results of the test in the cloud DB?

Each stream tester could write it directly to storage, but now that you mention it handing that off to a serverless function might actually be a better idea.

The API interface that would be exposed to a client and the aggregated stats looks good. Just to clarify - the score and success_rate fields in the response for /aggregated_stats would be the average for the time period (as indicated by the since field)?

That is correct.

Query the Livepeer subgraph for a list of all active orchestrators with their ETH addresses and their current service URI. The Livepeer subgraph would need to be updated to index orchestrator service URI updates.

It feels almost easier to just have this "orch-tester" service consume the go-livepeer/eth package here. Yes it requires RPC requests then but it doesn't require subgraph changes.

I suggest the above because I wonder if there is an opportunity to compose stream-tester together with a broadcaster and a monitoring instance and push most of the additional functionality which seems specific for this leaderboard metrics collection use case into an additional app in the cmd package.

If I understand correctly you would also have any prometheus queries for metrics gathering take place in this additional service then rather than the stream-tester? That seems okay to me although some of them might be useful for other usage within stream-tester as well.

yondonfu commented 3 years ago

It feels almost easier to just have this "orch-tester" service consume the go-livepeer/eth package here. Yes it requires RPC requests then but it doesn't require subgraph changes.

Using the go-livepeer/eth package will also require event indexing in order to construct the list of active orchestrator ETH addresses + service URIs (which is implemented in go-livepeer, but not in an easily exportable manner at the moment), since contract RPC calls would only give us the pending orchestrator set. I suspect adding service URI indexing to the subgraph and just querying the subgraph will be more straightforward than either re-implementing the event indexing functionality in the orch-tester or refactoring code in go-livepeer to export that functionality from there. Plus, being able to fetch the list of service URIs for active orchestrators from the subgraph seems useful in its own right, which would be an added bonus.

If I understand correctly you would also have any prometheus queries for metrics gathering take place in this additional service then rather than the stream-tester?

Yep that was my suggestion.

kyriediculous commented 3 years ago

We can use go-livepeer/eth and call Client.TranscoderPool()

And for each address

While I realise this will not include orchestrators that will become deactivated in the next round, I'm wondering if benchmarking orchestrators that will soon become deactivated is useful anyway.

Admittedly it would be nice to have the service URI on the subgraph
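A rough sketch of that flow, assuming go-livepeer's eth.LivepeerEthClient; TranscoderPool() is referenced above, while GetServiceURI and the Transcoder.Address field are assumed names for the per-address ServiceRegistry lookup:

import (
    "github.com/livepeer/go-livepeer/eth"
)

// activeOrchestrators maps each orchestrator in the pool to its service URI.
func activeOrchestrators(client eth.LivepeerEthClient) (map[string]string, error) {
    pool, err := client.TranscoderPool()
    if err != nil {
        return nil, err
    }
    uris := make(map[string]string, len(pool))
    for _, t := range pool {
        uri, err := client.GetServiceURI(t.Address) // assumed ServiceRegistry getter
        if err != nil {
            return nil, err
        }
        uris[t.Address.Hex()] = uri
    }
    return uris, nil
}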

kyriediculous commented 3 years ago

closed by #62