livepeer / stream-tester

Stream tester is a tool to measure performance and stability of Livepeer transcoding network

Orchestrator Performance Leaderboard integration #56

Closed kyriediculous closed 3 years ago

kyriediculous commented 3 years ago

Performance Leaderboard MVP

Specification

Stream-Tester

Stream-tester will be spun up in server mode for each region we want to run performance benchmarking from. This way we can utilize its REST interface to send in benchmarking requests for several orchestrators using cron-jobs.

POST /start_streams

Current request object

{
    "host": "localhost",
    "media_host": "",
    "file_name": "official_test_source_2s_keys_24pfs.mp4",
    "rtmp": 1935,
    "media": 8935,
    "repeat": 1,
    "simultaneous": 1,
    "profiles_num": 2,
    "do_not_clear_stats": false,
    "measure_latency": true,
    "http_ingest": false
}

The request will be expanded to take in the following (optional) fields:

| Parameter | Type | Description |
| --- | --- | --- |
| orchestrators | []string | List of orchestrators to use for selection. This list will be passed to the broadcaster's -orchAddr flag on startup, e.g. ["34.13.56.251:8935", "12.55.39.145:8935"] |
| metrics | []string | List of PromQL queries used to collect data from the broadcaster node, e.g. ["livepeer_success_rate offset 1m", "avg_over_time(livepeer_upload_time[1m])"] |

* The PromQL queries provided in the metrics field could be hardcoded as well for an MVP
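For example, an expanded request with the host omitted (so that stream-tester spins up its own broadcaster, see below) might look like this; the orchestrator addresses and queries are illustrative:

{
    "file_name": "official_test_source_2s_keys_24pfs.mp4",
    "rtmp": 1935,
    "media": 8935,
    "repeat": 1,
    "simultaneous": 1,
    "profiles_num": 2,
    "measure_latency": true,
    "http_ingest": false,
    "orchestrators": ["34.13.56.251:8935", "12.55.39.145:8935"],
    "metrics": ["livepeer_success_rate offset 1m", "avg_over_time(livepeer_upload_time[1m])"]
}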

When this POST request returns a 2xx code, the metrics collection can take place.

If the orchestrators field is provided and the host field omitted, stream-tester will spin up a new broadcaster instance using the orchestrators for the -orchAddr flag and -metrics=true. The process will be torn down after the stream completes and metrics are gathered.

GET /stats

Currently the /stats endpoint takes optional parameters. We can add additional optional parameters, orchestrator and since (see the storage section), that allow filtering stats by orchestrator and since a specific timestamp.
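For example (parameter value formats illustrative):

GET /stats?orchestrator=<orchAddr>&since=<timestamp>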

Region

Stream-tester will have a new optional flag on startup, -region, which takes a string with the airport code (cf. Livepeer.com API deployments) as its argument. This way the stream-tester can self-identify as running in a particular region.
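For example, a European instance could be started roughly as follows; the -server/-serverAddr flags stand in for stream-tester's existing server-mode flags and the region code is illustrative:

./streamtester -server -serverAddr 0.0.0.0:7934 -region FRA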

Current Regions:

Storage

The stream-tester will be expanded with an interface that allows it to write metrics gathered from a stream to a storage layer.

Currently the stream-tester keeps stats in memory per manifestID, but it doesn't follow a clear interface to do so. We can create a common storage layer interface and allow the user to specify the storage driver on node startup.

-db <host>

Interface

func (db *DB) StoreStats(metrics *Metrics) error
func (db *DB) GetStats(orchestrator ethcommon.Address, since time.Time) ([]*Metrics, error)
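A minimal sketch of what such a storage interface could look like in Go, assuming a Metrics type mirroring the fields listed under 'Metrics gathering' below (the Storage name is illustrative, not existing stream-tester code):

import (
    "time"

    ethcommon "github.com/ethereum/go-ethereum/common"
)

// Storage abstracts the driver selected via -db; any implementation
// (in-memory, Postgres, MongoDB, ...) satisfies the same two methods.
type Storage interface {
    StoreStats(metrics *Metrics) error
    GetStats(orchestrator ethcommon.Address, since time.Time) ([]*Metrics, error)
}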

Metrics gathering

While stream-tester gathers per-stream metrics by itself, we can also have it query the broadcaster node it uses for Prometheus metrics.

One caveat is that the metrics endpoint is exposed on the same port as the CLI webserver, so some CORS restrictions would need to be imposed on all CLI write endpoints. Alternatively, the stream-tester would have to run on the same machine as the broadcasters that are used.
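A hedged sketch of such a restriction, assuming plain net/http on the CLI webserver (the handler name is illustrative): reject state-changing requests whose Origin does not match the host serving the CLI.

import (
    "net/http"
    "net/url"
)

// restrictWrites blocks cross-origin non-GET requests so browsers pointed at
// other origins cannot hit the CLI write endpoints that share the metrics port.
func restrictWrites(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if origin := r.Header.Get("Origin"); r.Method != http.MethodGet && origin != "" {
            if u, err := url.Parse(origin); err != nil || u.Host != r.Host {
                http.Error(w, "cross-origin writes not allowed", http.StatusForbidden)
                return
            }
        }
        next.ServeHTTP(w, r)
    })
}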

For each metric specified in the metrics field of the request body for /start_streams we will then use the broadcaster:7935/metrics endpoint to execute the provided PromQL queries.

After collecting the metrics, the data alongside the region (provided on stream-tester startup) will be sent to the storage API.

| Field | Type | Description |
| --- | --- | --- |
| Timestamp | int64 | Benchmark start time |
| Region | string | Region from which the benchmark originates |
| Orchestrator | string | Ethereum address of the tested orchestrator |
| Segments sent | int32 | Total number of segments sent |
| Segments downloaded | int32 | Number of segments downloaded from the orchestrator |
| Success rate | float32 | Stream success rate, calculated as segments_downloaded / segments_sent |
| Average segment duration | int64 | Average duration of a segment for the stream |
| Average upload time | int64 | Average upload time for segments from the broadcaster to the orchestrator |
| Average upload score | float64 | Average upload score, calculated as upload_time / segment_duration |
| Average download time | int64 | Average download time for segments from the orchestrator to the broadcaster |
| Average download score | float64 | Average download score, calculated as download_time / segment_duration |
| Average transcode time | int64 | Average time per segment it takes the orchestrator to transcode |
| Average transcode score | float64 | Average transcoding score, calculated as transcode_time / segment_duration |
| Errors | []string | A list of errors encountered |
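In Go, the stored record could look roughly like this (a sketch; the struct and field names are illustrative, mirroring the table above rather than existing stream-tester types):

type Metrics struct {
    Timestamp          int64    // benchmark start time (Unix)
    Region             string   // region the benchmark originates from
    Orchestrator       string   // Ethereum address of the tested orchestrator
    SegmentsSent       int32    // total number of segments sent
    SegmentsDownloaded int32    // segments downloaded from the orchestrator
    SuccessRate        float32  // segments_downloaded / segments_sent
    AvgSegDuration     int64    // average segment duration
    AvgUploadTime      int64    // average B -> O upload time per segment
    AvgUploadScore     float64  // upload_time / segment_duration
    AvgDownloadTime    int64    // average O -> B download time per segment
    AvgDownloadScore   float64  // download_time / segment_duration
    AvgTranscodeTime   int64    // average transcode time per segment
    AvgTranscodeScore  float64  // transcode_time / segment_duration
    Errors             []string // errors encountered during the stream
}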

Leaderboard Backend

Having the client fetch all metrics for every region from all the separate stream-tester nodes is not a very scalable solution.

If we use a cloud-based database for a production environment then having a server that can be an intermediate query layer is far more scalable and puts less strain on the clients (user's web browser).

Alternatively metrics can be queried in a serverless function. For example MongoDB Atlas plays really well with AWS Lambda, where a Mongo connection exists for the duration of the Lambda function: https://docs.atlas.mongodb.com/best-practices-connecting-to-aws-lambda/.

The REST interface could be fairly simple. One endpoint to get aggregated stats and one endpoint returning the raw logs.

GET /aggregated_stats?orchestrator=<orchAddr>&region=<Airport_code>&since=<timestamp>

example response

{
  "<orchAddr>": {
    "MDW": {
      "score": 5.5,
      "success_rate": 91.5
    },
    "FRA": {
      "score": 2.5,
      "success_rate": 100
    },
    "SIN": {
      "score": 6.6,
      "success_rate": 93
    }
  },
  "<orchAddr2>": {
    ...
  },
  ...
}

GET /raw_stats?orchestrator=<orchAddr>&region=<Airport_code>&since=<timestamp>

If no orchestrator parameter is provided, the request will return 400 Bad Request.

example response

For each region return an array of the metrics from the 'metrics gathering' section as a "raw dump"

{
  "FRA": [
    {
      "timestamp": number,
      "segments_sent": number,
      "segments_received": number,
      "success_rate": number,
      "avg_seg_duration": number,
      "avg_upload_time": number,
      "avg_upload_score": number,
      "avg_download_time": number,
      "avg_download_score": number,
      "avg_transcode_time": number,
      "avg_transcode_score": number,
      "errors": Array
    }
  ],
  "MDW": [...],
  "SIN": [...]
}
darkdarkdragon commented 3 years ago

@kyriediculous You can't 'query' the broadcaster on the /metrics endpoint - that endpoint is just a plain-text snapshot of internal application counters at a given moment. To run queries like avg_over_time you will also need to spin up a Prometheus instance that scrapes data from the broadcaster instance - then you can run queries against Prometheus.
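For instance, once a Prometheus instance is scraping the broadcaster, the stream-tester (or a separate collector) could run the configured queries through the standard Prometheus Go client; a sketch, assuming prometheus/client_golang and an illustrative Prometheus address:

import (
    "context"
    "fmt"
    "time"

    "github.com/prometheus/client_golang/api"
    promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

// queryBroadcasterMetric runs one PromQL query (e.g. "avg_over_time(livepeer_upload_time[1m])")
// against a Prometheus server that scrapes the broadcaster's /metrics endpoint.
func queryBroadcasterMetric(promAddr, query string) error {
    client, err := api.NewClient(api.Config{Address: promAddr}) // e.g. "http://localhost:9090"
    if err != nil {
        return err
    }
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    result, warnings, err := promv1.NewAPI(client).Query(ctx, query, time.Now())
    if err != nil {
        return err
    }
    if len(warnings) > 0 {
        fmt.Println("prometheus warnings:", warnings)
    }
    fmt.Println(query, "=>", result)
    return nil
}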

kyriediculous commented 3 years ago

I guess we can programmatically spin up a docker container for both the broadcaster and a Prometheus instance for said broadcaster?

darkdarkdragon commented 3 years ago

Given that all of this will be running inside k8s, it could be easier to spin up the stream tester, Prometheus and the broadcaster inside one pod - so they will always be running. Then point the broadcaster's orchWebhookUrl to the stream tester and have the stream tester respond with the needed orchestrators. That way there will be no need to spin anything up or down programmatically.

kyriediculous commented 3 years ago

Thanks Ivan, I'm not very familiar with the webhooks in go-livepeer and have never used them. I'll look into it, because that seems like a better route if we don't have to spin pods up every time.

I suppose we can then use tags to filter the metrics we require (using NodeID or ManifestID?)


Also, one note: the current specification treats the stream-tester as a general-purpose tool and leaves fetching all the orchestrators to benchmark as an external operation, if that's how a user or service wishes to use the stream tester.

i.e. We fetch the orchestrators on a different service and then provide the orchestrator we want to benchmark for in the POST request to the stream tester.

Alternatively, we don't provide such an orchestrators field in the request object. Instead we specify something like "leaderboard-benchmark", which would instruct stream-tester to fetch all orchestrators and run a test stream for each of them using the orchWebhookUrl. This feels very application specific, but would then only require a single cron-job per region.

kyriediculous commented 3 years ago

I think the orchWebhookUrl could definitely be useful

One downside seems to be the 1 minute refresh interval which is currently hardcoded. We have the option to use a fork I guess or the alternative is just to use a 1 minute interval between test runs.

I guess the workflow would look something like this:

Prerequisite: stream-tester exposes an endpoint to get orchestrators, /orchestrators, which will be provided to the broadcaster's -orchWebhookUrl flag (a handler sketch follows the list below)

livepeer -broadcaster ... -orchWebhookUrl http://stream-tester/orchestrators
  1. Stream-tester receives a POST request on /start_streams with the orchestrators field populated
  2. Stream-tester sets the orchestrators from step (1) as the response of the /orchestrators endpoint from the prerequisite
  3. Upon receiving the stream, the broadcaster requests the orchestrators via the webhook URL to refresh its current working set
  4. We wait for the stream to complete transcoding as well as the refresh timeout (minus the time transcoding took), then send the response, which also signals that the caller of the endpoint can now send a new request that utilises a different O.
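A sketch of the /orchestrators webhook handler in step (2), assuming go-livepeer's webhook expects a JSON array of objects with an "address" field (worth confirming against the broadcaster version in use); the names here are illustrative:

import (
    "encoding/json"
    "net/http"
    "sync"
)

type webhookOrch struct {
    Address string `json:"address"` // orchestrator URI, e.g. "https://34.13.56.251:8935"
}

var (
    mu           sync.RWMutex
    currentOrchs []string // set from the orchestrators field of /start_streams
)

// orchestratorsHandler serves the working set the broadcaster should use,
// fetched by the broadcaster via -orchWebhookUrl.
func orchestratorsHandler(w http.ResponseWriter, r *http.Request) {
    mu.RLock()
    defer mu.RUnlock()
    resp := make([]webhookOrch, 0, len(currentOrchs))
    for _, o := range currentOrchs {
        resp = append(resp, webhookOrch{Address: "https://" + o})
    }
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(resp)
}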
yondonfu commented 3 years ago

IIUC the following modifications for stream-tester are being proposed:

What if the scope of modifications for the stream-tester was reduced to just include the configurable orch webhook server with the rest of the functionality mentioned above implemented in a separate app ("orch-tester"?) in the cmd package of this repo? This app would support the following:

[1] Coupled with the 1 minute wait before starting a test, this might sidestep the fact that latency/duration related metrics are not labeled with a manifestID or the orchestrator used right now. Those labels could be added, but there are negative Prometheus performance implications associated with high cardinality labels.

We should then be able to automate running the app at a regular interval as a k8s CronJob with the stream-tester, broadcaster and monitoring instances already running in the same k8s cluster as well.

I suggest the above because I wonder if there is an opportunity to compose stream-tester together with a broadcaster and a monitoring instance and push most of the additional functionality which seems specific for this leaderboard metrics collection use case into an additional app in the cmd package.

yondonfu commented 3 years ago

Re: Leaderboard Backend

If we use a cloud-based database for a production environment then having a server that can be an intermediate query layer is far more scalable and puts less strain on the clients (user's web browser). Alternatively metrics can be queried in a serverless function.

Sounds like a good idea. Would each region make an API call to a serverless function after each test in order to store the results of the test in the cloud DB?

The API interface that would be exposed to a client and the aggregated stats looks good. Just to clarify - the score and success_rate fields in the response for /aggregated_stats would be the average for the time period (as indicated by the since field)?

kyriediculous commented 3 years ago

Would each region make an API call to a serverless function after each test in order to store the results of the test in the cloud DB?

Each stream tester could write it directly to storage, but now that you mention it handing that off to a serverless function might actually be a better idea.

The API interface that would be exposed to a client and the aggregated stats looks good. Just to clarify - the score and success_rate fields in the response for /aggregated_stats would be the average for the time period (as indicated by the since field)?

That is correct.

Query the Livepeer subgraph for a list of all active orchestrators with their ETH addresses and their current service URI. The Livepeer subgraph would need to be updated to index orchestrator service URI updates.

It feels almost easier to just have this "orch-tester" service consume the go-livepeer/eth package here. Yes it requires RPC requests then but it doesn't require subgraph changes.

I suggest the above because I wonder if there is an opportunity to compose stream-tester together with a broadcaster and a monitoring instance and push most of the additional functionality which seems specific for this leaderboard metrics collection use case into an additional app in the cmd package.

If I understand correctly you would also have any prometheus queries for metrics gathering take place in this additional service then rather than the stream-tester? That seems okay to me although some of them might be useful for other usage within stream-tester as well.

yondonfu commented 3 years ago

It feels almost easier to just have this "orch-tester" service consume the go-livepeer/eth package here. Yes it requires RPC requests then but it doesn't require subgraph changes.

Using the go-livepeer/eth package will also require event indexing in order to construct the list of active orchestrator ETH addresses + service URIs (which is implemented in go-livepeer, but not in an easily exportable manner at the moment), since contract RPC calls would only give us the pending orchestrator set. I suspect adding service URI indexing to the subgraph and just querying the subgraph will be more straightforward than either re-implementing the event indexing functionality in the orch-tester or refactoring code in go-livepeer to export that functionality from there. Plus, being able to fetch the list of service URIs for active orchestrators from the subgraph seems useful in its own right, which would be an added bonus.

If I understand correctly you would also have any prometheus queries for metrics gathering take place in this additional service then rather than the stream-tester?

Yep that was my suggestion.

kyriediculous commented 3 years ago

We can use go-livepeer/eth and call Client.TranscoderPool()

And for each address

While I realise this will not include orchestrators that will become deactivated in the next round, I'm wondering if benchmarking orchestrators that will soon become deactivated is useful anyway.

Admittedly it would be nice to have the service URI on the subgraph
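A rough sketch of that flow, assuming go-livepeer's eth.LivepeerEthClient; TranscoderPool() is referenced above, while GetServiceURI and the Transcoder.Address field are assumed names for the per-address ServiceRegistry lookup:

import (
    "github.com/livepeer/go-livepeer/eth"
)

// activeOrchestrators maps each orchestrator in the pool to its service URI.
func activeOrchestrators(client eth.LivepeerEthClient) (map[string]string, error) {
    pool, err := client.TranscoderPool()
    if err != nil {
        return nil, err
    }
    uris := make(map[string]string, len(pool))
    for _, t := range pool {
        uri, err := client.GetServiceURI(t.Address) // assumed ServiceRegistry getter
        if err != nil {
            return nil, err
        }
        uris[t.Address.Hex()] = uri
    }
    return uris, nil
}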

kyriediculous commented 3 years ago

closed by #62