livepeer / go-livepeer

Official Go implementation of the Livepeer protocol
http://livepeer.org
MIT License

Productionize Soccer Stream Detection #2394

Closed thomshutt closed 1 year ago

cyberj0g commented 1 year ago

Done once the metrics PR is merged. To enable scene classification on production and staging:

  1. Run the Bs with -metricsPerStream and -detectContent flags
  2. Pass stream configuration from Studio or API
      Livepeer-Transcode-Configuration: {"detection": {"freq": 2, "sampleRate": 10, "sceneClassification": [{"name": "soccer"}, {"name": "adult"}]}}
  3. Update sample dashboard with query:
    sort_desc(avg by(manifest_id) (livepeer_segment_scene_class_prob{seg_class_name="soccer"} > 0.5))
  4. Configure alerts based on above (to be addressed)
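
For step 2, here is a minimal Go sketch of how a client could build the `Livepeer-Transcode-Configuration` header value shown above and attach it to a request. The struct names, the `buildConfigHeader` helper, and the ingest URL are assumptions made for illustration only and are not taken from go-livepeer or Studio; only the header name and the JSON field names come from this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Hypothetical types mirroring the JSON shape of the header value above.
type SceneClass struct {
	Name string `json:"name"`
}

type Detection struct {
	Freq                int          `json:"freq"`
	SampleRate          int          `json:"sampleRate"`
	SceneClassification []SceneClass `json:"sceneClassification"`
}

type TranscodeConfig struct {
	Detection Detection `json:"detection"`
}

// buildConfigHeader serializes a detection config for the given class names,
// using the freq and sampleRate values from the example in this thread.
func buildConfigHeader(classes ...string) (string, error) {
	cfg := TranscodeConfig{Detection: Detection{Freq: 2, SampleRate: 10}}
	for _, c := range classes {
		cfg.Detection.SceneClassification = append(
			cfg.Detection.SceneClassification, SceneClass{Name: c})
	}
	b, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	header, err := buildConfigHeader("soccer", "adult")
	if err != nil {
		panic(err)
	}
	// Assumed ingest URL, for illustration only; the request is built but not sent.
	req, err := http.NewRequest(http.MethodPost, "http://localhost:8935/live/example/0.ts", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Livepeer-Transcode-Configuration", header)
	fmt.Println(req.Header.Get("Livepeer-Transcode-Configuration"))
}
```

Building the value from typed structs rather than a hand-written string avoids quoting mistakes in the JSON payload.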
cyberj0g commented 1 year ago

Scene classification is deployed to the RKV region. PR that adds the missing dependencies to the Docker images: https://github.com/livepeer/go-livepeer/pull/2695 Infra PR: https://github.com/livepeer/livepeer-infra/pull/1134

cyberj0g commented 1 year ago

When planning to enable content detection for all streams processed on GPUs, we must account for the increase in video memory consumption. We have already researched this, but it is worth briefly summarizing again here.

Without content detection

One transcoding session without content detection consumes about 222 MB of VRAM, and that figure is uniform across sessions. Therefore, to get the maximum number of transcoding sessions that video memory allows, we can simply divide the VRAM amount by the per-session consumption. For an 8 GB card, that gives roughly 36 sessions, which is usually above the transcoding performance bottleneck.

With content detection

With content detection enabled, 2400 MB of VRAM is allocated for the CUDA and cuDNN runtime libraries and then shared among all CUDA sessions. There's no known way of reducing that amount. Each transcoding session additionally includes the content detection model itself, and adds 350 MB of VRAM. Thus, an 8 GB card will be able to run just 16 transcoding sessions ((8192 - 2400) / 350 ≈ 16), so we risk getting OOM errors more frequently if the -maxSessions parameter doesn't account for that with regard to pod hardware.
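
The two estimates above can be reproduced with a short Go sketch. The `maxSessions` helper is hypothetical and not part of go-livepeer; it assumes an 8 GB card means 8192 MB and uses the per-session and shared figures quoted in this comment.

```go
package main

import "fmt"

// maxSessions is a hypothetical helper: it returns how many transcoding
// sessions fit in VRAM after a fixed shared allocation has been reserved.
// All sizes are in MB.
func maxSessions(totalMB, sharedMB, perSessionMB int) int {
	if perSessionMB <= 0 || totalMB <= sharedMB {
		return 0
	}
	return (totalMB - sharedMB) / perSessionMB
}

func main() {
	const cardMB = 8 * 1024 // assumes "8 GB" means 8192 MB

	// Without content detection: no shared runtime allocation, 222 MB per session.
	fmt.Println(maxSessions(cardMB, 0, 222)) // prints 36

	// With content detection: 2400 MB shared CUDA/cuDNN runtime, 350 MB per session.
	fmt.Println(maxSessions(cardMB, 2400, 350)) // prints 16
}
```

A value computed this way is only the memory-dictated upper bound for -maxSessions on a given card; transcoding throughput may impose a lower limit.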

cyberj0g commented 1 year ago

Ready to go. Requires -detectContent on OTs and -metricsPerStream on Bs to identify streams in the monitoring dashboard.