livepeer / verification-classifier

Metrics-based Verification Classifier
MIT License

Reducing the time/cost of source decoding #39

Closed yondonfu closed 4 years ago

yondonfu commented 5 years ago

The purpose of this issue is to organize discussion around approaches to reduce the time/cost of source decoding in a self-verification workflow. In this issue, orchestrator (O) self-verification workflow will refer to the process where O will self-verify each transcoded result using the classifier prior to returning the result to the broadcaster (B). O's behavior when self-verification for a result fails is outside the scope of this issue.

A simple, naive self-verification workflow would look something like this:

  1. O receives source
  2. O transcodes source and produces result
  3. O decodes source and extracts features
  4. O decodes result and extracts features
  5. O runs prediction using features from the source and the result
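As a rough illustration, the naive workflow might be sketched as below (all function names and bodies are hypothetical stand-ins, not actual verification-classifier APIs):

```python
# Hypothetical sketch of the naive self-verification workflow.
# transcode(), decode_frames(), extract_features() and predict()
# are placeholder names, not real verification-classifier APIs.

def transcode(source):
    # stand-in: pretend transcoding returns an encoded rendition
    return b"rendition:" + source

def decode_frames(video):
    # stand-in: pretend decoding yields per-frame data
    return [bytes([b]) for b in video]

def extract_features(frames):
    # stand-in: a trivial per-frame "feature"
    return [len(f) for f in frames]

def predict(src_features, res_features):
    # stand-in: classifier verdict on source/result feature pairs
    return len(src_features) > 0 and len(res_features) > 0

def naive_self_verify(source):
    result = transcode(source)                           # step 2
    src_feats = extract_features(decode_frames(source))  # step 3
    res_feats = extract_features(decode_frames(result))  # step 4
    return result, predict(src_feats, res_feats)         # step 5
```

Note that steps 3 and 4 each pay a full decode on top of the decode already performed inside step 2, which is the redundancy discussed below.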

One of the big problems with this workflow is the added latency/computation, which comes from source decoding, result decoding, feature extraction, and prediction.

We know that prediction is not a huge concern based on the results presented in #7. We also know that we can drop the time/cost of feature extraction by reducing the # of frames used (exact numbers to be included in #36). Thus, the biggest concern ends up being source and result decoding.

#37 explores an approach to avoid result decoding by directly extracting features from the encoded bitstream, so mitigating the cost of result decoding is outside the scope of this issue. We will instead focus on mitigating the cost of source decoding.

First, let's observe that in the naive workflow, O will decode the source twice: once during and once after transcoding. As a result, this workflow includes a redundant decoding step.

One way to eliminate the extra decoding step is to access the decoded source frames produced by the decoder during transcoding, before the decoded frames are preprocessed and passed to the encoder. To minimize the interruption to transcoding, perhaps the required # of frames can be sampled, copied, and passed to a separate thread. The upside to this approach is that it would completely remove any latency/cost associated with an extra source decoding step. The downside is that it requires a modification to the transcoding pipeline. Furthermore, the modification becomes more complex if both SW & HW transcoding are supported - if the entire transcoding pipeline occurs on a GPU, then the GPU would need to know how to sample and copy decoded frames that are passed back to the CPU.
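A rough sketch of the tap idea, assuming the decode loop can hand sampled frames to a worker thread (the hook point, `sample_every`, and the stand-in feature are all illustrative):

```python
import queue
import threading

# Sketch of tapping decoded frames during transcoding. Assumption:
# the decode loop can call out once per frame. The worker and the
# "feature" it computes are stand-ins, not real pipeline code.

def feature_worker(frame_q, features_out):
    # Runs on its own thread so feature extraction never blocks
    # the decode/encode loop.
    while True:
        frame = frame_q.get()
        if frame is None:  # sentinel: decoding finished
            break
        features_out.append(len(frame))  # stand-in feature

def transcode_with_tap(frames, sample_every=10):
    frame_q = queue.Queue()
    features = []
    t = threading.Thread(target=feature_worker, args=(frame_q, features))
    t.start()
    encoded = []
    for i, frame in enumerate(frames):
        if i % sample_every == 0:
            frame_q.put(bytes(frame))  # copy the sampled frame out
        encoded.append(frame)          # stand-in for preprocess+encode
    frame_q.put(None)                  # signal end of stream
    t.join()
    return encoded, features
```

The copy at the sampling point matters: the encoder may reuse or free frame buffers, so the tapped frames must be owned by the verification thread.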

An alternative approach that doesn't completely eliminate the extra decoding step is to have O decode and extract features from the source in parallel with transcoding instead of waiting until it finishes transcoding. The upside of this approach is that the transcoding latency is reduced relative to the naive workflow because source decoding and feature extraction do not add time before a result can be returned to B. The downside of this approach is that while it reduces transcoding latency, it does not reduce the overall added computational cost since the extra source decoding step still occurs.
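This parallel variant might be sketched as follows (`transcode` and `decode_and_extract` are hypothetical stand-ins):

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the parallel alternative: source decoding + feature
# extraction run concurrently with transcoding, so neither delays
# returning the result to B. Function bodies are stand-ins.

def transcode(source):
    return b"rendition:" + source

def decode_and_extract(source):
    return [b for b in source]  # stand-in features

def self_verify_parallel(source):
    with ThreadPoolExecutor(max_workers=2) as pool:
        result_f = pool.submit(transcode, source)
        feats_f = pool.submit(decode_and_extract, source)
        result = result_f.result()    # can be returned to B immediately
        src_feats = feats_f.result()  # ready around when transcoding ends
    return result, src_feats
```

This hides the source-decode latency behind the transcode, but the total CPU/GPU work is unchanged, which is exactly the downside noted above.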

j0sh commented 5 years ago

O will self-verify each transcoded result using the classifier prior to returning the result to the broadcaster (B)

Is this a settled point, or can the self-verification be done asynchronously? Can the results be returned immediately, with the signature whenever verification completes?

#37 explores an approach to avoid result decoding by directly extracting features from the encoded bitstream

Note that if we work on the compressed bitstream for the transcoded video, we can also work on the compressed bitstream for the source video. However, as https://github.com/livepeer/verification-classifier/issues/37 notes, the approach is not promising -- there is not enough signal in the compressed bitstream alone to act as features, and selectively un-compressing the data is tantamount to re-writing an entire decoder.

Performing the verification within the transcoding pipeline on uncompressed frames requires careful evaluation since it could constrain GPU-enabled machines that would otherwise be capable of processing more streams. https://github.com/livepeer/verification-classifier/issues/7 is helpful, but I think we need to more completely understand the full verification workflow and how things may behave within that (eg, how often does the initialization step need to run?).

yondonfu commented 5 years ago

Is this a settled point, or can the self-verification be done asynchronously? Can the results be returned immediately, with the signature whenever verification completes?

Not a settled point. But if B does not insert a result into its playlist until it has verified the signature for that result, then async self-verification with immediate result submission and delayed signature submission just shifts the added latency from the moment O returns a result to the moment B inserts the result into its playlist (B would wait until it verifies the signature to use the result). See livepeer/research#32 for a description of why we wouldn't want B to insert a result into its playlist unless it has verified the signature for the result; open to hearing arguments against this though!

Performing the verification within the transcoding pipeline on uncompressed frames requires careful evaluation since it could constrain GPU-enabled machines that would otherwise be capable of processing more streams. #7 is helpful, but I think we need to more completely understand the full verification workflow and how things may behave within that

Yeah just wanted to mention it as a possibility. I think some exact numbers from #36 could help us understand whether or not modifying the transcoding pipeline could make sense. Until then, the best option at the moment seems to be having O decode and extract features from the source in parallel with transcoding instead of waiting until it finishes transcoding.

how often does the initialization step need to run?

IIUC, the initialization step described in #7 = source decode + source feature extraction. So, we would need to run the initialization step once for each input segment.

ndujar commented 5 years ago

Actually, a possible improvement over the naive workflow mentioned above:

  1. O receives source
  2. O transcodes source and produces result
  3. O decodes source and extracts features
  4. O decodes result and extracts features
  5. O runs prediction using features from the source and the result

Can be:

  1. O receives source
  2. O transcodes source and produces result(s) together with a numpy array version of random frames from the source
  3. O decodes result(s) and produces a numpy array version of the same frames
  4. O extracts features for each result
  5. O runs prediction(s) using features from the source and the result(s)

This may be achieved in ffmpeg by supplying several outputs to the command line, one of them being image2pipe. The advantage of this approach is that we avoid the overhead of reading the source twice, which is normally the most time-consuming part of creating the numpy arrays. The extraction of features can then be optimized, as the number of random frames can be adjusted without apparent loss of accuracy.
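A sketch of what such a multi-output ffmpeg invocation could look like, built as an argv list. The select expression, rendition settings, and the rawvideo/gray pipe output are illustrative choices (the comment above mentions image2pipe; rawvideo output is shown here because it parses directly into numpy):

```python
# Sketch of an ffmpeg command with two outputs: an encoded
# rendition plus selected raw frames on stdout. The rendition
# codec/size and the gray rawvideo pipe are illustrative choices.

def build_ffmpeg_cmd(source, rendition, frame_indices, width=960, height=540):
    # select only the sampled frame numbers, e.g. eq(n\,0)+eq(n\,10)
    select_expr = "+".join(f"eq(n\\,{i})" for i in frame_indices)
    return [
        "ffmpeg", "-i", source,
        # output 1: the transcoded rendition
        "-c:v", "libx264", "-s", f"{width}x{height}", rendition,
        # output 2: sampled raw frames piped to stdout
        "-vf", f"select='{select_expr}'", "-vsync", "0",
        "-f", "rawvideo", "-pix_fmt", "gray", "pipe:1",
    ]
```

Running this with `subprocess` and capturing stdout would yield the raw bytes of just the sampled frames, decoded only once.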

As a drawback, one could argue whether prescribing a tool (ffmpeg) for doing the transcoding is positive or not. On the other hand, those who want to use their own transcoding tools may still need to subject themselves to some form of standard.

yondonfu commented 5 years ago

This may be achieved in ffmpeg by supplying several outputs to the command line, one of them being image2pipe. The advantage of this approach is that we avoid the overhead of reading the source twice, which is normally the most time-consuming part of creating the numpy arrays.

Interesting. So, you could use image2pipe with one of the ffmpeg outputs which can then be passed to a program that converts the piped input into numpy arrays for the frames. This would avoid reading the source twice, but is the source still decoded twice?

ndujar commented 5 years ago

Interesting. So, you could use image2pipe with one of the ffmpeg outputs which can then be passed to a program that converts the piped input into numpy arrays for the frames. This would avoid reading the source twice, but is the source still decoded twice?

In this scenario, no, the source is only decoded once. Then ffmpeg uses the decoded information to generate the transcoded versions and the numpy array at the same time.
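The piped raw frames could then be turned into a numpy array along these lines (assuming single-channel 960x540 gray frames, one byte per pixel):

```python
import numpy as np

# Sketch of converting raw piped frame bytes into a numpy array.
# Assumes single-channel (gray) 960x540 frames, one byte per pixel,
# as a `-f rawvideo -pix_fmt gray` output would produce.

WIDTH, HEIGHT = 960, 540
FRAME_BYTES = WIDTH * HEIGHT  # 518,400 bytes per gray frame

def frames_from_pipe(raw: bytes):
    n_frames = len(raw) // FRAME_BYTES
    # drop any trailing partial frame, then reshape to (frames, H, W)
    return (np.frombuffer(raw[: n_frames * FRAME_BYTES], dtype=np.uint8)
              .reshape(n_frames, HEIGHT, WIDTH))
```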

ndujar commented 5 years ago

As a side note to consider, the size of a file containing 60 frames at a 960x540px resolution in numpy array format (.npy) is 31MB. Compressed in tar.xz format, the size goes down to 6.2MB. We would need at least two of these files (one for the original and one for each copy) to be sent around for a verification to take place if we want to avoid re-decoding. Smaller sizes are also possible with random sampling, in direct proportion to the number of sampled frames. So, for example, with 5 sampled frames the size shrinks down to 2.6MB without compression.
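A quick arithmetic check of those figures, assuming single-channel (e.g. luma-only) uint8 frames at 960x540, which is the per-frame size the 31MB and 2.6MB numbers match:

```python
# Sanity-check the quoted sizes, assuming one uint8 byte per pixel
# (single channel); three-channel RGB would be 3x larger.

frame_bytes = 960 * 540          # 518,400 bytes per frame
full = 60 * frame_bytes          # 31,104,000 bytes, about 31 MB
sampled = 5 * frame_bytes        # 2,592,000 bytes, about 2.6 MB
```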