haudiobe / 5G-Video-Content

Documents the development of 5G Video Content Hosting

Reported AV1 Issue: Some participants observed repeated frames and possible content dependent encoding #28

Open johnsim opened 2 years ago

johnsim commented 2 years ago

The following request was made earlier over the AV1 discussion email:

Q1: Can you also please clarify on resulting coding order, if it is different from hierarchical prediction utilized in MPEG codecs, (e.g., if any skipped frames, repeated frames, overlay frames usage)?

The document S4aV210828 answers as follows:

A: When hierarchical reference structure is enabled, the encoding order is similar to MPEG codecs. AV1 uses dummy frames with show_existing_frame=1 to indicate that the frame indexed by frame_to_show_map_idx is to be output at decoding time.

Some participants conducted AV1 simulations with the encoder parameters suggested for Scenarios 1 and 2 and observed non-regular coding order on several occasions.

In particular, the resulting bitstreams for sequence Rainfruit of Scenario 1 (Sc1_S02) feature different numbers of coded frames, with different numbers of frames being repeated for different QPs.

That would violate the TR requirements of Clause 5.6: “Only fixed periodic (temporal) QP and coding structures are permitted”.

Q2: Can you please provide information on the content of repeated frames?

Q3: What is the encoding process utilized to decide on coding of repeated frames, is this process QP or content dependent?

Q4: Are there any other content-dependent processes in AV1 encoder that alter the coding and prediction structure?

johnsim commented 2 years ago

If indeed the number of coded frames is different for each QP and sequence, then that would violate the TR requirements of Clause 5.6, which states that “Only fixed periodic (temporal) QP and coding structures are permitted”, and this should be investigated.

As we mentioned in our email and on the call, this is not what we observed using the VQ Analyzer. We did not notice any variability in the number of coded frames.

a) Do you still observe a different number of frames being coded for each sequence and for each QP?

b) AV1 uses a combination of different flags in the bitstream to achieve a hierarchical coding structure. In particular, there are the flags show_existing_frame, show_frame, and showable_frame, which may make it appear that some frames are “repeated”.

Is this what is meant here by “repeated” frames? A clarification would be appreciated.

johnsim commented 2 years ago

Q2: Can you please provide information on the content of repeated frames?

In the AV1 specification, the frame header syntax elements show_existing_frame, show_frame, and showable_frame indicate to the decoder whether or not the decoded frame should be output/displayed immediately after decoding.

For a frame encoded out of its natural display order (typically a future reference, or AltRef frame), the encoder has three options. Using a GOP size of 32 as an example, the first frame (POC=0) will be coded as a Key (Intra) frame, and the second coded frame (POC=32) will be coded as an out-of-order Inter frame. For frame 32, the options are:

  1. The encoder may encode frame 32 as a normal Inter frame with show_frame=0 to indicate that it should not be output as soon as it is decoded, and showable_frame=1 to indicate that it will be output sometime in the future. Later in the bitstream, the encoder may encode a “dummy frame” with show_existing_frame=1 and set frame_to_show_map_idx to point to the previously decoded frame 32. When a decoder encounters this dummy frame, it will output the previously decoded frame 32. The dummy frame has a fixed size of about 3 bytes.
  2. Rather than encoding source frame 32, the encoder may instead encode a frame that results from applying a 5-frame temporal filter centered on frame 32, that is, to the two frames immediately prior to source frame 32, the two frames immediately following it, and source frame 32 itself. The temporally filtered frame is then encoded as per (1) above.
  3. The encoder may produce a temporally filtered frame 32 and encode this instead of source frame 32 with show_frame=0 to indicate that the decoded frame will not be output immediately after being decoded, and showable_frame=0 to indicate that the decoded frame will only be used as a reference frame and will never be output by the decoder. Later in the bitstream, the encoder may encode the original source frame 32 as an Inter frame, using only the filtered AltRef frame as a reference, with show_existing_frame=0, and show_frame=1 to indicate that the decoded frame will be output immediately after it is decoded. In this case two (non-dummy) frames are encoded for source frame 32, but the second of these (known as the “Overlay” frame) is usually very small. When the decoder encounters this frame, it will decode and output the overlay frame instead of the filtered AltRef.

In Options 1 and 2, the dummy frame will be shown as a “repeated” frame in the bitstream analyzer. In Option 3, the overlay frame will be shown as a normal “inter” frame. In both cases, the decoder will always decode the AltRef frame.
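The flag combinations described above can be summarized in a small sketch. This is a simplified illustration, not the decoding process from the AV1 specification: the `FrameHeader` class and `decoder_action` function are hypothetical names, and only the four flags discussed in this thread are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameHeader:
    """Hypothetical container for the frame header flags discussed here."""
    show_existing_frame: int
    show_frame: int = 0
    showable_frame: int = 0
    frame_to_show_map_idx: Optional[int] = None


def decoder_action(header: FrameHeader) -> str:
    """Describe what a decoder does with a frame, based only on these flags."""
    if header.show_existing_frame == 1:
        # "Dummy" frame: nothing new is decoded, an earlier frame is output.
        return f"output previously decoded frame in slot {header.frame_to_show_map_idx}"
    if header.show_frame == 1:
        # Normal frame in display order, or an overlay frame (Option 3).
        return "decode and output immediately"
    if header.showable_frame == 1:
        # Hidden AltRef that a later dummy frame will output (Options 1 and 2).
        return "decode, hold for later output"
    # Hidden AltRef used as reference only (Option 3).
    return "decode, reference only, never output"


print(decoder_action(FrameHeader(show_existing_frame=1, frame_to_show_map_idx=5)))
print(decoder_action(FrameHeader(show_existing_frame=0, show_frame=0, showable_frame=0)))
```

A bitstream analyzer would label the first case as a “repeated” frame, while the second is the filtered AltRef that is later followed by an overlay frame.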

In both cases, the total number of frames (counting both “real” frames and “dummy” frames) in the bitstream will be the same. It is determined by the GOP size and number of source frames. For example, if there are N frames in the input sequence, half of them will be encoded in their natural display order, and the other half will be encoded out of display order. The total number of frames (including both “real” frames and “dummy” frames) will be approximately (exactly if an integer number of GOPs are encoded) 1 + (N-1) × 1.5. That is, the first key frame plus 1.5 × GOP_size frames per complete GOP.
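As a quick check of the count above, here is a minimal sketch that evaluates it for an integer number of GOPs. The function name `total_coded_frames` is hypothetical, and an even GOP size is assumed so that exactly half of the frames in each GOP are coded out of display order.

```python
def total_coded_frames(num_source_frames: int, gop_size: int = 32) -> int:
    """Frames in the bitstream ("real" plus "dummy"/overlay) for whole GOPs.

    Assumes one leading key frame followed by complete GOPs, and an even
    gop_size, so each GOP contributes gop_size + gop_size // 2 entries.
    """
    if num_source_frames < 1 or (num_source_frames - 1) % gop_size != 0:
        raise ValueError("expected a key frame followed by complete GOPs")
    num_gops = (num_source_frames - 1) // gop_size
    return 1 + num_gops * (gop_size + gop_size // 2)


print(total_coded_frames(33))  # key frame plus one GOP of 32
print(total_coded_frames(97))  # key frame plus three GOPs of 32
```

For 33 source frames this evaluates 1 + 32 × 1.5 = 49, and for 97 source frames 1 + 96 × 1.5 = 145, which corresponds to encoding orders 0 to 144.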

image

As an example, the screen shot shown above corresponds to the file S1-A027-AV1_27.obu, and the table below shows the encoding order, frame type, and frame header flags for the first few frames.


Encoding Order | Display Order / POC (order hint) | Frame Type | Flags | Decoder behavior
-- | -- | -- | -- | --
0 | 0 | Key (Intra) | show_existing_frame=0 show_frame=1 | Display the decoded frame
1 | 32 | Inter | show_existing_frame=0 show_frame=0 showable_frame=1 | Decode the frame but do not display
2 | 16 | Inter | show_existing_frame=0 show_frame=0 showable_frame=1 | Decode the frame but do not display
3 | 8 | Inter | show_existing_frame=0 show_frame=0 showable_frame=1 | Decode the frame but do not display
4 | 4 | Inter | show_existing_frame=0 show_frame=0 showable_frame=1 | Decode the frame but do not display
5 | 2 | Inter | show_existing_frame=0 show_frame=0 showable_frame=1 | Decode the frame but do not display
6 | 1 | Inter | show_existing_frame=0 show_frame=1 | Display the decoded frame
7 |   | Dummy frame | show_existing_frame=1 frame_to_show_map_idx=5 | Display the previously decoded frame (corresponding to POC=2)
8 | 3 | Inter | show_existing_frame=0 show_frame=1 | Display the decoded frame
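The encoding order in this table can be reproduced with a short sketch. It assumes a purely dyadic hierarchy with a power-of-two GOP size and Option 1 for every out-of-order frame; the actual encoder's reference structure may differ. The function name `gop_coding_order` and the labels `key`, `hidden`, `shown`, and `dummy` are hypothetical.

```python
def gop_coding_order(gop_size: int = 32):
    """List (kind, poc) entries in encoding order for a key frame plus one GOP.

    kind is "key", "hidden" (show_frame=0, showable_frame=1),
    "shown" (show_frame=1) or "dummy" (show_existing_frame=1).
    Assumes gop_size is a power of two.
    """
    entries = [("key", 0)]

    def fill(lo: int, hi: int) -> None:
        # Code every frame strictly between two already coded frames.
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        hidden = hi - lo > 2  # only the lowest level is coded in display order
        entries.append(("hidden" if hidden else "shown", mid))
        fill(lo, mid)
        if hidden:
            # Output the held frame once everything before it has been shown.
            entries.append(("dummy", mid))
        fill(mid, hi)

    entries.append(("hidden", gop_size))  # future reference coded first
    fill(0, gop_size)
    entries.append(("dummy", gop_size))
    return entries


for encoding_order, (kind, poc) in enumerate(gop_coding_order()[:9]):
    print(encoding_order, kind, poc)
```

Under these assumptions the first nine entries follow the same POC sequence as the table (0, 32, 16, 8, 4, 2, 1, dummy for 2, 3), and one GOP of 32 yields 48 entries after the key frame.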

image

In another example, the screen shot shown above corresponds to file S1-A15-AV1_63.obu, and the table below shows the encoding order, frame type, and frame header flags for frame 96.


Encoding Order | Display Order / POC (order hint) | Frame Type | Flags | Decoder behavior
-- | -- | -- | -- | --
97 | 96 | Inter | show_existing_frame=0 show_frame=0 showable_frame=0 | Decode the frame but do not display. This is the AltRef
144 | 96 | Inter | show_existing_frame=0 show_frame=1 | Decode the frame and display. This is the overlay frame that uses only the above AltRef frame as reference.

johnsim commented 2 years ago

Q3: What is the encoding process utilized to decide on coding of repeated frames, is this process QP or content dependent?

In the current encoding configuration, only the AltRef in temporal layer 0 is coded as described by Option 3 above, utilizing both a temporally filtered AltRef frame and an overlay frame. That is, for GOP size=32, only frames with POC=32, 64,... are coded using temporal filtering. All other AltRef frames are encoded as per Option 1 described above, that is, without use of either temporal filtering or an overlay frame.

The decision whether to code the overlay frame or not is determined by the difference between the original source frame and the temporally filtered frame. If the difference is higher than a threshold (determined by QP), Option 3 is used. Otherwise, an overlay frame is not used.

We only recently determined that this decision to encode an overlay frame is in fact content dependent. We will fix this in an update to the AV1 configuration.
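The shape of this decision can be sketched as follows. This is only an illustration of a thresholded choice: the difference metric (mean squared difference here), the function name `choose_altref_coding`, and the way the threshold would be derived from QP are all assumptions, since none of them are specified in this thread.

```python
from typing import Sequence


def choose_altref_coding(source: Sequence[float],
                         filtered: Sequence[float],
                         threshold: float) -> str:
    """Pick how the displayed version of a filtered AltRef frame is signaled.

    source and filtered are flattened pixel values of the original frame and
    the temporally filtered frame. threshold is assumed to be derived from QP
    by the caller; that mapping is not modeled here.
    """
    if not source or len(source) != len(filtered):
        raise ValueError("frames must be non-empty and of equal size")
    # Assumed metric: mean squared difference between the two frames.
    difference = sum((s - f) ** 2 for s, f in zip(source, filtered)) / len(source)
    if difference > threshold:
        return "overlay"  # code an overlay Inter frame (Option 3)
    return "show_existing_frame"  # output the decoded AltRef via a dummy frame


print(choose_altref_coding([10, 20, 30, 40], [10, 21, 29, 40], threshold=4.0))
print(choose_altref_coding([10, 20, 30, 40], [14, 16, 34, 36], threshold=4.0))
```

Because the difference depends on the source pixels, the outcome can change from sequence to sequence, and because the threshold depends on QP, it can also change between QPs for the same sequence.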

johnsim commented 2 years ago

Q4: Are there any other content-dependent processes in AV1 encoder that alter the coding and prediction structure?

We recently determined the decision whether to encode an overlay frame is in fact content dependent. We will fix this in an update to the AV1 configuration.

See the answer to Q2 for additional information.

johnsim commented 2 years ago

A correction to the answer to Q4: After further discussion, we have understood that using a frame header with show_existing_frame=1 instead of an overlay frame when the residual energy is low is just an efficient way to code a "zero" overlay frame. This process is a fundamental coding tool of AV1.