scottlamb / retina

High-level RTSP multimedia streaming library, in Rust
https://crates.io/crates/retina
Apache License 2.0
244 stars 48 forks source link

annexb support #44

Open lattice0 opened 2 years ago

lattice0 commented 2 years ago

I just created this conversion function and it worked on ffmpeg for decoding (at least it shows the camera image, gotta see with movement though, it's static because it's pointed to somewhere random)

/// Converts from AVCC format to annex b format
pub fn avcc_to_annex_b_cursor(
    data: &[u8],
    nal_units: &mut Vec<u8>,
) -> Result<(), AnnexConversionError> {
    let mut data_cursor = Cursor::new(data);
    let mut nal_lenght_bytes = [0u8; 4];
    while let Ok(bytes_read) = data_cursor.read(&mut nal_lenght_bytes) {
        if bytes_read == 0 {
            break;
        }
        if bytes_read != nal_lenght_bytes.len() || bytes_read == 0 {
            return Err(AnnexConversionError::NalLenghtParseError);
        }
        let nal_length = u32::from_be_bytes(nal_lenght_bytes) as usize;
        nal_units.push(0);
        nal_units.push(0);
        nal_units.push(1);

        if nal_length == 0 {
            return Err(AnnexConversionError::NalLenghtParseError);
        }
        let mut nal_unit = vec![0u8; nal_length];
        let bytes_read = data_cursor.read(&mut nal_unit);
        match bytes_read {
            Ok(bytes_read) => {
                nal_units.extend_from_slice(&nal_unit[0..bytes_read]);
                //TODO: this is never called so we don't ever detect EOF
                if bytes_read == 0 {
                    break;
                } else if bytes_read < nal_unit.len() {
                    return Err(AnnexConversionError::NalUnitExtendError);
                }
            }
            Err(e) => return Err(AnnexConversionError::IoError(e)),
        };
    }
    Ok(())
}

I used the hardcoded NALULenghtMinusOne=3 from https://github.com/scottlamb/retina/blob/9c85a2dd80629bbcffc2aad7323a691d574bde04/src/codec/h264.rs#L701, but should be subject to change if you change in the code.

Anyways, this was just to test. I think it would be better if retina had a way to specify AVCC or annexb. Can I do a PR to do this option? I think here: https://github.com/scottlamb/retina/blob/9c85a2dd80629bbcffc2aad7323a691d574bde04/src/codec/h264.rs#L475 it should not write the length and append 0x000001, when annexb format is true. But maybe also support this function for people that want to still receive avcc but get the nalu sometimes? By the way this is not the best implementation because Cursor does not tell me about EOF as you see in the comments

scottlamb commented 2 years ago

Sorry, your questions in this issue fell through the cracks until now.

Anyways, this was just to test. I think it would be better if retina had a way to specify AVCC or annexb. Can I do a PR to do this option?

Yes, I think that's a totally reasonable thing to add.

We should probably choose the depacketizer's output format at setup time. I'd prefer to limit how often we make breaking changes, so we should probably make the current setup use the default value, and setup_extended allow passing an options struct. We could change the signature of setup and remove the redundant function along with other breaking changes in #47 .

But maybe also support this function for people that want to still receive avcc but get the nalu sometimes?

I'm not sure how often someone will want both forms. We can add an AVC -> Annex B conversion function (like the one you wrote above) if there's demand. The reverse would be less efficient since we'd have to scan all the bytes for the start code.

By the way this is not the best implementation because Cursor does not tell me about EOF as you see in the comments

Yeah, I think I'd avoid Cursor in favor of direct slice indexing on the source buffer. This also would allow easily copying directly rather than allocating that nal_unit intermediary and having an extra copy there.

It's also possible to convert an owned buffer in-place: the 4-byte length can be replaced with 00 00 00 01. The extra leading zero is allowed. But currently VideoFrame doesn't give the caller a way to take ownership of the data.