tianyuan168326 opened 6 months ago
Thanks for the comments. We focus on the YUV420 color space, as most traditional video codecs do. We find that the same model also works for RGB content (at least when using the BT.709 conversion). We did not test other RGB content. Could you please check whether the following two lines use the conversion matrix you expect (they currently assume the BT.709 matrix)?
https://github.com/microsoft/DCVC/blob/4df94295c8dbe0a26456582d1a0eddb3465f1597/DCVC-FM/src/utils/test_helper.py#L88C9-L88C37 https://github.com/microsoft/DCVC/blob/4df94295c8dbe0a26456582d1a0eddb3465f1597/DCVC-FM/src/utils/test_helper.py#L123C1-L123C72
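For reference, here is a minimal sketch (hand-written, not code from the repo) of the full-range BT.709 analysis matrix those two lines are expected to apply; if your source frames were produced with a different standard, these are the coefficients that would be mismatched:

```python
import numpy as np

# Rows map an [R, G, B] vector in [0, 1] to [Y, Cb, Cr] (full range, BT.709).
BT709 = np.array([
    [0.2126, 0.7152, 0.0722],                     # Y
    [-0.2126 / 1.8556, -0.7152 / 1.8556, 0.5],    # Cb = (B - Y) / 1.8556
    [0.5, -0.7152 / 1.5748, -0.0722 / 1.5748],    # Cr = (R - Y) / 1.5748
])
OFFSET = np.array([0.0, 0.5, 0.5])                # chroma planes are centered at 0.5

rgb = np.random.rand(4, 3)                        # four sample pixels
ycbcr = rgb @ BT709.T + OFFSET
print(ycbcr)
```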
Yes, I used the right matrix. My test pipeline is: [1. read PNG files into RGB] -> [2. convert RGB to YCbCr using the function rgb_to_ycbcr444 (src.transforms.functional)] -> [3. compress the YCbCr frames with the DCVC-FM model] -> [4. convert YCbCr back to RGB using the function ycbcr444_to_rgb (src.transforms.functional)].
I surmise that the probable cause of the issue is information loss introduced during the RGB-to-YCbCr444 and YCbCr444-to-RGB transitions in steps 2 and 4, which is then amplified through the pipeline. This loss is likely worse for source frames or datasets that were not originally converted using the BT.709 standard.
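One way to test that conjecture is to run steps 2 and 4 with nothing in between and measure the RGB round-trip PSNR: any drop here is attributable to the color transform alone. Below is a self-contained sketch with hand-written stand-ins for rgb_to_ycbcr444/ycbcr444_to_rgb (not the repo's implementations); quantizing the intermediate planes to 8 bits is an assumption about where the loss would enter:

```python
import numpy as np

def rgb_to_ycbcr444_bt709(rgb):  # float RGB in [0, 1], shape (H, W, 3)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    cb = (b - y) / 1.8556 + 0.5
    cr = (r - y) / 1.5748 + 0.5
    return np.stack([y, cb, cr], axis=-1)

def ycbcr444_to_rgb_bt709(ycc):
    y, cb, cr = ycc[..., 0], ycc[..., 1], ycc[..., 2]
    r = y + 1.5748 * (cr - 0.5)
    b = y + 1.8556 * (cb - 0.5)
    g = (y - 0.2126 * r - 0.0722 * b) / 0.7152
    return np.stack([r, g, b], axis=-1)

rgb = np.random.rand(256, 256, 3)
ycc = rgb_to_ycbcr444_bt709(rgb)
# Quantizing YCbCr to 8 bits mimics storing the intermediate as integer planes.
ycc8 = np.round(np.clip(ycc, 0, 1) * 255) / 255
back = np.clip(ycbcr444_to_rgb_bt709(ycc8), 0, 1)
mse = np.mean((rgb - back) ** 2)
print("round-trip RGB PSNR: %.2f dB" % (10 * np.log10(1.0 / mse)))
```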
By contrast, traditional codecs and previous RGB-space neural codecs appear to be more robust to variations in the color-space conversion applied to the source frames.
Could you please fine-tune a model for RGB input/output (I mean a model trained specifically on RGB frames, without any YUV conversion) and release it? I think just a few fine-tuning steps would be enough to obtain this model, but since I don't have the training recipe, I may need to trouble you to go through the fine-tuning yourself. This would help identify the problem. Thanks!
As you mentioned, rgb_to_ycbcr444 and ycbcr444_to_rgb in src.transforms.functional were used for the color space conversion. However, BT.709 is assumed in these two functions. If your RGB was not converted from BT.709, I would suggest modifying rgb_to_ycbcr444 and ycbcr444_to_rgb (or adding new functions) to use the correct conversion matrix.
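As an illustration, a hedged sketch of what such an added function pair could look like for BT.601-derived content; the function names and the (..., 3, H, W) tensor layout here are my assumptions, not the repo's API:

```python
import torch

def rgb_to_ycbcr444_bt601(rgb):
    # rgb: float tensor in [0, 1], channel layout (..., 3, H, W)
    r, g, b = rgb.chunk(3, dim=-3)
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772 + 0.5        # 1.772 = 2 * (1 - 0.114)
    cr = (r - y) / 1.402 + 0.5        # 1.402 = 2 * (1 - 0.299)
    return torch.cat((y, cb, cr), dim=-3)

def ycbcr444_to_rgb_bt601(ycbcr):
    y, cb, cr = ycbcr.chunk(3, dim=-3)
    r = y + 1.402 * (cr - 0.5)
    b = y + 1.772 * (cb - 0.5)
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return torch.cat((r, g, b), dim=-3)
```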
Thanks, I will try that.
Could you release the training code for DCVC-FM? We would like to do the fine-tuning on our dataset. Thanks.
hedelong92@163.com
hedelong92@163.com Did you receive the training code? If so, could you send me a copy? Thank you very much! nan_nanzi@qq.com
Thank you for the released code and models; they have significantly helped my research! However, I have encountered some confusion during evaluation.
Most previous approaches evaluate on PNG datasets extracted from the YUV420P sources with ffmpeg. I tested both the DCVC-DC and DCVC-FM models on these ffmpeg-converted datasets, and DCVC-FM performed significantly worse under the same Group of Pictures (GOP) length of 32 in the RGB test condition, with the exception of the HEVC Class E dataset.
Has anyone else encountered this issue?
I conjecture the reason may be that neural networks easily overfit to the data preprocessing seen during training, given that the training color conversion adheres to the BT.709 standard; traditional codecs, in contrast, perform consistently across different color conversion approaches.
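If that is the cause, the mismatch should be visible without any codec in the loop: ffmpeg's swscale commonly falls back to the BT.601 matrix when converting raw YUV420P to PNG unless a color matrix is explicitly specified, while the model's conversion assumes BT.709. A rough sketch of the luma discrepancy between the two matrices:

```python
import numpy as np

# How far apart BT.601 and BT.709 luma are for the same RGB input. A model
# trained on BT.709-derived planes sees a systematic shift when the PNGs
# were produced with a BT.601 conversion.
rgb = np.random.rand(100000, 3)  # random full-range RGB samples
y_709 = rgb @ np.array([0.2126, 0.7152, 0.0722])
y_601 = rgb @ np.array([0.299, 0.587, 0.114])
diff = np.abs(y_709 - y_601)
print("mean |dY|: %.4f, max |dY|: %.4f (on a [0, 1] scale)"
      % (diff.mean(), diff.max()))
```

For the pure BT.601/BT.709 coefficient difference, |dY| can reach about 0.13 on a [0, 1] scale (roughly 33 of 255 levels), which seems large enough to account for a consistent RGB-domain loss.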