NV12/YUV->RGB colour accuracy and CUDA

gtebbutt commented 4 weeks ago

I've noticed some odd colour space conversion issues when using the yuv_to_rgb function in the otherwise very helpful NVDEC tutorial: it seems to be subtly but visibly shifting colours and/or clipping the dynamic range, but I'm not certain why. I originally thought there might be a mismatch between BT.601/BT.709/BT.2020 content, but trying other Python functions using those matrices didn't seem to help. It could definitely be my error somewhere, but I wasn't able to get correct colour output on anything that had been through the implicit NV12->YUV444 conversion step.
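For context, two common causes of the symptoms above (shifted hues, and blacks/whites that are either clipped or washed out) are decoding with the wrong matrix coefficients and decoding with the wrong range. Limited ("TV") range puts 8-bit luma in 16-235, while full ("PC"/JPEG) range uses 0-255; treating one as the other either compresses or clips the dynamic range. The sketch below is a plain-Python per-pixel reference, assuming 8-bit samples and the standard Kr/Kb formulation from BT.601, BT.709 and BT.2020 (non-constant luminance). The name ycbcr_to_rgb_pixel is hypothetical and is not the tutorial's yuv_to_rgb.

```python
# Hypothetical reference helper, not the tutorial's yuv_to_rgb: converts one
# 8-bit Y'CbCr sample to 8-bit R'G'B' using the standard Kr/Kb formulation.
KR_KB = {
    "bt601": (0.299, 0.114),
    "bt709": (0.2126, 0.0722),
    "bt2020": (0.2627, 0.0593),  # non-constant-luminance variant
}


def ycbcr_to_rgb_pixel(y, cb, cr, matrix="bt709", full_range=False):
    kr, kb = KR_KB[matrix]
    kg = 1.0 - kr - kb
    if full_range:
        # Full ("PC"/JPEG) range: luma spans 0-255, chroma centred on 128.
        yn = y / 255.0
        pb = (cb - 128) / 255.0
        pr = (cr - 128) / 255.0
    else:
        # Limited ("TV") range: luma spans 16-235, chroma spans 16-240.
        yn = (y - 16) / 219.0
        pb = (cb - 128) / 224.0
        pr = (cr - 128) / 224.0
    r = yn + 2.0 * (1.0 - kr) * pr
    b = yn + 2.0 * (1.0 - kb) * pb
    g = (yn - kr * r - kb * b) / kg

    def to_8bit(x):
        # Clamp to [0, 1] before scaling; out-of-gamut values are clipped here.
        return round(min(max(x, 0.0), 1.0) * 255)

    return to_8bit(r), to_8bit(g), to_8bit(b)


if __name__ == "__main__":
    print(ycbcr_to_rgb_pixel(16, 128, 128))   # (0, 0, 0)
    print(ycbcr_to_rgb_pixel(235, 128, 128))  # (255, 255, 255)
    # Limited-range white wrongly decoded as full range stays grey.
    print(ycbcr_to_rgb_pixel(235, 128, 128, full_range=True))  # (235, 235, 235)
    # The same saturated sample decoded with two different matrices.
    print(ycbcr_to_rgb_pixel(81, 90, 240, matrix="bt601"))
    print(ycbcr_to_rgb_pixel(81, 90, 240, matrix="bt709"))
```

Running the same pixels through a candidate conversion and comparing against a reference like this is one way to separate a matrix mismatch (hue shifts on saturated colours) from a range mismatch (grey blacks/whites or crushed shadows/highlights).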

Since there's been some discussion on moving the colour space conversion to CUDA anyway, I wanted to flag this implementation in case it's helpful. We ended up seeing a significant speed increase using that rather than applying conversions in tensor format, with all colours coming back exactly as expected.

cc https://github.com/dmlc/decord/issues/283#issuecomment-2151922632

vadimkantorov commented 2 weeks ago

The idea of adding tested/fast colorspace conversions was not supported at the time in torchvision: https://github.com/pytorch/vision/issues/4029

But maybe then torchaudio could host such functions.

gtebbutt commented 2 weeks ago

Interesting - colour space conversion is mentioned on the torchaudio to-do list at https://github.com/pytorch/audio/issues/3139, so there does seem to be some intention to add it, at least. There's also the existing NV12->YUV444 conversion built into StreamReader, so perhaps priorities have changed a bit now that the decoding is being pulled out into the torio namespace?
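For reference, an NV12->YUV444 step amounts to chroma upsampling: NV12 stores a full-resolution Y plane followed by one plane of interleaved U/V pairs at half resolution both horizontally and vertically. The pure-Python sketch below assumes a tightly packed 8-bit buffer (row stride equal to width) and nearest-neighbour upsampling. nv12_to_yuv444 is a hypothetical name; this is not StreamReader's implementation, and which upsampling filter StreamReader uses isn't stated in this thread.

```python
def nv12_to_yuv444(buf, width, height):
    """Hypothetical helper: unpack an 8-bit NV12 frame into per-pixel (Y, U, V).

    Assumes a tightly packed buffer (row stride == width) and uses
    nearest-neighbour chroma upsampling; real decoders may pad rows and
    may interpolate chroma instead.
    """
    if width % 2 or height % 2:
        raise ValueError("NV12 requires even width and height")
    if len(buf) != width * height * 3 // 2:
        raise ValueError("buffer size does not match a width x height NV12 frame")
    uv_base = width * height  # interleaved UV plane starts after the Y plane
    rows = []
    for r in range(height):
        row = []
        for c in range(width):
            y = buf[r * width + c]
            # Each U/V pair is shared by a 2x2 block of luma samples.
            uv = uv_base + (r // 2) * width + (c // 2) * 2
            row.append((y, buf[uv], buf[uv + 1]))
        rows.append(row)
    return rows


if __name__ == "__main__":
    # 4x2 frame: eight luma bytes, then two interleaved U/V pairs.
    frame = bytes(range(8)) + bytes([10, 20, 30, 40])
    pixels = nv12_to_yuv444(frame, 4, 2)
    print(pixels[0][0])  # (0, 10, 20)
    print(pixels[1][3])  # (7, 30, 40)
```

Note that this step only rearranges samples; it doesn't change range or matrix, so colour errors that appear after it would more likely come from the subsequent YUV->RGB conversion or from chroma siting.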

vadimkantorov commented 2 weeks ago

Maybe even then some of these could be upstreamed later into torch core to be available across the board... Regarding torio, really hoping some torch-prefixed name can be invented before it's too late and we are stuck with both torch-prefixed-domain-library-zoo and tor-prefixed-domain-library-zoo :)

pytorch / audio

NV12/YUV->RGB colour accuracy and CUDA #3799