Open gtebbutt opened 4 weeks ago
The idea of adding tested, fast colorspace conversions was not supported in torchvision at the time: https://github.com/pytorch/vision/issues/4029
But maybe torchaudio could host such functions instead.
Interesting - colour space conversion is mentioned on the torchaudio to-do list at https://github.com/pytorch/audio/issues/3139, so there does seem to be some intention to add it, at least. There's also the existing NV12->YUV444 conversion built into StreamReader, so perhaps priorities have changed a bit now that the decoding is being pulled out into the torio namespace?
Maybe some of these could even be upstreamed later into torch core, to be available across the board... Regarding torio, really hoping some torch-prefixed name can be invented before it's too late and we are stuck with both a torch-prefixed-domain-library-zoo and a tor-prefixed-domain-library-zoo :)
I've noticed some odd colour space conversion issues when using the yuv_to_rgb function in the otherwise very helpful NVDEC tutorial: it seems to be subtly but visibly shifting colours and/or clipping the dynamic range, but I'm not certain why. I originally thought there might be issues between BT.601/BT.709/BT.2020 content, but trying other Python functions using those matrices didn't seem to help. It could definitely be my error somewhere, but I wasn't able to get correct colour output on anything that had been through the implicit NV12->YUV444 conversion step.

Since there's been some discussion on moving the colour space conversion to CUDA anyway, I wanted to flag this implementation in case it's helpful. We ended up seeing a significant speed increase using that rather than applying conversions in tensor format, with all colours coming back exactly as expected.
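
To make the matrix and range question above concrete, here is a minimal pure-Python sketch of a per-pixel 8-bit Y'CbCr to RGB conversion. It is not the tutorial's yuv_to_rgb and not the CUDA implementation mentioned above; the names yuv_to_rgb_pixel and LUMA_COEFFS and the full_range flag are hypothetical, and it assumes 8-bit samples and the standard luma coefficients for each spec. It only illustrates how a limited-range vs full-range mismatch, or the wrong matrix, changes the output; I have not confirmed that either is the cause of what I'm seeing.

```python
# Luma coefficients (Kr, Kb) from the respective specs; Kg = 1 - Kr - Kb.
LUMA_COEFFS = {
    "bt601": (0.299, 0.114),
    "bt709": (0.2126, 0.0722),
    "bt2020": (0.2627, 0.0593),
}


def yuv_to_rgb_pixel(y, u, v, standard="bt709", full_range=False):
    """Convert one 8-bit Y'CbCr sample to an 8-bit (R, G, B) tuple.

    Hypothetical helper for illustration only.
    """
    kr, kb = LUMA_COEFFS[standard]
    kg = 1.0 - kr - kb

    if full_range:
        # Full ("PC") range: Y and chroma both span 0..255.
        yn = y / 255.0
        cb = (u - 128) / 255.0
        cr = (v - 128) / 255.0
    else:
        # Limited ("TV") range: Y spans 16..235, chroma spans 16..240.
        yn = (y - 16) / 219.0
        cb = (u - 128) / 224.0
        cr = (v - 128) / 224.0

    # Invert the Y'CbCr definition: Cr = (R - Y) / (2 (1 - Kr)), etc.
    r = yn + 2.0 * (1.0 - kr) * cr
    b = yn + 2.0 * (1.0 - kb) * cb
    g = (yn - kr * r - kb * b) / kg

    # Scale to 8 bits and clamp out-of-gamut values.
    return tuple(min(255, max(0, round(c * 255.0))) for c in (r, g, b))


# Limited-range white and black decoded with the matching range.
print(yuv_to_rgb_pixel(235, 128, 128))                   # (255, 255, 255)
print(yuv_to_rgb_pixel(16, 128, 128))                    # (0, 0, 0)

# The same samples decoded as if they were full range: contrast is lost.
print(yuv_to_rgb_pixel(235, 128, 128, full_range=True))  # (235, 235, 235)
print(yuv_to_rgb_pixel(16, 128, 128, full_range=True))   # (16, 16, 16)
```

Decoding limited-range data as full range gives washed-out blacks and whites, while the reverse mistake clips shadows and highlights. Either one would look like a subtle range problem even when the BT.601/BT.709/BT.2020 matrix is correct.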
cc https://github.com/dmlc/decord/issues/283#issuecomment-2151922632