Open x4Cx58x54 opened 2 years ago
@x4Cx58x54 Thanks a lot for the proposal.
We need a bit more time to decide how we want to handle this. Right now we are in the middle of revamping the Transforms API to offer native support not only for Images but also Videos, Bounding Boxes, Masks, Labels etc. We plan to publish a blog post with the announcement soon, but you can see some examples at #6753.
To make a long story short, the new Transforms API "stores" videos in a [..., T, C, H, W] format. This allows us to transform the video frames very efficiently by reusing existing image kernels. We also offer transforms to permute/transpose the dimensions. The new API uses Tensor Subclassing to store metadata alongside the standard tensor (things like colour space, for example).
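As a rough illustration of why the [..., T, C, H, W] layout lets image kernels be reused, the sketch below folds all leading dimensions into a single batch dimension, applies an image-style normalization, and restores the original shape. It uses plain PyTorch only; `normalize_image_batch` and `normalize_video` are hypothetical helpers written for this example, not torchvision functions.

```python
import torch


def normalize_image_batch(img, mean, std):
    """Stand-in for an image kernel (hypothetical): expects [N, C, H, W]."""
    mean = torch.as_tensor(mean, dtype=img.dtype, device=img.device).view(1, -1, 1, 1)
    std = torch.as_tensor(std, dtype=img.dtype, device=img.device).view(1, -1, 1, 1)
    return (img - mean) / std


def normalize_video(video, mean, std):
    """Normalize a [..., T, C, H, W] tensor by reusing the image kernel."""
    lead = video.shape[:-3]                      # e.g. (B, T)
    flat = video.reshape(-1, *video.shape[-3:])  # fold leading dims into one batch
    out = normalize_image_batch(flat, mean, std)
    return out.reshape(*lead, *video.shape[-3:])  # restore the original layout


video = torch.rand(2, 8, 3, 16, 16)  # [B, T, C, H, W]
out = normalize_video(video, mean=[0.5, 0.5, 0.5], std=[0.25, 0.25, 0.25])
```

Because the channel dimension always sits at position -3 in this layout, the same per-channel statistics apply to every frame without a per-frame loop.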
Offering an extra parameter on the normalize kernel is possible but conflicts with the existing design. Having said that, in some limited cases, we've offered this kind of parameter to assist user migration. For example:
https://github.com/pytorch/vision/blob/e96860d60be171e0802cdbd180ca976c1afd2b50/torchvision/prototype/transforms/functional/_temporal.py#L6
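For discussion purposes, an extra channel-dimension parameter on a normalize function could look roughly like the sketch below. This is plain PyTorch and not torchvision code: `normalize_along_dim` is a hypothetical name, and `dim_channel` borrows the argument name proposed in this issue, with a default of -3 assumed here to match the [..., C, H, W] convention.

```python
import torch


def normalize_along_dim(tensor, mean, std, dim_channel=-3):
    """Hypothetical sketch: normalize per channel along an arbitrary dimension."""
    if not tensor.is_floating_point():
        raise TypeError("expected a floating point tensor")
    mean = torch.as_tensor(mean, dtype=tensor.dtype, device=tensor.device)
    std = torch.as_tensor(std, dtype=tensor.dtype, device=tensor.device)
    if (std == 0).any():
        raise ValueError("std contains zeros, division by zero would occur")
    # Build a broadcastable shape: 1 everywhere except the channel dimension.
    shape = [1] * tensor.ndim
    shape[dim_channel] = -1
    return (tensor - mean.view(shape)) / std.view(shape)


clip = torch.rand(3, 8, 16, 16)  # [C, T, H, W], channel-first video
out = normalize_along_dim(clip, [0.4, 0.5, 0.6], [0.2, 0.2, 0.2], dim_channel=0)
```

The same result can be obtained without any new parameter by permuting the clip to [T, C, H, W], normalizing along dimension -3, and permuting back, which is the route the permute/transpose transforms mentioned above would take.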
Given the above, shall we wait for the blogpost to be published (happy to give you a ping) and give you some time to review the design? After that, it would be great to get your input on whether the new API covers your needs or if you think we need enhancements. Let me know what you think. Thanks!
@datumbox Thanks for your reply. I would be greatly obliged if you give me a ping!
🚀 The feature
Specify the channel dimension for transforms.Normalize, transforms.functional.normalize, and transforms.functional_tensor.normalize, so that transforms.Normalize can normalize with the given mean and std along a specified channel dimension. A solution is adding a new argument dim_channel to the classes and functions above.
Motivation, pitch
Recent torchvision releases deprecated transforms._transforms_video and added support in many transforms for [..., H, W] shaped tensors. This is a great improvement for video transforms, but transforms.Normalize is not among the transforms that benefit. As a result, users either resort to other transforms such as pytorchvideo.transforms.Normalize or normalize each frame separately. The requested feature would relieve this pain and make video transforms neater.
Alternatives
No response
Additional context
No response
cc @vfdev-5 @datumbox