Specify channel dim for transforms.Normalize

x4Cx58x54 commented 2 years ago

🚀 The feature

Allow specifying the channel dimension for transforms.Normalize, transforms.functional.normalize, and transforms.functional_tensor.normalize, so that transforms.Normalize can apply mean and std along a user-specified channel dimension.

One solution is to add a new argument, dim_channel, to the classes and functions above and reshape mean and std so that they broadcast along that dimension:

# in transforms.functional_tensor.normalize
broadcast_ch_shape = [1 for _ in range(tensor.ndim)]
broadcast_ch_shape[dim_channel] = -1
if mean.ndim == 1:
    mean = mean.view(*broadcast_ch_shape)
if std.ndim == 1:
    std = std.view(*broadcast_ch_shape)
return tensor.sub_(mean).div_(std)
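To make the idea concrete, here is a minimal self-contained sketch of the same broadcasting trick using plain torch. The function name normalize_along_channel, its default dim_channel=-3, and the zero-std check are my own assumptions for illustration; this is not existing torchvision code.

```python
import torch


def normalize_along_channel(tensor, mean, std, dim_channel=-3, inplace=False):
    """Normalize a float tensor per channel, with channels on `dim_channel`.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not inplace:
        tensor = tensor.clone()
    mean = torch.as_tensor(mean, dtype=tensor.dtype, device=tensor.device)
    std = torch.as_tensor(std, dtype=tensor.dtype, device=tensor.device)
    if (std == 0).any():
        raise ValueError("std contains zero; normalization would divide by zero")

    # All ones except -1 on the channel axis, e.g. [-1, 1, 1, 1] for
    # dim_channel=0 on a 4-D tensor, so mean/std broadcast along channels.
    broadcast_ch_shape = [1] * tensor.ndim
    broadcast_ch_shape[dim_channel] = -1
    if mean.ndim == 1:
        mean = mean.view(*broadcast_ch_shape)
    if std.ndim == 1:
        std = std.view(*broadcast_ch_shape)
    return tensor.sub_(mean).div_(std)


# A hypothetical video clip laid out as [C, T, H, W]
video = torch.rand(3, 8, 16, 16)
out = normalize_along_channel(
    video, mean=[0.5, 0.4, 0.3], std=[0.2, 0.2, 0.2], dim_channel=0
)
```

With dim_channel=-3 the sketch behaves like the usual [..., C, H, W] convention, so existing image use would not change.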

Motivation, pitch

Recent torchvision releases deprecated transforms._transforms_video and updated many transforms to process [..., H, W] shaped tensors. This is a great improvement for video transforms, but transforms.Normalize is not among them. Users therefore either resort to other transforms such as pytorchvideo.transforms.Normalize or normalize each frame separately. The requested feature would relieve this pain and make video transforms cleaner.

Alternatives

No response

Additional context

No response

cc @vfdev-5 @datumbox

datumbox commented 2 years ago

@x4Cx58x54 Thanks a lot for the proposal.

We need a bit more time to decide how we want to handle this. Right now we are in the middle of revamping the Transforms API to offer native support not only for Images but also Videos, Bounding Boxes, Masks, Labels etc. We plan to post a blogpost with the announcement soon, but you can see some examples at #6753.

To make a long story short, the new Transforms API "stores" the videos in a [..., T, C, H, W] format. This allows us to very efficiently transform the video frames by reusing existing image kernels. We also offer transforms to permute/transpose the dimensions. The new API uses Tensor Subclassing to store meta-data alongside the standard tensor (such as the colour space).
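As a rough illustration of why the [..., T, C, H, W] layout lets image-style normalization be reused, the sketch below uses only plain torch operations, not the new Transforms API itself. The clip shape and the mean/std values are made up for the example.

```python
import torch

# Hypothetical clip stored channels-first as [C, T, H, W]
clip_cthw = torch.rand(3, 8, 16, 16)
mean = torch.tensor([0.45, 0.45, 0.45])
std = torch.tensor([0.225, 0.225, 0.225])

# Move channels next to the spatial dims: [C, T, H, W] -> [T, C, H, W]
clip_tchw = clip_cthw.permute(1, 0, 2, 3)

# With a [..., C, H, W] layout, a [C, 1, 1] view of mean/std broadcasts
# over every frame, exactly as it would for a batch of images.
normalized = (clip_tchw - mean.view(-1, 1, 1)) / std.view(-1, 1, 1)

# Permute back if downstream code expects [C, T, H, W]
restored = normalized.permute(1, 0, 2, 3)
```

The permutation only changes the view of the data, so the extra cost compared with a dedicated channel-dim argument is small in this sketch.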

Offering an extra parameter on the normalize kernel is possible but conflicts with the existing design. Having said that, in some limited cases we have offered such a parameter to assist user migration. For example: https://github.com/pytorch/vision/blob/e96860d60be171e0802cdbd180ca976c1afd2b50/torchvision/prototype/transforms/functional/_temporal.py#L6

Given the above, shall we wait for the blogpost to be published (happy to give you a ping) and give you some time to review the design? After that, it would be great to get your input on whether the new API covers your needs or if you think we need enhancements. Let me know what you think. Thanks!

x4Cx58x54 commented 2 years ago

@datumbox Thanks for your reply. I would be greatly obliged if you could give me a ping!
