kenshohara / video-classification-3d-cnn-pytorch

Video classification tools using 3D ResNet
MIT License
1.11k stars, 260 forks

Normalization of the input image? #63

Closed: Fly2flies closed this issue 3 years ago

Fly2flies commented 3 years ago

Thanks for sharing this great code base. One thing still puzzles me: in get_mean(), the mean is [114.7748, 107.7354, 99.4750], which looks like it is meant for inputs in the range 0 to 255. But in the normalization of the input image:

spatial_transform = Compose([Scale(opt.sample_size),
                             CenterCrop(opt.sample_size),
                             ToTensor(),
                             Normalize(opt.mean, [1, 1, 1])])

After ToTensor(), the values are in [0, 1], which does not match a mean on the 0 to 255 scale. Is this appropriate for the subsequent video classification? Do we need to swap the order of these two operations?

R00Kie-Liu commented 3 years ago

Unlike the official implementation of ToTensor() in PyTorch (torchvision), the ToTensor() class in this repo doesn't normalize the values to [0, 1]. It just converts PIL data to torch tensor format, so the values stay in the 0 to 255 range and match the scale of the mean.
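
To make the difference concrete, here is a minimal sketch of the arithmetic. It is not the repo's actual code: NumPy arrays stand in for PIL images and torch tensors, and the helpers to_tensor_no_scaling, to_tensor_scaled and normalize are hypothetical names written for illustration. Only the mean values and the [1, 1, 1] std come from the snippet quoted in the question.

```python
import numpy as np

# Mean quoted in the question (on the 0..255 scale).
MEAN = np.array([114.7748, 107.7354, 99.4750], dtype=np.float32)


def to_tensor_no_scaling(frame_hwc):
    """Hypothetical stand-in for a ToTensor that only changes layout and dtype.

    HWC uint8 -> CHW float32, values stay in [0, 255].
    """
    return frame_hwc.astype(np.float32).transpose(2, 0, 1)


def to_tensor_scaled(frame_hwc):
    """Hypothetical stand-in for a ToTensor that also divides by 255."""
    return to_tensor_no_scaling(frame_hwc) / 255.0


def normalize(chw, mean, std):
    """Per-channel (x - mean) / std, broadcasting over height and width."""
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    return (chw - mean) / std


# A dummy mid-gray frame: every pixel is 128 in all three channels.
frame = np.full((4, 4, 3), 128, dtype=np.uint8)

unscaled = normalize(to_tensor_no_scaling(frame), MEAN, [1, 1, 1])
scaled = normalize(to_tensor_scaled(frame), MEAN, [1, 1, 1])

print("no scaling :", unscaled[:, 0, 0])  # small values centered near zero
print("with /255  :", scaled[:, 0, 0])    # all values pushed far below zero
```

With no scaling, subtracting the mean centers the pixel values around zero, which is what the transform in the question relies on. If the values had been divided by 255 first, the same mean would push every value to roughly -100, so the mean would need to be rescaled instead.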

Fly2flies commented 3 years ago

Ok, I got it. Thanks for your reply.