pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

[docs] PIL image/enhance ; OpenCV; scikit-image ops <> torchvision transforms migration advice / summary table / test+diff images / comments in individual functions #5194

Open vadimkantorov opened 2 years ago

vadimkantorov commented 2 years ago

Original name: [docs] F.gaussian_blur should specify relation of kernel_size / sigma to PIL's sigma

📚 The doc issue

A lot of legacy GaussianBlur augmentations, like those in https://github.com/facebookresearch/dino/blob/d2f3156/utils.py#L36 and https://github.com/facebookresearch/unbiased-teacher/blob/main/ubteacher/data/transforms/augmentation_impl.py#L20, use PIL's GaussianBlur, which has only a single radius parameter. torchvision's new GaussianBlur has two parameters: a hard kernel_size and a soft sigma. It would be very useful if their semantics were explained in relation to the existing/popular/legacy PIL arguments. This would help in porting legacy code to torchvision's native GaussianBlur.

For reference, Pillow's native implementation: https://github.com/python-pillow/Pillow/blob/95cff6e959bb3c37848158ed2145d49d49806a31/src/libImaging/BoxBlur.c#L286. It also seems that Pillow's implementation is actually a uniformly weighted (box) blur, not a true Gaussian one.

Related question on SO: https://stackoverflow.com/questions/62968174/for-pil-imagefilter-gaussianblur-how-what-kernel-is-used-and-does-the-radius-par which suggests that radius ~= sigma
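
To make the migration advice concrete, here is a sketch of the kind of helper the docs could show. This is hypothetical code, not torchvision API: it assumes radius ≈ sigma (as the SO answer suggests) and the common "cover ±3σ with an odd kernel" rule for choosing kernel_size.

```python
import math

def kernel_size_for_sigma(sigma: float, truncate: float = 3.0) -> int:
    """Smallest odd kernel size covering +/- `truncate` standard deviations.

    Hypothetical helper: torchvision's GaussianBlur wants an explicit odd
    kernel_size, while PIL's GaussianBlur takes only a radius (~ sigma).
    """
    half = math.ceil(truncate * sigma)
    return 2 * half + 1

# Porting a legacy PIL-style call, assuming radius ~= sigma:
# PIL:         img.filter(ImageFilter.GaussianBlur(radius=2))
# torchvision: T.GaussianBlur(kernel_size=kernel_size_for_sigma(2), sigma=2)
```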

Related issues:

cc @vfdev-5 @datumbox

vadimkantorov commented 2 years ago

Similar explicit comments would be very useful for migrating away from the PIL ImageOps / ImageEnhance usages found in many codebases (e.g. the recent https://github.com/microsoft/SoftTeacher/blob/main/ssod/datasets/pipelines/rand_aug.py). Explicit comments on supported image value ranges would also be useful (e.g. uint8 [0, 255], int8 [-128, 127], float [0, 1], float [-1, 1], or a more general zero-centered float).
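
As an illustration of the range conventions above, a NumPy-only sketch (hypothetical helpers, not any library's API) mapping those ranges onto a common float [0, 1] layout:

```python
import numpy as np

def to_unit_float(img: np.ndarray) -> np.ndarray:
    """Map common image value ranges onto float32 [0, 1].

    Illustrative only; the ranges mirror the conventions listed above:
    uint8 [0, 255], int8 [-128, 127]; float input is assumed already [0, 1].
    """
    if img.dtype == np.uint8:
        return img.astype(np.float32) / 255.0
    if img.dtype == np.int8:
        return (img.astype(np.float32) + 128.0) / 255.0
    return img.astype(np.float32)  # assume float input is already [0, 1]

def to_symmetric_float(img01: np.ndarray) -> np.ndarray:
    """[0, 1] -> zero-centered [-1, 1], as some models expect."""
    return img01 * 2.0 - 1.0
```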

vadimkantorov commented 2 years ago

Recent example of related PIL <> torchvision transforms confusion: https://github.com/pytorch/vision/issues/5204

vadimkantorov commented 2 years ago

Ideally, there should be tests and image examples of PIL vs torchvision calls of the same transforms.
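
A minimal sketch of what such a test could compute for one pair of outputs (hypothetical, library-agnostic harness code; the inputs are assumed to be same-shape uint8 arrays, e.g. a PIL result and a torchvision result converted to the same layout):

```python
import numpy as np

def diff_report(a: np.ndarray, b: np.ndarray) -> dict:
    """Compare two transform outputs and summarize the disagreement."""
    d = np.abs(a.astype(np.int16) - b.astype(np.int16))
    return {
        "max_abs_diff": int(d.max()),
        "mean_abs_diff": float(d.mean()),
        "pct_pixels_differing": float((d > 0).mean() * 100.0),
        "diff_image": d.astype(np.uint8),  # could be published next to the docs
    }
```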

vfdev-5 commented 2 years ago

@vadimkantorov researchers are free to use any kind of implementation and coin a name for it. When GaussianBlur was introduced to torchvision (https://github.com/pytorch/vision/issues/2635), the reference was the SwAV project, https://github.com/facebookresearch/swav/blob/d4970c83791c8bd9928ec4efcc25d4fd788c405f/src/multicropdataset.py#L65, where they use OpenCV with a fixed kernel size and sigma. SwAV is historically earlier than DINO and the others. I do not know why they switched from one implementation to another (maybe to drop the OpenCV dependency?). Also, PIL's blur kernel size is limited, as far as I remember.

As for https://github.com/pytorch/vision/issues/5194#issuecomment-1015316258, it is not about PIL vs torchvision (as PIL does not provide a shear arg, only an affine matrix) but about how the parametrization (rotation, scale, shear, translation -> affine matrix) is done in torchvision vs scikit-image and in computer vision/graphics conventions.
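
To illustrate why the parametrization matters, here is one possible composition of (rotation, scale, shear, translation) into a 2x3 affine matrix. This is only a sketch; it is not claimed to match torchvision's or scikit-image's actual convention, which is exactly the point: the same parameter names can produce different matrices depending on composition order and sign conventions.

```python
import math

def affine_matrix(angle_deg=0.0, scale=1.0, shear_deg=0.0, tx=0.0, ty=0.0):
    """One possible (rotation, scale, x-shear, translation) -> 2x3 matrix.

    Composition here: scale * R(angle) @ Shear_x(shear), plus translation.
    Other libraries may shear before rotating, negate angles, or shear
    along y, yielding different matrices for identical parameters.
    """
    a = math.radians(angle_deg)
    s = math.radians(shear_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    # Row-major 2x3: [[m00, m01, tx], [m10, m11, ty]]
    return [
        [scale * cos_a, -scale * (sin_a + cos_a * math.tan(s)), tx],
        [scale * sin_a,  scale * (cos_a - sin_a * math.tan(s)), ty],
    ]
```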

vadimkantorov commented 2 years ago

torchvision vs scikit-image

Yeah, I guess this comparison is also relevant...

My point is that there is plenty of code using legacy non-tensor ops implemented in PIL and OpenCV. When building on this code, people consider modernizing to the newer torchvision implementations (as it probably improves performance and sometimes enables differentiability and GPU usage). It would be nice to document the differences between the torchvision implementations and the existing ones in PIL / OpenCV / scikit-image.

coin a name for it.

I think this should not be a reason to invite everyone to duplicate the effort of testing different versions of PIL's GaussianBlur vs torchvision's GaussianBlur and OpenCV's GaussianBlur vs torchvision's GaussianBlur.

vfdev-5 commented 2 years ago

It would be nice to document the differences between the torchvision implementations and the existing ones in PIL / OpenCV / scikit-image.

This is a very large question :) For instance, the resize op with all its options is not implemented identically across PyTorch, TF, OpenCV, PIL, and scikit-image. OpenCV is the reference for the PyTorch implementations, but OpenCV's nearest resize has a bug which is "fixed" with another option (to keep BC). For example: https://gist.github.com/vfdev-5/a5bd5b1477b1c82a87a0f9e25c727664. And imagine doing that for all the other ops :)
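
As an illustration of how a single op can quietly differ, here are two plausible source-index mappings for nearest-neighbor resize. This is a sketch, not OpenCV's or PyTorch's actual code; the two conventions (plain floor vs pixel-center alignment) already disagree on the very first output pixel of a 3x downscale.

```python
import math

def src_index_floor(dst_i: int, scale: float) -> int:
    """Mapping A: floor(dst * scale), where scale = src_size / dst_size.
    Biases every pick toward the top-left source pixel."""
    return math.floor(dst_i * scale)

def src_index_centered(dst_i: int, scale: float) -> int:
    """Mapping B: map the *center* of the destination pixel back to the
    source grid (src = (dst + 0.5) * scale - 0.5), then round."""
    return int(round((dst_i + 0.5) * scale - 0.5))
```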

vadimkantorov commented 2 years ago

Well, that's the problem :) It would be nice to start collecting a battery of test examples and small code snippets evaluating these functions with different options (or at least the default options?) across different libraries, and to publish the resulting images, an HTML page collecting all of them, and diff images. Especially where the ops are implemented slightly differently, the differences in results should be presented even if the root cause of a difference hasn't been diagnosed / researched yet.

Otherwise, users either duplicate this effort or just abandon it and end up with hard-to-detect bugs in all these teacher-student self-supervised models, which rely on very particular augmentation hyper-parameters being set just right.

My point is that incomplete testing results published to the docs are better than no testing at all, with the topic discussed on forums and passed from one PhD student to the next as dark knowledge :)

vfdev-5 commented 2 years ago

Yeah, implementation details (and bugs) can be, as you say, dark knowledge to pass on. Like why DeepLab models take inputs of size 513 pixels: https://ppwwyyxx.com/blog/2021/Where-are-Pixels/#Libraries
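
The 513 trick is simple arithmetic: under the grid-aligned ("pixels are grid points") convention that blog post describes, inputs of size stride * k + 1 downsample to an exact integer grid, so feature maps line up at every stage. A quick check (illustrative only, not DeepLab's actual code):

```python
def aligned_output_size(input_size: float, stride: int) -> float:
    """Grid-point ('align_corners'-style) output size after downsampling
    by `stride`: (size - 1) / stride + 1. It is an integer exactly when
    input_size = stride * k + 1 for some integer k, e.g. 513 = 16 * 32 + 1."""
    return (input_size - 1) / stride + 1
```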

vadimkantorov commented 2 years ago

Yeah, I'm a big fan of specifying in the docs (also for box_ops) exactly what the implementation is doing and exactly what coordinate format is expected, very much under the influence of this blog post :)