pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License
16.22k stars 6.95k forks source link

[discussion] [wip] Move/adopt to torchvision frequent utils from detectron2 (and potentially other frameworks) #7070

Open vadimkantorov opened 1 year ago

vadimkantorov commented 1 year ago

🚀 The feature

I propose to open the discussion and collect in this issue some discrepancies or duplicate functionalities I found between detectron2 and torchvision.

detecton2 / mmdet / kornia are now quite old and established, so one can discuss of adopting well-designed / widely-used util functions and Modules from these frameworks to torchvision, so that there's less reimplementation and more established idioms.

Motivation, pitch

N/A

Alternatives

No response

Additional context

No response

oke-aditya commented 1 year ago

Okay, let's discuss each of these.

Rotated operations need additional support as well as Transforms API etc. See #2761 It will also need to be compatible with new Transforms API etc.

torchvision has few colormap utils in torchvision. Please see https://github.com/pytorch/vision/blob/main/torchvision/utils.py#L486 Albeit these are private, as torchvision is not in general a visualization library so we were hesitant to make it public.

Again torchvision has few visualization utils https://github.com/pytorch/vision/blob/main/torchvision/utils.py#L13 Are there any additional methods you suggesting? Panopatic segmentation etc is not supported by torchvision so it might not be great to have visualization util for same. Can you elaborate what you want from detectron here?

I have no idea. cc @vfdev-5 @pmeier

Torchvision already have focal loss. https://github.com/pytorch/vision/blob/main/torchvision/ops/focal_loss.py

Let's discuss a bit more so we know requirements clearly. :)

pmeier commented 1 year ago

Re exif: I have no knowledge about that, but the function inside detectron two is over 2 years old and links to a now closed Pillow issue. According to the comments in the thread there, the patch landed in Pillow==7.0.0. Although we haven't acted on it yet, we have agreed that we should support Pillow starting from current major minus 1 in https://github.com/pytorch/vision/pull/5898#issuecomment-1111998953. Given that the current version is Pillow==9.4.0, we can limit support to Pillow>=8 and thus can depend on the fix.

vadimkantorov commented 1 year ago

On Can you please elaborate how would it benefit and where we could keep in torchvision.: I think it would make COCO / COCO-format evaluation of detection/instance segmentation faster and reduce the number of dependencies for simple/standard setups for the users. And now COCO-format can be reasonably thought as one of dominant dataset formats for instance/panoptic detection and segmentation. So it would be good to have the metric evaluation (fast, parallelized, tested, RLE mask functions in core) included as well. About where - no idea :) A new namespace torchvision.eval or torchvision.evalutils?

About exif: also, are libjpg/libpng supporting parsing EXIF attributes from image files? or maybe can be done with a very simple python parser? I guess pillow is needed only for reading these attributes here. So without pillow, some sort of EXIF parser is needed. OpenCV also has some information about exif parsing in their changelog: https://github.com/opencv/opencv/wiki/ChangeLog#version452. This may be more related to I/O component.

about focal loss: can we ask fvcore authors if their focal loss matches exactly the focal loss in torchvision? currently detectron2 uses some fvcore stuffs. I guess one could check all functions from fvcore used by detectron2 and check their usefulness for inclusion in core

About paste_masks_in_image: why then small difference in API and why doesn't detectron2 use the torchvision's function if they are identical?

about rotated boxes ops: converging on object-oriented structures can take a long time (i think transforms/prototype v2 is already lasting for 1 year? 2 years?), but having simple purely functional functions (accepting pure tensors with well-described format of docs) should require less of design and can even happen before full support of transforms - this would allow using these ops in other frameworks. other frameworks have their own modeling for "structures" / transforms anyway. but importing those transforms on "ops" level is also a good idea IMO.

In general, my motivation for this discussion opening is that deduplicating common/frequently-used/useful well-contained/small-api idioms from other libraries will help them reduce the API surface, and these idioms would be useful for all downstream libraries and users and reduce the amount of boiler-plate. And that a systematic discussion with those library authors would be fruitful, they would also have interesting things to say which idioms/functions (especially well-contained and small API-surface) turned out to be well-designed and would be good to include in core for reusability

Above I just scratched some utils that might be good to have in core while browsing the current detectron2 codebase, so the list is quite arbitrary. So this is not in stone, I just propose to look at those codebases with a fresh eye and ask whether some utils have become standard enough

vadimkantorov commented 1 year ago

@oke-aditya for other mask ops/utils, I had posted some my ideas in https://github.com/pytorch/vision/issues/4415