[discussion] [wip] Move/adopt to torchvision frequent utils from detectron2 (and potentially other frameworks)

vadimkantorov commented 1 year ago

🚀 The feature

I propose to open the discussion and collect in this issue some discrepancies or duplicate functionalities I found between detectron2 and torchvision.

detecton2 / mmdet / kornia are now quite old and established, so one can discuss of adopting well-designed / widely-used util functions and Modules from these frameworks to torchvision, so that there's less reimplementation and more established idioms.

move CUDA and C++ ops from detectron2: https://github.com/facebookresearch/detectron2/tree/main/detectron2/layers/csrc: ROIAlignRotated+box_iou_rotated+nms_rotated, deformable convolutions (already moved to torchvision?). Moving them to torchvision would also allow to have a workable python-only version of detectron2
fast evaluators from detectron2: coco format evaluator / lvis evaluator?
paste_masks_in_image - already a version exists in torchvision but with slightly different API; and some other mask_ops?
simple colormap generation
instance / box visualizer: https://github.com/facebookresearch/detectron2/blob/main/detectron2/utils/visualizer.py
exif parser / apply exif orientation: https://github.com/facebookresearch/detectron2/blob/cbbc1ce26473cb2a5cc8f58e8ada9ae14cb41052/detectron2/data/detection_utils.py#L119
focal loss from fvcore https://github.com/facebookresearch/fvcore/tree/main/fvcore/nn and other utils from fvcore?

Motivation, pitch

N/A

Alternatives

No response

Additional context

No response

oke-aditya commented 1 year ago

Okay, let's discuss each of these.

[ ] Rotated formats CUDA Ops: -

Rotated operations need additional support as well as Transforms API etc. See #2761 It will also need to be compatible with new Transforms API etc.

[ ] Fast evaluators: - Can you please elaborate how would it benefit and where we could keep in torchvision.
[ ] MasOps: - torchvision already has paste_masks_in_image. What other Mask ops do you have in mind?
[ ] Color Map Visualization: -

torchvision has few colormap utils in torchvision. Please see https://github.com/pytorch/vision/blob/main/torchvision/utils.py#L486 Albeit these are private, as torchvision is not in general a visualization library so we were hesitant to make it public.

[ ] Instance and Box Visualizer: -

Again torchvision has few visualization utils https://github.com/pytorch/vision/blob/main/torchvision/utils.py#L13 Are there any additional methods you suggesting? Panopatic segmentation etc is not supported by torchvision so it might not be great to have visualization util for same. Can you elaborate what you want from detectron here?

[ ] Exif parser: -

I have no idea. cc @vfdev-5 @pmeier

[ ] Focal Loss: -

Torchvision already have focal loss. https://github.com/pytorch/vision/blob/main/torchvision/ops/focal_loss.py

Let's discuss a bit more so we know requirements clearly. :)

pmeier commented 1 year ago

Re exif: I have no knowledge about that, but the function inside detectron two is over 2 years old and links to a now closed Pillow issue. According to the comments in the thread there, the patch landed in Pillow==7.0.0. Although we haven't acted on it yet, we have agreed that we should support Pillow starting from current major minus 1 in https://github.com/pytorch/vision/pull/5898#issuecomment-1111998953. Given that the current version is Pillow==9.4.0, we can limit support to Pillow>=8 and thus can depend on the fix.

vadimkantorov commented 1 year ago

On Can you please elaborate how would it benefit and where we could keep in torchvision.: I think it would make COCO / COCO-format evaluation of detection/instance segmentation faster and reduce the number of dependencies for simple/standard setups for the users. And now COCO-format can be reasonably thought as one of dominant dataset formats for instance/panoptic detection and segmentation. So it would be good to have the metric evaluation (fast, parallelized, tested, RLE mask functions in core) included as well. About where - no idea :) A new namespace torchvision.eval or torchvision.evalutils?

About exif: also, are libjpg/libpng supporting parsing EXIF attributes from image files? or maybe can be done with a very simple python parser? I guess pillow is needed only for reading these attributes here. So without pillow, some sort of EXIF parser is needed. OpenCV also has some information about exif parsing in their changelog: https://github.com/opencv/opencv/wiki/ChangeLog#version452. This may be more related to I/O component.

about focal loss: can we ask fvcore authors if their focal loss matches exactly the focal loss in torchvision? currently detectron2 uses some fvcore stuffs. I guess one could check all functions from fvcore used by detectron2 and check their usefulness for inclusion in core

About paste_masks_in_image: why then small difference in API and why doesn't detectron2 use the torchvision's function if they are identical?

about rotated boxes ops: converging on object-oriented structures can take a long time (i think transforms/prototype v2 is already lasting for 1 year? 2 years?), but having simple purely functional functions (accepting pure tensors with well-described format of docs) should require less of design and can even happen before full support of transforms - this would allow using these ops in other frameworks. other frameworks have their own modeling for "structures" / transforms anyway. but importing those transforms on "ops" level is also a good idea IMO.

In general, my motivation for this discussion opening is that deduplicating common/frequently-used/useful well-contained/small-api idioms from other libraries will help them reduce the API surface, and these idioms would be useful for all downstream libraries and users and reduce the amount of boiler-plate. And that a systematic discussion with those library authors would be fruitful, they would also have interesting things to say which idioms/functions (especially well-contained and small API-surface) turned out to be well-designed and would be good to include in core for reusability

Above I just scratched some utils that might be good to have in core while browsing the current detectron2 codebase, so the list is quite arbitrary. So this is not in stone, I just propose to look at those codebases with a fresh eye and ask whether some utils have become standard enough

vadimkantorov commented 1 year ago

@oke-aditya for other mask ops/utils, I had posted some my ideas in https://github.com/pytorch/vision/issues/4415

pytorch / vision