pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

[RFC] TorchVision with Batteries included - Phase 1 #3911

Closed datumbox closed 2 years ago

datumbox commented 3 years ago

🚀 Feature

Note: To track the progress of the project check out this board.

Add popular primitives (Losses, Schedulers, Data Augmentations, Operators etc) which are often used to reproduce SOTA references and new popular highly accurate models with pre-trained weights to TorchVision.

Motivation

Though TorchVision currently includes many common building blocks necessary for training CV models, it lacks popular primitives that are often used to reproduce SOTA results. Some of these primitives are part of our reference scripts (Data utils, transforms etc) because we previously did not want to commit to a specific API. Others are part of libraries from the broader ecosystem. Additionally, TorchVision does not provide some of the newer, popular architectures which currently achieve good results in a variety of vision tasks.

Adding support for such primitives and models to TorchVision will give a “batteries included” experience to its users. Researchers will be able to do SOTA research and reproduce papers by using common building blocks rather than writing their own, while industry users will be able to adapt the models to their domains more easily using SOTA techniques.

Pitch

The addition of primitives should be done in several phases, iterating between trying to reproduce SOTA recipes, identifying accuracy gaps and implementing the necessary methods to close them. The progress of this project is tracked on this board.

During phase 1, add to TorchVision the following primitives and models:

Other potential primitives to be considered during phase 2:

Note that any of the suggested primitives that are not vision-specific should be added to PyTorch, so that all Domain libraries can benefit from them.

cc @vfdev-5 @fmassa @oke-aditya @jbschlosser @iramazanli

oke-aditya commented 3 years ago

Adding on a little bit. There have been approaches to create batteries-loaded libraries on top of torchvision. We might as well take inspiration from them and add features that can be useful.

Edit: Rewrote the ideas after giving them more thought.

Requests have come in for 3D NMS, 3D ops, etc. (#2402).

Say we could easily do nn.MLP, nn.SqueezeExcite, nn.TwoMLPHead, nn.InvertedResidual or nn.BasicBlock (see #4333). These would help people create models more easily than copy-pasting torchvision files.

bmanga commented 3 years ago

+1 from me. Maybe there should be an experimental namespace where we can refine the API over time before promoting features to stable.

oke-aditya commented 3 years ago

Any thoughts about adding different types of IoU metrics such as DIoU (Distance IoU) and CIoU (Complete IoU)? Refer to the paper.

The above paper mentions some benefits in training with FRCNN, SSD and YOLOv4. These are now used by YOLOv5.

This was asked earlier in #3026 and #2545.

Perhaps these two operations are now mature enough to be included?
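
For readers who have not met the two variants, here is a minimal pure-Python sketch for a single pair of boxes in (x1, y1, x2, y2) format. It follows the usual definitions: DIoU subtracts the squared center distance normalized by the squared diagonal of the smallest enclosing box, and CIoU additionally subtracts an aspect-ratio term. The function name `iou_variants` and the `eps` guard are made up for illustration; this is not a torchvision or Detectron2 API.

```python
import math


def iou_variants(a, b, eps=1e-7):
    """Return (IoU, DIoU, CIoU) for two boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration only; `a` plays the role of the
    prediction and `b` the ground truth.
    """
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / (union + eps)

    # Squared diagonal of the smallest box enclosing both boxes.
    enc_w = max(ax2, bx2) - min(ax1, bx1)
    enc_h = max(ay2, by2) - min(ay1, by1)
    diag_sq = enc_w ** 2 + enc_h ** 2 + eps

    # Squared distance between the two box centers.
    center_sq = ((ax1 + ax2) - (bx1 + bx2)) ** 2 / 4 + ((ay1 + ay2) - (by1 + by2)) ** 2 / 4

    diou = iou - center_sq / diag_sq

    # Aspect-ratio consistency term used by CIoU.
    v = (4 / math.pi ** 2) * (
        math.atan((bx2 - bx1) / (by2 - by1 + eps)) - math.atan((ax2 - ax1) / (ay2 - ay1 + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)
    ciou = diou - alpha * v

    return iou, diou, ciou
```

The corresponding losses are `1 - diou` and `1 - ciou`. Unlike plain IoU, both still give a useful signal when the boxes do not overlap at all, because the center-distance term stays non-zero.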

datumbox commented 3 years ago

@oke-aditya I just realized I haven't responded. Apologies for that.

I agree we should explore the ideas you added. I know you have separate issues for all of them, so let's track them there. Happy to move them to Batteries Included once the first set of primitives is added. Some additional discussion will be necessary to prioritize but we'll do this when it's time to review additions.

oke-aditya commented 3 years ago

Hey, no worries @datumbox :smile: You are doing awesome work, and batteries included is a great initiative :+1:

Let me know if there are any features that I could work on. Maybe ones that are not critical or timeline specific so that it won't hinder overall development.

datumbox commented 3 years ago

@oke-aditya I'm currently collecting feedback from various research teams at FB to see which other operators are worth including. I will definitely let you know once things become clear; I just want to make sure you won't work on something that we later feel shouldn't be added. Anyway, if you want to have an early look, check out the DropBlock paper.

vaibhava0 commented 3 years ago

I recommend adding Large Scale Jitter data augmentation as well. It is quite simple and powerful. It has shown promise across many use cases. Here is an implementation in D2:

https://github.com/facebookresearch/detectron2/blob/main/configs/new_baselines/mask_rcnn_R_50_FPN_100ep_LSJ.py#L44

Some benchmark results: https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md#new-baselines-using-large-scale-jitter-and-longer-training-schedule
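
To make the idea concrete, here is a rough sketch of the geometry behind Large Scale Jitter: pick a random scale, resize the image (keeping its aspect ratio) to fit a square of `target * scale`, then randomly crop anything larger than the target and pad anything smaller. The defaults (scale range 0.1 to 2.0, 1024 output) are my reading of the linked config and should be treated as assumptions, and `large_scale_jitter_geometry` is a made-up helper, not a Detectron2 or torchvision function. It only computes sizes and offsets; the actual pixel resize, crop and pad are left out.

```python
import random


def large_scale_jitter_geometry(in_h, in_w, target=1024, min_scale=0.1, max_scale=2.0, rng=random):
    """Compute resize, crop and pad parameters for one LSJ sample.

    Hypothetical helper for illustration; `rng` can be the `random` module
    or a `random.Random` instance for reproducibility.
    """
    scale = rng.uniform(min_scale, max_scale)

    # Aspect-preserving resize so the image fits inside a (target * scale) square.
    ratio = min(target * scale / in_h, target * scale / in_w)
    new_h = max(1, round(in_h * ratio))
    new_w = max(1, round(in_w * ratio))

    # If the resized image is larger than the target, take a random crop.
    crop_h, crop_w = min(new_h, target), min(new_w, target)
    top = rng.randint(0, new_h - crop_h)
    left = rng.randint(0, new_w - crop_w)

    # If it is smaller, pad on the bottom/right up to the fixed output size.
    return {
        "scale": scale,
        "resized": (new_h, new_w),
        "crop": (top, left, crop_h, crop_w),
        "pad": (target - crop_h, target - crop_w),
    }
```

Boxes and masks would need the same resize, crop and pad applied to stay aligned with the image.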

0x00b1 commented 3 years ago

Oof! I feel bad! This is shaping up to be a great release.

datumbox commented 3 years ago

@vaibhava0 Thanks for the proposal. You are right, this is quite important. We have it on https://github.com/pytorch/vision/issues/3817 but I'll add it here more prominently (I also added the code and benchmark links you provided).

@0x00b1 Your contribution can still make it in. tap tap tap ⌨️ 🚀

oke-aditya commented 2 years ago

Any thoughts about adding different types of IoU metrics such as DIoU (Distance IoU) and CIoU (Complete IoU)? Refer to the paper.

CIoU and DIoU have been added to Detectron2

https://github.com/facebookresearch/detectron2/blob/dfe8d368c8b7cc2be42c5c3faf9bdcc3c08257b1/detectron2/layers/losses.py#L66

datumbox commented 2 years ago

@oke-aditya We are wrapping up phase 1 of this project. There will be a phase 2 in Q1 and we can definitely reassess what else needs to be added. It's great that you keep the linked issues up-to-date with references and proposals so that we can follow them there.

datumbox commented 2 years ago

Batteries Included - Phase 1 is now concluded! I believe we have successfully refreshed the TorchVision library to support the Classification use-case and we have managed to refresh all of the popular pre-trained models of the library.

I'm going to close this ticket and start scoping Phase 2, which will focus on the Detection and Segmentation use-cases. Massive thanks to all of the people who contributed to this project either by implementing primitives, adding models or training new weights. I'll follow up with a new RFC for phase 2 and start outlining next steps so that we can get the feedback from the community.

Note: Some of the primitives that didn't make the cut in this RFC will move to the next phase.

xiaoyuan0203 commented 2 years ago

Will Mixup and CutMix be added to torchvision.transforms, and if so, when? #4379

datumbox commented 2 years ago

@xiaoyuan0203 that's the plan. We have untested implementations in the prototype area. We are working to finalize the API, document them and start testing them. I don't recommend them for production use-cases yet, but we will make sure to post an update when our confidence in them has increased.
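
For context on what the two augmentations do, below is a minimal pure-Python sketch on plain lists. It is not the prototype implementation mentioned above and makes no claim about the final torchvision API; the function names, the `alpha` defaults and the list-based image format are assumptions chosen to keep the example self-contained. Mixup blends two samples and their labels with a weight drawn from a Beta distribution, while CutMix pastes a random rectangle from one image into the other and mixes the labels by the area actually pasted.

```python
import math
import random


def mixup(x_a, x_b, y_a, y_b, alpha=0.2, rng=random):
    """Blend two flat samples and their one-hot labels (illustrative helper)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


def cutmix(img_a, img_b, y_a, y_b, alpha=1.0, rng=random):
    """Paste a random box of img_b into a copy of img_a (illustrative helper).

    Images are H x W nested lists of the same shape; labels are one-hot lists.
    """
    h, w = len(img_a), len(img_a[0])
    lam = rng.betavariate(alpha, alpha)

    # Box whose area is roughly (1 - lam) of the image, centered at a random pixel.
    cut = math.sqrt(1.0 - lam)
    cut_h, cut_w = int(h * cut), int(w * cut)
    cy, cx = rng.randrange(h), rng.randrange(w)
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    out = [row[:] for row in img_a]  # copy so the input is not modified
    for r in range(top, bottom):
        out[r][left:right] = img_b[r][left:right]

    # Recompute lam from the clipped box so the labels match the pixels.
    lam = 1.0 - (bottom - top) * (right - left) / (h * w)
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return out, y, lam
```

In a real pipeline these would typically be applied per batch on tensors, after the per-image transforms, since both need a second sample to mix with.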