[RFC] Batteries Included - Phase 2

datumbox commented 2 years ago

🚀 The feature

Note: To track the progress of the project check out this board.

This is the 2nd phase of TorchVision's modernization project (see phase 1). We aim to keep TorchVision relevant by ensuring it provides, off the shelf, all the primitives, model architectures and recipe utilities needed to produce SOTA results for the supported Computer Vision tasks.

1. New Primitives

To enable our users to reproduce the latest state-of-the-art research, we will enhance TorchVision with the following data augmentations, layers, losses and other operators:

Data Augmentations

[x] Augmix - #5411
[x] Large Scale Jitter - #5435 #5446 #5559
[x] Fixed Size Crop - #5607
[x] Random Shortest Size - #5610
[x] Simple CopyPaste - #5825

Layers

[x] DropBlock - #5416
[x] Conv3DNormActivation - #5445
[x] MLP - #6053
[x] Permute - #6055

Losses

[x] Generalized-IoU loss - #4961
[x] Distance-IoU & Complete-IoU loss - #5776 #5786 #5984
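
For readers unfamiliar with the IoU-based losses listed above, here is a minimal pure-Python sketch of the Generalized-IoU loss for a single pair of boxes. It is an illustration of the formula only, not the code from the linked PRs; the function name `giou_loss`, the `(x1, y1, x2, y2)` box layout and the `eps` default are assumptions made for this example.

```python
def giou_loss(box_a, box_b, eps=1e-7):
    """Generalized-IoU loss for two boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; assumes x1 <= x2 and y1 <= y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection area (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union area.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / (union + eps)

    # Area of the smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # GIoU penalizes the empty space inside the enclosing box.
    giou = iou - (enclose - union) / (enclose + eps)
    return 1.0 - giou
```

Unlike a plain IoU loss, this still varies with the distance between boxes that do not overlap at all, which is the usual motivation for using it in detection training.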

Operators added in PyTorch Core

[x] Better EMA support in AveragedModel - https://github.com/pytorch/pytorch/pull/71763
[x] Add support of empty output in SyncBatchNorm - https://github.com/pytorch/pytorch/pull/74944
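
As background for the EMA item above, the update rule behind an exponential moving average of model weights can be sketched in a few lines. This is not the `AveragedModel` API or the code from the linked PR; `ema_update` and its `decay` default are hypothetical names chosen for this example, and parameters are represented as plain lists of floats.

```python
def ema_update(ema_params, model_params, decay=0.999):
    """Return new EMA values: decay * ema + (1 - decay) * current.

    Hypothetical helper for illustration; operates on flat lists of floats.
    """
    return [
        decay * ema + (1.0 - decay) * current
        for ema, current in zip(ema_params, model_params)
    ]


# Usage: the averaged copy drifts slowly toward the live weights.
ema = [1.0, 1.0]
for _ in range(3):
    ema = ema_update(ema, [0.0, 2.0], decay=0.9)
```

A decay close to 1 makes the averaged weights change slowly, so the value is usually tuned together with the number of training steps.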

2. New Architectures & Model Iterations

To ensure that our users have access to the most popular SOTA models, we will add the following architectures along with pre-trained weights. Moreover, we will improve existing architectures with commonly adopted optimizations introduced in follow-up research:

Image Classification

[x] ConvNeXt - #5197 #5253 #5330
[x] EfficientNetV2 code - #5450
[x] Swin Transformer - #5491 #6048

Object Detection & Segmentation

[x] FCOS - #4961
[x] Post-paper optimizations for RetinaNet, FasterRCNN & MaskRCNN - #5444

Video Classification

[x] MViT - #6198

3. Improved Training Recipes & Pre-trained models

To ensure that our users have access to strong baselines and SOTA weights, we will improve our training recipes to incorporate the newly released primitives and offer improved pre-trained models:

Reference Scripts

[x] Update EMA to use PyTorch Core's new implementation - #5469
[x] Add support of new Detection primitives in Reference Scripts - #5715

Pre-trained weights

[x] Improve the accuracy of Classification models - #5560 #5906 #5935 #6019
[x] Close the gap with SOTA for Object Detection & Segmentation models - #5756 #5763 #5773
[x] Add weakly-supervised weights for ViT and RegNets - #5714 #5722 #5732 #5721 #5793

Other Candidates

There are several other Operators (#5414), Losses (#2980), Augmentations (#3817) and Models (#2707) proposed by the community. Here are some potential candidates that we could implement depending on bandwidth. Contributions are welcome for any of the below:

AutoAugment Detection code - #6224
Deformable DeTR
Polynomial LR scheduler (upstream to Core)
Shortcut Regularizer (FX-based)
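
To make the Polynomial LR scheduler candidate above concrete, one common form of polynomial decay is sketched below. The function name `polynomial_lr`, its parameters and the choice to decay to exactly zero are assumptions for this example; an eventual upstream implementation may differ in signature and details.

```python
def polynomial_lr(base_lr, step, total_steps, power=1.0):
    """Learning rate at `step` under polynomial decay from base_lr to 0.

    Hypothetical helper for illustration. Steps past `total_steps` are
    clamped, so the rate stays at 0 once the schedule has finished.
    """
    progress = min(step, total_steps) / total_steps
    return base_lr * (1.0 - progress) ** power


# Usage: power=1.0 gives linear decay, larger powers decay faster early on.
schedule = [polynomial_lr(0.1, s, total_steps=4, power=2.0) for s in range(5)]
```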

cc @datumbox @vfdev-5

xiaohu2015 commented 2 years ago

@datumbox I think Swin Transformer is a very popular model, so I am planning to add it to torchvision.

datumbox commented 2 years ago

Sounds great @xiaohu2015, thanks for the help!

Can you open an "empty" PR similar to what you did for DropBlock initially? It will help us mark the item as in-progress and avoid others trying to do the same.

lezwon commented 2 years ago

Hey @datumbox, I'd like to take a shot at Simple CopyPaste augmentation, if it's available. Although I would definitely require some initial guidance on it :)

datumbox commented 2 years ago

@lezwon Yes, it's available and very high on our candidate list. :) Note that the API of this transform is tricky because it combines transforms across images in the batch (similar to the MixUp and CutMix implementations in the Classification references, not the ones in prototype).

How about the following? If you write a functional implementation I can help you review, adapt it to the necessary API and test it on real models/data. Let me know your thoughts!
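
To illustrate why the transform has to work at the batch level, here is a rough pure-Python sketch of a functional that pastes masked pixels from one image onto another within the same batch. Everything in it is an assumption for illustration: the names `copy_paste_pair` and `copy_paste_batch`, the nested-list image format, and pairing each image with its previous neighbor in the batch. A real Simple CopyPaste implementation would also update the target masks and boxes, which is omitted here.

```python
def copy_paste_pair(dst_img, src_img, src_mask):
    """Return a copy of dst_img with src_img pixels pasted where src_mask is truthy.

    Hypothetical helper; images and masks are equally sized 2D nested lists.
    """
    return [
        [src if keep else dst for dst, src, keep in zip(dst_row, src_row, mask_row)]
        for dst_row, src_row, mask_row in zip(dst_img, src_img, src_mask)
    ]


def copy_paste_batch(images, masks):
    """Paste objects from the previous image in the batch onto each image.

    The "previous neighbor" pairing is an assumption made for this sketch;
    it only shows that each output depends on two samples of the batch.
    """
    n = len(images)
    return [
        copy_paste_pair(images[i], images[(i - 1) % n], masks[(i - 1) % n])
        for i in range(n)
    ]
```

Because every output depends on two samples, a transform like this cannot be applied per sample inside the dataset; it has to run on the assembled batch, for example in a collate function.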

PS: Note that I am currently OOO until Tuesday, so I might be slow to respond until then.

lezwon commented 2 years ago

@datumbox sounds good 👍 I'll get started on it and ping you once I have a POC ready.
