pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Are new models planned to be added? #2707

Open talcs opened 4 years ago

talcs commented 4 years ago

πŸš€ Feature

Adding new models to the models section.

Motivation

Many new models have been proposed in recent years and do not exist in the models module. For example, the EfficientNets provide 8 models of different complexities that outperform everything else that exists at each complexity level.

Pitch

See Contributing to Torchvision - Models for guidance on adding new models.

Add pre-trained weights for the following variants:

oke-aditya commented 4 years ago

This request has come often. Just linking all those for reference.

(archived; update the issue instead)

- [x] RetinaNet #1697
- [x] MobileNetV3 #3252
- [x] MobileNet backbones #2700
- [x] MobileNet backbones for detection #1999
- [x] MobileNet backbones for segmentation #3276
- [x] Single Shot MultiBox Detector (SSD) #3760 #3403
- [x] SSDLite #3757
- [ ] DeepLabv3+ with ResNet #2689 (this also had a discussion about Xception)
- [ ] Pretrained weights for ShuffleNetV2 1.5x and 2.0x depths #3257
- [ ] MNASNet weights for the 0_75 and 1_3 models #3722
- [ ] RegNet #2655
- [ ] SE-ResNet and SE-ResNeXt #2179
- [ ] ResNeSt, with a ResNeSt FPN option for object detection
- [ ] ResNeXt101_64x4d depth #3485
- [ ] ResNeXt152_32x4d depth #3485
- [ ] EfficientNet (B0 to B7) #980 (perhaps the v2 models?)
- [ ] EfficientDet
- [ ] ReXNet
- [ ] DeiT
- [ ] DETR
- [ ] Inception-ResNet #3899

Edit by @datumbox: I shamelessly edited your comment and moved your fantastic up-to-date list on the issue for greater visibility.

Reply by @oke-aditya: I was actually going to suggest doing the same :smiley:

A generalized guideline for adding models is being added to the CONTRIBUTING.md file in PR #2663.

fmassa commented 4 years ago

Hi,

To complement @oke-aditya's great answer: we will be adding more models to torchvision, including EfficientNets and MobileNetV3.

The current limitation is that we want to ensure we can reproduce each pretrained model using the training scripts from [`references/classification`](https://github.com/pytorch/vision/tree/master/references/classification), but these models require a different training recipe than the one currently present there, so we will need to update those recipes before uploading the new models.

songyuc commented 3 years ago

I hope to add Mish activation function.

digantamisra98 commented 3 years ago

@songyuc There is a closed feature request on PyTorch for adding Mish. You can comment over there for increased visibility so that Mish can be considered to be added in the future. Link to the issue - https://github.com/pytorch/pytorch/issues/25584
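For reference, the Mish formula from the paper is x · tanh(softplus(x)); here is a minimal plain-Python sketch of it (PyTorch later shipped the real implementation as `torch.nn.Mish`):

```python
import math

def mish(x: float) -> float:
    # Mish(x) = x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x).
    # Plain-Python sketch of the formula only; use torch.nn.Mish in practice.
    softplus = math.log1p(math.exp(x))
    return x * math.tanh(softplus)
```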

WZMIAOMIAO commented 3 years ago

First, thanks for your great work. I hope you will add the Swish activation function and NFNets (High-Performance Large-Scale Image Recognition Without Normalization, https://arxiv.org/abs/2102.06171). In addition, I would like to ask when EfficientNet can be added; I found that it was mentioned in 2019, but now it's 2021. Referring to the MobileNetV3 model in torchvision, I built EfficientNet models (Test9_efficientNet), but I don't have a GPU to train with.

oke-aditya commented 3 years ago

Hi @WZMIAOMIAO. The Swish activation function has been added to PyTorch (not torchvision) as `nn.SiLU`. MobileNetV3 will hopefully be available in the next release.
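For anyone looking for it, a quick sketch confirming that `nn.SiLU` in PyTorch core computes the Swish formula x · sigmoid(x):

```python
import torch
import torch.nn as nn

x = torch.linspace(-3.0, 3.0, steps=7)
swish_module = nn.SiLU()             # Swish / SiLU as shipped in torch.nn
swish_manual = x * torch.sigmoid(x)  # the formula from the Swish paper

assert torch.allclose(swish_module(x), swish_manual)
```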

WZMIAOMIAO commented 3 years ago

@oke-aditya Thank you for your reply. I've seen MobileNetv3 in the torchvision repository. When will EfficientNet, RegNet and NFNet be added?

stanwinata commented 3 years ago

Hey guys, I was wondering if the PyTorch team is open to public contributions for these models? 🤔 I assume we can follow PR formats similar to the ones here and here, along with validation/proof that we can reproduce the paper results.

datumbox commented 3 years ago

@stwinata Thanks for offering. Which models do you have in mind to contribute?

The process of model contribution has been a bit problematic (mainly due to the training bit) and we still haven't figured out all the details. But depending on the proposal, we might be able to work something out. :)

stanwinata commented 3 years ago

> @stwinata Thanks for offering. Which models do you have in mind to contribute?

@datumbox thanks for the quick reply! I am interested in DETR or EfficientDet. I was thinking that for a first commit DETR might be easier, since we can use DETR's original repo for reference and may be able to try loading its weights for preliminary validation.

> The process of model contribution has been a bit problematic (mainly due to the training bit) and we still haven't figured out all the details. But depending on the proposal, we might be able to work something out. :)

Perhaps we can also try to determine a canonical pipeline for model contribution through this experience and document it, so that others can contribute easily in the future 😃!

stanwinata commented 3 years ago

> (mainly due to the training bit)

@datumbox Does this come down to lack of GPU resources? Or is it due to the need to validate that it can properly train?

datumbox commented 3 years ago

@stwinata DETR sounds like a good addition to me. Since @fmassa is one of the main authors, I will let him have the final say on this.

Contributing models is tricky because:

  1. Reproducing a paper is an iterative process of code + recipe + training. A PR that just adds the implementation is less useful, because one of the maintainers then needs to do the heavy lifting of reproducing the model, which is the time-consuming bit. This is why we avoided accepting contributions in this space in the past.
  2. On the other hand, if someone sends a PR that reproduces the paper and includes weights, then the only thing left for the maintainers is to confirm the accuracy of the pre-trained weights and to retrain the model using the reference script to ensure our recipe works as expected.
  3. GPU resources are a concern not for us but for the contributor: we can train models, but our infra is not available to open-source contributors.

Happy to discuss more and see if it's worth doing this now.

stanwinata commented 3 years ago

@datumbox These comments make sense 😃

> 1. Reproducing a paper is an iterative process of code + recipe + training. A PR that just adds the implementation is less useful, because one of the maintainers then needs to do the heavy lifting of reproducing the model, which is the time-consuming bit. This is why we avoided accepting contributions in this space in the past.
> 2. On the other hand, if someone sends a PR that reproduces the paper and includes weights, then the only thing left for the maintainers is to confirm the accuracy of the pre-trained weights and to retrain the model using the reference script to ensure our recipe works as expected.

Yeah, I agree; some might even say that getting models to be "useful", i.e. reproducing the paper results, is the fun bit 😃 I think future model contributions/PRs should include:

I think this way we can ease the load on pytorch/vision maintainers and make PRs much more concrete and useful.

Perhaps we can also have a simple util script that tests trained candidate implementations on various benchmarks (this might be another feature request 😄).

> 3. GPU resources are a concern not for us but for the contributor: we can train models, but our infra is not available to open-source contributors.

I also agree with this. Moreover, I think these days GPU resources, whether at home or through AWS and GCP, are becoming ubiquitous enough for contributors to do the training themselves 😃

datumbox commented 3 years ago

@stwinata Thanks for the comments. I think we agree. Below are a few thoughts on the potential process we could adopt.

The minimum to merge such a contribution is:

  1. The PR must include the code implementation, have documentation and tests.
  2. It should also extend the existing reference scripts used to train the model.
  3. The weights need to closely reproduce the results of the paper in terms of accuracy.
  4. The PR description should include commands/configuration used to train the model, so that we can easily run them on our infra to verify.

Note that there are details here related to code quality etc., but those are rules that apply to all PRs.
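As an illustration of point 4, here is a minimal sketch of the kind of evaluation helper a maintainer might run to verify contributed weights. The function name and signature are hypothetical, not a torchvision utility; the real verification uses the reference scripts under references/classification:

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device="cpu"):
    # Iterate over (images, targets) batches and count correct argmax
    # predictions; returns the fraction of correctly classified samples.
    model.eval().to(device)
    correct, total = 0, 0
    for images, targets in loader:
        logits = model(images.to(device))
        preds = logits.argmax(dim=1)
        correct += (preds == targets.to(device)).sum().item()
        total += targets.numel()
    return correct / total
```

The number returned for the contributed weights would then be compared against the top-1 accuracy reported in the paper.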

For someone who would be interested in adding a model, here are a few important considerations:

  1. Training big models requires lots of resources and the cost quickly adds up.
  2. Reproducing models is fun but also risky as you might not always get the results reported on the paper. It might require a huge amount of effort to close the gap.
  3. The contribution might not get merged if it significantly lags in terms of accuracy, speed, etc.

The above are a very big ask, I think. But if an OSS contributor is willing to give it a try despite these adversities, we would be happy to pair up and help. This should happen in a coordinated way to:

  1. Ensure that the model in question is of interest and that nobody else is already working on adding it.
  2. Ensure there is an assigned maintainer providing support, guidance and regular feedback.

@fmassa let me know your thoughts on this as well.

xiaohu2015 commented 2 years ago

I am aiming to add FCOS to torchvision: https://github.com/xiaohu2015/vision/blob/main/torchvision/models/detection/fcos.py

datumbox commented 2 years ago

@xiaohu2015 Nice work, have you managed to reproduce the accuracies of the paper?

@fmassa Any thoughts on FCOS?

xiaohu2015 commented 2 years ago

@datumbox Yes, I am working to implement it and reproduce the performance, but I think some time is needed.

fmassa commented 2 years ago

I think FCOS would be a good addition. It is one of the top methods in https://paperswithcode.com/methods/category/object-detection-models that we don't yet have in torchvision, and @xiaohu2015's implementation seems very nice.

xiaohu2015 commented 2 years ago

@datumbox @fmassa Hi, we have opened a PR with the FCOS code (https://github.com/pytorch/vision/pull/4961). Could you review it and give some advice?

aisosalo commented 2 years ago

Would it be possible to also have grayscale ImageNet weights for the usual models, along the lines described in Xie & Richmond?

Xie, Y. and Richmond, D., "Pre-training on grayscale ImageNet improves medical image classification," in Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 476–484, Springer (September 2019).
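Until such weights exist, a common workaround is the inverse direction: replicate the single gray channel to three so that RGB-pretrained ImageNet models accept the input. A minimal torch-only sketch (the function name is illustrative, not a torchvision API):

```python
import torch

def gray_to_rgb(gray: torch.Tensor) -> torch.Tensor:
    # Replicate a single-channel image (1, H, W) into three identical
    # channels (3, H, W) so RGB-pretrained models can consume it.
    assert gray.shape[0] == 1, "expected a single-channel (1, H, W) tensor"
    return gray.repeat(3, 1, 1)
```

The same effect can be had at the transform level with `torchvision.transforms.Grayscale(num_output_channels=3)`.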

xiaohu2015 commented 2 years ago

Another CNN: https://github.com/facebookresearch/deit/blob/main/patchconvnet_models.py

jdsgomes commented 2 years ago

In part informed by the discussions in this ticket, I am proposing new model contribution guidelines here. Your feedback/suggestions would be very valuable.

Rusteam commented 2 years ago

Hi there,

I'm doing few-shot classification and similarity learning, and currently dino deit backbone is a top-performing one on my datasets. Can we add it to torchvision.models? I'm willing to submit a PR with some guidance.

xiaohu2015 commented 2 years ago

> Hi there,
>
> I'm doing few-shot classification and similarity learning, and currently dino deit backbone is a top-performing one on my datasets. Can we add it to torchvision.models? I'm willing to submit a PR with some guidance.

If you only want to add the pretrained weights, I think it is very easy, as torchvision supports the multi-weight API.

datumbox commented 2 years ago

@Rusteam Thanks for the proposal.

Our model contribution guidelines are still a work in progress. One of the things we need to figure out is how to deal with contributions that are produced without our reference scripts. Right now, we require all of the weights to be reproducible with our references. There is one exception to this rule: when we port weights directly from a paper. Usually this is not the preferred solution either, and we typically do it only when we want to offer the architecture but don't have the time to train the network from scratch (or it's too costly to do so).

Given that your proposal doesn't fall under the above exception, we would have to be able to reproduce the weight training with our scripts. Unfortunately that's going to be tricky, because torchvision doesn't have a training script for few-shot learning. We have a similarity reference script which hasn't really been maintained much. It's within our plans to improve support in the future, but currently you wouldn't be able to train the proposed models using our scripts.

cc @yoshitomo-matsubara because there were some discussions of adding better support of distillation in TorchVision.

Rusteam commented 2 years ago

Well, yeah, I guess you're right. Until you have some kind of contrib section, I can use torch.hub to load that model.

oke-aditya commented 2 years ago

Not exactly a contrib section, but you could create a repo with a hubconf.py file so that the model is accessible via torch.hub.
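A sketch of what that could look like: a `hubconf.py` at the root of a hypothetical repo, exposing an entry point that `torch.hub.load` can discover. The model body and entry-point name here are placeholders, not the real DINO/DeiT backbone:

```python
# hubconf.py at the root of a hypothetical github.com/user/repo
dependencies = ["torch"]  # packages torch.hub checks before loading

import torch

def tiny_backbone(pretrained=False, embed_dim=384, **kwargs):
    # Placeholder entry point; a real one would build the actual backbone
    # and load released weights, e.g. via torch.hub.load_state_dict_from_url.
    model = torch.nn.Sequential(
        torch.nn.AdaptiveAvgPool2d(1),
        torch.nn.Flatten(),
        torch.nn.Linear(3, embed_dim),
    )
    if pretrained:
        raise NotImplementedError("no weights published for this sketch")
    return model
```

Users would then call something like `torch.hub.load("user/repo", "tiny_backbone")` without torchvision being involved at all.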

talregev commented 2 years ago

Please pin this issue.

talregev commented 2 years ago

@datumbox

xiaohu2015 commented 2 years ago

I want to add another object detection model: ATSS.

datumbox commented 2 years ago

@xiaohu2015 You are on fire this quarter! πŸ”₯ πŸš€

Just to confirm: do you mean ATSS, which builds upon FCOS? If yes, we've used some of the tricks from that paper in #5444 to improve RetinaNet. Let's chat offline to discuss the details (I just want to make sure we have the bandwidth to support the reviews and avoid lengthy waits). Out of curiosity: originally we wanted to add DETR as part of #5410, so that we have at least one Transformer-based model for detection. Would you be open to focusing on that instead?

xiaohu2015 commented 2 years ago

@datumbox OK, I think transformer-based object detection models will be the trend, so I've decided to work on DETR first. I will open a new PR.

oke-aditya commented 2 years ago

Hey @xiaohu2015, can I be of some help? I have worked on DETR a bit before, and it would be great learning for me to contribute a model to torchvision. I would much appreciate it if we could collaborate. Thanks :smiley:

xiaohu2015 commented 2 years ago

@oke-aditya OK, but I think I should split the implementation into different parts; I will update this in the PR https://github.com/pytorch/vision/pull/5922

santhoshnumberone commented 2 years ago

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

DINO-4scale at 24 FPS, with the highest Average Precision in just 12 epochs


DINO-4scale with highest Average Precision in just 24 epochs


Let's have DINO (DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection) ASAP.

datumbox commented 2 years ago

@santhoshnumberone Thanks for the proposal. I've added it in the list of potential models.

oke-aditya commented 2 years ago

Hi! We are currently working to add DETR, see #5922. DINO, on the other hand, is an SSL model, and SSL is a task torchvision does not currently support. Not sure about the future. You can have a look at VISSL.

santhoshnumberone commented 2 years ago

> Hi! We are currently working to add DETR, see #5922. DINO, on the other hand, is an SSL model, and SSL is a task torchvision does not currently support. Not sure about the future. You can have a look at VISSL.

Are you sure? Looking at the Papers with Code page for DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection, the published paper's authors are Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, and Heung-Yeung Shum.

The facebookresearch/dino repo, however, belongs to Emerging Properties in Self-Supervised Vision Transformers, whose authors (Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin) propose their own DINO, saying:

> We implement our findings into a simple self-supervised method, called DINO, which we interpret as a form of self-distillation with no labels.

Different authors, different papers; I feel the two are different. Could you go through them once?

PS: Looking at the GitHub repo of DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection, one of the authors says "Code is under preparation, please be patient."

xiaohu2015 commented 2 years ago

@santhoshnumberone I think DINO is more practical, since users can train for fewer epochs to get good mAP.

santhoshnumberone commented 2 years ago

> @santhoshnumberone I think DINO is more practical, since users can train for fewer epochs to get good mAP.

What I meant was: could one of you check whether facebookresearch/dino and DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection are the same?

I feel both are different

oke-aditya commented 2 years ago

There are many variants of DETR, e.g. Deformable DETR and modulated DETR; a simple search gives these results:

https://github.com/search?q=DETR

Let's start off by including vanilla DETR :)

zhiqwang commented 2 years ago

Hi, MobileOne, introduced by Apple, is interesting; the mobile-vision team implemented it at https://github.com/facebookresearch/mobile-vision/pull/91. Is there any plan to support it here?

datumbox commented 2 years ago

@zhiqwang MobileOne wasn't in our shortlist, but we can certainly keep an eye on it if it builds momentum.

abhi-glitchhg commented 2 years ago

MobileViT is another lightweight vision transformer-based model. The code is available here . Might be good to keep an eye on this one too.

oke-aditya commented 2 years ago

Is there a short list for semantic segmentation models? Maybe a tentative one.

I can try U2Net; maybe it's an easy model.

talregev commented 2 years ago

Please add BiFPN

https://paperswithcode.com/method/bifpn

oke-aditya commented 2 years ago

Yes, BiFPN is very popular. Maybe once we start on EfficientDet it will get added.

talregev commented 2 years ago

Will you try to add EfficientDet?

oke-aditya commented 2 years ago

I'm pretty noob when it comes to implementing models. But maybe I will give it a shot after I add a few easy models.

talregev commented 2 years ago

@datumbox Please pin this issue.