Open talcs opened 4 years ago
This request has come up often. Just linking all of those for reference.
Edit by @datumbox: I shamelessly edited your comment and moved your fantastic up-to-date list on the issue for greater visibility.
Reply by @oke-aditya: I was actually going to suggest to do the same :smiley:
A generalized guideline for adding models is being added to the contributing.md file in PR #2663.
Hi,
To complement @oke-aditya's great answer, we will be adding more models to torchvision, including EfficientNets and MobileNetV3.
The current limitation is that we would like to ensure that we can reproduce the pretrained models using the training scripts from references/classification, but those models require a different training recipe than the one present in [`references/classification`](https://github.com/pytorch/vision/tree/master/references/classification), so we will need to update those recipes before uploading those new models.
I hope to add the Mish activation function.
@songyuc There is a closed feature request on PyTorch for adding Mish. You can comment over there for increased visibility so that adding Mish can be considered in the future. Link to the issue - https://github.com/pytorch/pytorch/issues/25584
First, thanks for your great work. I hope Swish activation and NFNets (High-Performance Large-Scale Image Recognition Without Normalization, https://arxiv.org/abs/2102.06171) can be added. In addition, I would like to ask when EfficientNet can be added. I found that it was mentioned in 2019, but now it's 2021. Referring to the MobileNetV3 model in torchvision, I built EfficientNet models (Test9_efficientNet), but I don't have a GPU to train with.
Hi @WZMIAOMIAO, the Swish activation function has been added to PyTorch (not torchvision) as nn.SiLU.
MobileNetV3 will hopefully be available in the next release.
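For readers comparing the two activations requested above, here is a minimal pure-Python sketch of their scalar definitions: Swish/SiLU is x * sigmoid(x) and Mish is x * tanh(softplus(x)). The helper names (`sigmoid`, `softplus`, `silu`, `mish`) are illustrative assumptions and are not part of PyTorch or torchvision; real models would apply `nn.SiLU` to tensors instead.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def silu(x):
    """Swish / SiLU: x * sigmoid(x)."""
    return x * sigmoid(x)


def mish(x):
    """Mish: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


# Both functions pass through the origin and approach the identity for large x.
for value in (-2.0, 0.0, 1.0, 5.0):
    print(f"x={value:+.1f}  silu={silu(value):+.4f}  mish={mish(value):+.4f}")
```

Both functions are smooth and non-monotonic near zero, which is why they are often discussed together as alternatives to ReLU.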
@oke-aditya Thank you for your reply. I've seen MobileNetv3 in the torchvision repository. When will EfficientNet, RegNet and NFNet be added?
@stwinata Thanks for offering. Which models do you have in mind to contribute?
The process of model contribution was a bit problematic (mainly due to the training bit) and we still haven't figured out all the details. But depending on the proposal, we might be able to work something out. :)
@datumbox thanks for the quick reply! I am interested in DETR or EfficientDet. I was thinking that for a first commit DETR might be easier, since we can use DETR's original repo for reference and may be able to load its weights for preliminary validation.
Perhaps we can also try to determine a canonical pipeline for model contribution through this experience and document it so that others can contribute easily in the future!
(mainly due to the training bit)
@datumbox Does this come down to lack of GPU resources? Or is it due to the need to validate that it can properly train?
@stwinata DETR sounds like a good addition to me. Since @fmassa is one of the main authors, I will let him have the final say on this.
Contributing models is tricky because:
Happy to discuss more and see if it's worth doing this now.
@datumbox These comments make sense.
- To reproduce the paper it's an iterative process of code + recipe + training. Getting a PR that just adds the implementation is less useful because one of the maintainers needs to do the heavy lifting of the model reproduction, which is the time-consuming bit. This is why in the past we avoided accepting contributions in this space.
- On the other hand, if someone actually sends a PR that reproduces the paper and has weights, then the only thing left for the maintainers is to confirm the accuracy of the pre-trained weights and retrain the model using the reference script to ensure our recipe works as expected.
Yeah I agree, some might even say that getting the models to be "useful", i.e. reproducing the paper results, is the fun bit. I think in the future model contributions/PRs should include:
I think this way we can ease the load on PyTorch/Vision maintainers and make PRs much more concrete and useful.
Perhaps we can also have a simple util script that tests trained candidate implementations on various benchmarks (this might be another feature request).
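As a rough illustration of the kind of check such a util script might perform, the sketch below computes top-1 accuracy from predicted and true class indices and compares it against a paper-reported figure. The function names, the toy data and the 0.5 percentage point tolerance are all assumptions made for illustration; none of them come from torchvision or its reference scripts.

```python
def top1_accuracy(predictions, labels):
    """Fraction of samples whose predicted class index equals the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def matches_reference(measured, reported, tolerance=0.005):
    """True if the measured accuracy is within `tolerance` of the reported one.

    The default tolerance is an arbitrary placeholder, not an agreed threshold.
    """
    return abs(measured - reported) <= tolerance


# Toy example: 4 of 5 predictions are correct.
preds = [0, 2, 1, 1, 3]
labels = [0, 2, 1, 0, 3]
acc = top1_accuracy(preds, labels)
print(f"top-1 accuracy: {acc:.3f}")
print("within tolerance of reported 0.80:", matches_reference(acc, 0.80))
```

A real script would obtain `preds` by running the candidate weights over a validation set; the comparison step itself stays this simple.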
- GPU resources are not a concern for us but for the contributor. We can train models, but this infrastructure is not available to open-source maintainers.
I also agree with this. Moreover, I think these days GPU resources, either at home or through AWS and GCP, are getting ubiquitous enough for contributors to do the training by themselves.
@stwinata Thanks for the comments. I think we agree. Below I write a few thoughts on the potential process we could adopt.
The minimum to merge such a contribution is:
Note that there are details here related to the code quality etc, but these are rules that apply in all PRs.
For someone who would be interested in adding a model, here are a few important considerations:
The above is a very big ask, I think. But if an OSS contributor is willing to give it a try despite the above adversities, then we would be happy to pair up and help. This should happen in a coordinated way to:
@fmassa let me know your thoughts on this as well.
I am aiming at adding FCOS to torchvision. https://github.com/xiaohu2015/vision/blob/main/torchvision/models/detection/fcos.py
@xiaohu2015 Nice work, have you managed to reproduce the accuracies of the paper?
@fmassa Any thoughts on FCOS?
@datumbox Yes, I am working to implement it and reproduce the performance. But I think that some time is needed.
I think FCOS would be a good addition. It is one of the top methods in https://paperswithcode.com/methods/category/object-detection-models that we don't yet have in torchvision, and @xiaohu2015's implementation seems very nice.
@datumbox @fmassa Hi, we have opened a pull request with the FCOS code (https://github.com/pytorch/vision/pull/4961). Could you review it and give some advice?
Would it be possible to have also grey-scale ImageNet weights for the usual models along the lines described in Xie & Richmond?
Xie, Y. and Richmond, D., “Pre-training on grayscale ImageNet improves medical image classification,” in [Proceedings of the European Conference on Computer Vision (ECCV) Workshops], 476–484, Springer (September 2019).
In part informed by the discussions in this ticket, I am proposing new model contribution guidelines here. Your feedback/suggestions would be very valuable.
Hi there,
I'm doing few-shot classification and similarity learning, and currently the DINO DeiT backbone is a top-performing one on my datasets. Can we add it to torchvision.models?
I'm willing to submit a PR with some guidance.
If you only want to add the pretrained weights, I think it is very easy, as torchvision supports multi-weights.
@Rusteam Thanks for the proposal.
Our Model contribution guidelines are still a work in progress. One of the things we need to figure out is how we deal with contributions that are produced without our reference scripts. Right now, we require all of the weights to be reproducible with our references. There is one exception to the above rule and this is when we port weights directly from a paper. Usually this is not the preferable solution either and we typically do it when we want to offer the architecture but we don't have the time to train the network from scratch (or it's too costly to do so).
Given that your proposal doesn't fall under the above exception, we would have to be able to reproduce the weight training with our scripts. Unfortunately that's going to be tricky because TorchVision doesn't support a training script for few-shot learning. We have a similarity reference script which hasn't really been maintained much. It's within our plans to improve support in the future, but currently you wouldn't be able to train the proposed models using our scripts.
cc @yoshitomo-matsubara because there were some discussions of adding better support of distillation in TorchVision.
Well yeah, I guess you're right. Until you have some kind of contrib section, I can use hub to use that model.
Not exactly a contrib section. But you could create a repo and a hubconf.py file, so that the model is accessible through torch.hub.
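A minimal sketch of what such a hubconf.py could look like is shown below. The entrypoint name `tiny_backbone` and the placeholder architecture are hypothetical and only illustrate the file layout: a module-level `dependencies` list plus one function per model that torch.hub can call.

```python
# hubconf.py, placed at the root of a (hypothetical) GitHub repository.

# Package names that torch.hub expects to be importable before loading entrypoints.
dependencies = ["torch"]


def tiny_backbone(pretrained=False, num_classes=10):
    """Hypothetical entrypoint returning a small placeholder classifier."""
    # Imported lazily so the file can be inspected without torch installed.
    import torch.nn as nn

    if pretrained:
        # No weights are published for this placeholder model.
        raise ValueError("pretrained weights are not available for this sketch")

    # Placeholder architecture for 3x32x32 inputs; a real repo would build its own model.
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(3 * 32 * 32, num_classes),
    )
```

Assuming the file sits at the root of a public GitHub repository, a call along the lines of `torch.hub.load("<user>/<repo>", "tiny_backbone")` would then build the model; the repository path here is a placeholder.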
Please pin this issue.
@datumbox
I want to add another object detection model: ATSS.
@xiaohu2015 You are on fire this quarter!
Just to confirm, you are referring to ATSS, which builds upon FCOS? If yes, we've used some of the tricks from this paper in #5444 to improve RetinaNet. Let's chat offline to discuss the details (just want to make sure we have the bandwidth to support the reviews and avoid lengthy waits). Out of curiosity, originally we wanted to add DETR as part of #5410 so that we can add at least 1 Transformer-based model for Detection. Would you be open to focusing on this instead?
@datumbox OK, I think transformer-based object detection models will be the trend, so I have decided to work on DETR first. I will open a new PR.
Hey @xiaohu2015, can I be of a bit of help? I have worked on DETR a bit before and it would be a great learning experience for me to implement a model in torchvision. I would much appreciate it if we could collaborate. Thanks :smiley:
@oke-aditya OK, but I think I should split the implementation into different parts, I will update this in the PR https://github.com/pytorch/vision/pull/5922
Let's have DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection ASAP
@santhoshnumberone Thanks for the proposal. I've added it in the list of potential models.
Hi! We are currently working to add DETR. See #5922. DINO, however, is an SSL model, and this is a task torchvision currently does not support. Not sure about the future. You can have a look at VISSL.
Are you sure?
Looking at the paperswithcode website, DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection is a published paper whose authors are Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, Heung-Yeung Shum.
facebookresearch/dino, on the other hand, corresponds to Emerging Properties in Self-Supervised Vision Transformers, whose authors (Mathilde Caron, Hugo Touvron, Ishan Misra, Herve Jegou, Julien Mairal, Piotr Bojanowski, Armand Joulin) propose DINO, saying:
"We implement our findings into a simple self-supervised method, called DINO, which we interpret as a form of self-distillation with no labels"
Different authors with different papers, so I feel the two are different. Could you go through it once?
PS: Looking at the GitHub repo of DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection, one of the authors says "Code is under preparation, please be patient."
@santhoshnumberone I think DINO is more practical, since users can train for fewer epochs to get a good mAP.
What I meant was: could any of you check whether facebookresearch/dino and DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection are really not the same?
I feel both are different
There are many variants of DETR, e.g. deformable DETR and modulated DETR. A simple search gives these results:
https://github.com/search?q=DETR
Let's start off by including vanilla DETR :)
Hi, MobileOne introduced by Apple is interesting. The mobile-vision team implemented it at https://github.com/facebookresearch/mobile-vision/pull/91. Is there any plan to support it there?
@zhiqwang MobileOne wasn't in our shortlist, but we can certainly keep an eye on it if it builds momentum.
Any small list for Semantic Segmentation models?
Maybe a tentative one
I can try U2Net. Maybe it's an easy Model
Yesss BiFPN is very popular. Maybe once we initiate EfficientDet it would get added.
Will you try to add EfficientDet?
I'm pretty noob when it comes to implementing models. But maybe I will give it a shot after I add a few easy models.
@datumbox Please Pin this issue.
🚀 Feature
Adding new models to the models section.
Motivation
Many new models have been proposed in recent years and do not exist in the models module. For example, the EfficientNets appear to provide 8 models of different complexities that outperform everything else that exists at each complexity level.
Pitch
See Contributing to Torchvision - Models for guidance on adding new models.
Add pre-trained weights for the following variants: