pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

[Master Issue] Add more models to torchvision #645

Open fmassa opened 6 years ago

fmassa commented 6 years ago

This is a master issue to track requests for adding new pre-trained models to torchvision.

Here is the (potentially incomplete) list I compiled:

@Cadene has already implemented a number of those models in his fantastic https://github.com/Cadene/pretrained-models.pytorch . I'll start from there and try to get models trained using pytorch/examples/imagenet, so that the models are reproducible.


Requirements

gokriznastic commented 6 years ago

@fmassa Hey, I would like to try adding some of these models. Can you tell me which of these you need help with?

fmassa commented 6 years ago

@gokriznastic ideally we would want to have not only the model implementation, but also the weights and the training code that was used (if different from pytorch/examples/imagenet).

This way, we have a reproducible way of obtaining the models.

I believe @tonylins will be adding support for MobileNetV2. All the others are open, so if you decide to take one, just let us know :-)
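For context, the convention a torchvision model follows is a constructor with a `pretrained` flag that builds the architecture and optionally downloads weights from a fixed URL. A minimal sketch of that packaging pattern (the architecture, URL, and entry-point name here are hypothetical, not an actual torchvision model):

```python
import torch.nn as nn
from torch.utils.model_zoo import load_url  # weight-download helper used by torchvision

# Hypothetical URL; real torchvision model files keep a similar dict.
model_urls = {
    'toynet': 'https://example.com/toynet-0000.pth',
}

class ToyNet(nn.Module):
    """Stand-in architecture to illustrate the packaging convention."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = x.mean(dim=[2, 3])  # global average pooling
        return self.classifier(x)

def toynet(pretrained=False, **kwargs):
    """Constructor in the torchvision style: build, then optionally load weights."""
    model = ToyNet(**kwargs)
    if pretrained:
        model.load_state_dict(load_url(model_urls['toynet']))
    return model
```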

JuanFMontesinos commented 5 years ago

Hi there, I was wondering if you would be interested in a U-Net model. I've developed a very flexible version which allows variable depth, batch norm, etc. I consider it a very important model nowadays in audio and computer vision.

fmassa commented 5 years ago

Hi @JuanFMontesinos Definitely! Is this a segmentation model or a classification model?

Because for now all of the models are for classification tasks; we would like to extend to other tasks as well, but that will require some thought so that we have the proper training / evaluation scripts.

JuanFMontesinos commented 5 years ago

@fmassa It was originally proposed as a segmentation architecture for biomedical applications. It is basically an encoder-decoder architecture with skip connections, widely used in blind sound source separation when working with spectrograms of the sound. It is also the core of GANs like pix2pix, an image-to-image translation network, and many others. That's why I suggested including it. As for training it, that really depends on the application. Do you require a training framework and weights?

Regards
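To make the suggestion concrete, here is a minimal U-Net-style sketch: an encoder-decoder with skip connections, fixed at two levels for brevity. This is only an illustration of the architecture being discussed, not @JuanFMontesinos's implementation; a real version would parameterize the depth as he describes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def double_conv(in_ch, out_ch):
    # Two 3x3 convs with BN + ReLU: the basic U-Net building block.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Two-level U-Net sketch (input H and W must be divisible by 4)."""
    def __init__(self, in_ch=3, num_classes=2, base=64):
        super().__init__()
        self.enc1 = double_conv(in_ch, base)
        self.enc2 = double_conv(base, base * 2)
        self.bottleneck = double_conv(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = double_conv(base * 4, base * 2)  # *4: upsampled + skip channels
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = double_conv(base * 2, base)
        self.head = nn.Conv2d(base, num_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(F.max_pool2d(e1, 2))
        b = self.bottleneck(F.max_pool2d(e2, 2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))  # skip connection
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection
        return self.head(d1)
```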

fmassa commented 5 years ago

Hi @JuanFMontesinos ,

I see. We currently require all models in torchvision to have pre-trained weights, and ideally a codebase where we can train / evaluate them. This becomes especially important for some complex models, like detection, where the model alone is generally not enough to be usable and requires a number of helper functions.

JuanFMontesinos commented 5 years ago

@fmassa Hi, sorry for the late reply. Which task/dataset would you be interested in training U-Net for?

fmassa commented 5 years ago

U-Nets are usually used for segmentation, so I'd say maybe the Pascal VOC segmentation task or Cityscapes? But there might be newer benchmarks out there that I'm not aware of.

guanfuchen commented 5 years ago

@fmassa The VOC and Cityscapes datasets are large; there is a smaller dataset, CamVid, consisting of 701 labeled images. And if you want better performance from U-Net, I think using the original medical dataset is better. There is also a good project named pytorch-semseg that reimplements U-Net for semantic segmentation.

fmassa commented 5 years ago

VOC and Cityscapes might be large, but there have been a number of publications using them, and they are widely used in the scientific literature. That's why I think providing pre-trained models for one of those tasks might be relevant.

JuanFMontesinos commented 5 years ago

Sooo, let me evaluate it after Christmas to see which dataset would be better.

varunagrawal commented 5 years ago

Since we are adding MobileNet, it would be a good idea to add ShuffleNet as well given its improved performance over MobileNet.
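For reference, the distinctive piece of ShuffleNet is the channel shuffle operation, which mixes information between the groups of the preceding grouped 1x1 convolution. A short sketch of that operation (the standard reshape-transpose-reshape trick from the paper):

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    # Interleave channels across groups; requires c divisible by groups.
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w)  # split channels into groups
    x = x.transpose(1, 2).contiguous()        # swap group and channel dims
    return x.view(n, c, h, w)                 # flatten back to (n, c, h, w)

# e.g. x = torch.randn(1, 12, 8, 8); y = channel_shuffle(x, groups=3)
```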

setuc commented 5 years ago

Is there a priority among this list of models? I was planning to train a couple of models on the ImageNet dataset from scratch and can contribute here.

Or should we refer to the models from @Cadene?

IgorKasianenko commented 5 years ago

@setuc I would really appreciate training ShuffleNet. It is a small model, so I assume it will take the least time to start with. Sincerely yours, Igor

fmassa commented 5 years ago

Hi @setuc Sorry for the delay in replying.

I'd say that you can pick whichever you'd prefer, but maybe ShuffleNet would be indeed easier because it's a small model.

I think Inception V4 might be quite hard to get to the reported accuracies, so ShuffleNet alone would already be a great start!

setuc commented 5 years ago

One more question, @fmassa: there are supposedly different versions of ImageNet. I am currently using the one from Kaggle; I hope that is sufficient. I have downloaded the images and plan to start the runs over the weekend.

hendrycks commented 5 years ago

There are supposedly different versions of ImageNet.

Nearly everyone else is using ImageNet 2012 data, and most papers use that for comparisons.

setuc commented 5 years ago

@hendrycks I guess I was mistaken... the 2015 dataset is the same as the 2012 one. I have started the runs and will do some validation before I share the results. Another 24-48 hours to completion.

fmassa commented 5 years ago

@setuc cool! Let me know how it goes, and which training script / hyperparameters you used to train it

setuc commented 5 years ago

@fmassa I used the training script from https://github.com/pytorch/examples/tree/master/imagenet, as mentioned in the requirements in the top post. All the hyperparameters remained the same except the batch size, which was changed to 1024. I wasn't sure whether we are free to play around with the learning rates (cyclical learning rates, etc.).

I am unable to reproduce the paper's results for the network with 3 groups and no shuffle (paper top-1 error 34.5% vs. my 39.811%). My results are Acc@1 60.189, Acc@5 82.601.
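For readers unfamiliar with the Acc@1 / Acc@5 notation: these are top-k accuracies, i.e. the fraction of samples whose true label appears among the model's k highest-scoring predictions. A simplified sketch along the lines of the helper in pytorch/examples/imagenet (not the exact code from that repo):

```python
import torch

def accuracy(output: torch.Tensor, target: torch.Tensor, topk=(1, 5)):
    """Top-k accuracy in percent; output is (batch, classes), target is (batch,)."""
    maxk = max(topk)
    _, pred = output.topk(maxk, dim=1)      # (batch, maxk) predicted class indices
    correct = pred.eq(target.view(-1, 1))   # broadcast-compare against true labels
    return [correct[:, :k].any(dim=1).float().mean().item() * 100 for k in topk]
```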

fmassa commented 5 years ago

@setuc thanks for getting back to me with the results.

I believe we might need to adapt the learning rate / etc in order to reproduce the results for many of those papers.

If you change those, let me know which changes you made, so that we can keep track of everything and I can summarize it afterwards.

setuc commented 5 years ago

Restarting the training. I rewrote ShuffleNet v1 and v2 together with the cyclical learning rate. I think I have it right this time around. Started the training; expecting another 72-80 hours to report back.

Edit: The cyclical rates worked. At 120 epochs the results are encouraging. For Shufflenet v2, the Top-1 error is 41.31 compared to 39.70 from the paper.

Edit2: At 220 epochs, the Top-1 error for ShuffleNet v2 is 40.51 compared to 39.70 from the paper.

Edit3: At 272 epochs, the Top-1 error for ShuffleNet v2 is 40.22 compared to 39.70 from the paper.

Edit4: At 320 epochs, the Top-1 error for ShuffleNet v2 is 39.96 compared to 39.70 from the paper.

@fmassa Should I be doing all the groups / scales reported in the paper for v1 and v2?
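For anyone wanting to try the same trick: PyTorch ships a cyclical learning-rate scheduler, torch.optim.lr_scheduler.CyclicLR (available since PyTorch 1.1). A minimal sketch; the model, bounds, and cycle length below are illustrative, not the settings @setuc used:

```python
import torch
from torch.optim.lr_scheduler import CyclicLR

model = torch.nn.Linear(10, 2)  # stand-in for the actual network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Triangular policy: LR oscillates between base_lr and max_lr,
# taking step_size_up batches to climb from one bound to the other.
scheduler = CyclicLR(optimizer, base_lr=1e-3, max_lr=0.5,
                     step_size_up=2000, mode='triangular')

for batch in range(10):   # training loop sketch: step once per batch
    optimizer.step()
    scheduler.step()
```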

setuc commented 5 years ago

@fmassa I have completed about 400 epochs with a Top-1 error of 39.85 compared to 39.70 from the paper. Should this be sufficient?

ppwwyyxx commented 5 years ago

For your reference, I've reproduced shufflenet v1 & v2 at https://github.com/tensorpack/tensorpack/blob/master/examples/ImageNetModels/shufflenet.py . It follows the paper's schedule (240 epochs without cyclic LR trick) and gets the same accuracy.

fmassa commented 5 years ago

@setuc awesome! Could you check what @ppwwyyxx has sent to see if there is something else that you could do to get to the last few % so that we match the accuracies?

setuc commented 5 years ago

@fmassa I'm going over the code line by line and comparing it against @ppwwyyxx's implementation. I had written the code from scratch, so I'm checking again to see if I missed anything.

fmassa commented 5 years ago

@setuc thanks! Did you figure out where the difference was?

hendrycks commented 5 years ago

(I think it is unlikely the community outside FAIR is going to train various ImageNet models in a timely manner, especially big models such as ResNeXt.)

fmassa commented 5 years ago

@hendrycks I was planning on getting ResNeXt models trained here

1e100 commented 5 years ago

I have an implementation of MNASNet that I could contribute. Any interest from maintainers? It performs pretty well, and I was able to get close to paper numbers with it, at 1.0 depth multiplier, training with SGD+Nesterov. I think it's currently the best "efficient" model out there.

https://arxiv.org/abs/1807.11626

fmassa commented 5 years ago

Hi @1e100

Sure, it would be awesome to have it! Could you send a PR with it, and also pointing to the training code and hyperparameters that you used to obtain the results?

1e100 commented 5 years ago

Will do. My own training pipeline is far too complicated to be suitable for something like this, so I'll just implement a single-file fast.ai trainer instead, train with it to something close to paper numbers, and then send a PR. In the interest of expediency, I plan to only verify reachable accuracy for depth multiplier 1.0 under this experimental setup.

Let me know if you see any flaws in this plan. Conservative ETA is about 1 week, 90% of which will be GPU time.

1e100 commented 5 years ago

In the interest of not duplicating code, though, it'd be good to know how far along #625 is. MNASNet is basically just a hyperparameter tweak over MobileNetV2 with respect to kernel sizes, layer depths, and block depths. In fact, I implemented both using the exact same module.
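To illustrate that point: both architectures stack inverted residual blocks (1x1 expansion, depthwise convolution, linear 1x1 projection, with a residual connection when the stride is 1 and channel counts match), and MNASNet mostly varies the per-stage kernel size, expansion factor, and repeat counts. A sketch of such a shared block (an illustration of the technique, not @1e100's module):

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Inverted residual block shared by MobileNetV2 and MNASNet (sketch)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, expansion=6):
        super().__init__()
        mid = in_ch * expansion
        self.use_res = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),  # 1x1 expansion
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, mid, kernel_size, stride,
                      padding=kernel_size // 2, groups=mid, bias=False),  # depthwise
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False),  # linear 1x1 projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_res else out
```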

1e100 commented 5 years ago

OK, after some experimentation I got it to train to the following accuracy numbers: loss=1.076, prec@1=73.512, prec@5=91.544. Still not quite paper numbers, but paper numbers seem achievable with more epochs. I'll be putting together a PR later tonight.

1e100 commented 5 years ago

FYI: paper number is 74.0% top1.

1e100 commented 5 years ago

MNASNet: https://github.com/pytorch/vision/pull/829
Trainer: https://github.com/1e100/mnasnet_trainer/tree/master

fmassa commented 5 years ago

Awesome, thanks @1e100 !

I'll check your code and integrate it into references/classification later this week

setuc commented 5 years ago

@setuc thanks! Did you figure out where the difference was?

@fmassa I tried doing the comparison and ran it a couple more times. Unfortunately, I don't quite match the paper numbers. The best Top-1 error for ShuffleNet v2 is 39.89 compared to 39.70 from the paper. Will that be sufficient for the pull request?

soumith commented 5 years ago

@setuc 39.89 vs. 39.70 sounds close enough; that would be sufficient for sure.

D-X-Y commented 5 years ago

@setuc Would you mind sharing your training scripts for ShuffleNet v2? I tried to use the ResNet training scripts but got very low accuracy.

stigma0617 commented 5 years ago

@fmassa Hi,

Can I upload a PR for VoVNet?

VoVNet was trained in the same manner as the other pytorch/vision models.

To briefly describe it: VoVNet is a backbone network that is more efficient than ResNet and DenseNet in terms of GPU computation and energy consumption.

I implemented VoVNet classification models and maskrcnn-benchmark models.

classification models: https://github.com/stigma0617/VoVNet.pytorch
maskrcnn-benchmark models: https://github.com/stigma0617/maskrcnn-benchmark-vovnet/tree/vovnet

fmassa commented 5 years ago

Hi @stigma0617

I think for now it might be better to look into publishing it via Torch Hub, as it's from a very recent paper?
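For anyone following the Torch Hub route: publishing amounts to adding a hubconf.py at the repository root that declares dependencies and exposes entry-point functions. A rough sketch; the module, entry-point name, and checkpoint URL below are hypothetical:

```python
# hubconf.py at the root of the model repository (sketch)
dependencies = ['torch']

from torch.hub import load_state_dict_from_url
from vovnet import VoVNet  # hypothetical module inside the repo

def vovnet39(pretrained=False, **kwargs):
    """Entry point so users can call torch.hub.load on this repo."""
    model = VoVNet(**kwargs)
    if pretrained:
        state_dict = load_state_dict_from_url(
            'https://example.com/vovnet39.pth')  # hypothetical URL
        model.load_state_dict(state_dict)
    return model

# Usage from anywhere:
# model = torch.hub.load('stigma0617/VoVNet.pytorch', 'vovnet39', pretrained=True)
```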

erichhhhho commented 5 years ago

@fmassa Hi, may I ask if a ResNet-101 with group norm pretrained in PyTorch is available now?

fmassa commented 5 years ago

@erichhhhho not in torchvision, as IIRC it doesn't bring performance improvements over the batch norm version.

edsgerls commented 4 years ago

Hi @fmassa

Would it be possible to add the ShuffleNet v2 x1.5 pretrained model, please? I would really appreciate it.

wangg12 commented 3 years ago

Would you like to add ResNeSt models?