fmassa opened this issue 6 years ago
@fmassa Hey, I would like to try adding some of these models. Can you tell me which of these you need help with?
@gokriznastic ideally we would want to have not only the model implementation, but also the weights and the training code that was used (if different from pytorch/examples/imagenet).
This way, we have a reproducible way of obtaining the models.
I believe @tonylins will be adding support for MobileNetV2. All the others are open, so if you decide to take one, just let us know :-)
Hi there, I was wondering if you would be interested in a U-Net model. I've developed a very flexible version which allows variable depth, batch norm, etc. I consider it a very important model nowadays in audio and computer vision.
Hi @JuanFMontesinos Definitely! Is this a segmentation model or a classification model?
Because for now, all of the models are for classification tasks, but we would like to extend it to other tasks as well (but it will require some thought so that we have the proper training / evaluation scripts)
@fmassa it was originally proposed as a segmentation architecture for biomedical applications. It is basically an encoder-decoder architecture with skip connections, widely used in blind sound source separation when working with spectrograms of the sound. It is also the core of GANs like pix2pix, which is an image-to-image translation network, and many others. That's why I suggested including it. With respect to training it, it really depends on the application. Do you require a training framework and weights?
Regards
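For readers unfamiliar with the architecture, here is a rough one-level sketch of the encoder-decoder-with-skip-connections pattern described above. The class name and channel counts are illustrative; this is not the flexible implementation being offered.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # two 3x3 convs with BN + ReLU, the basic U-Net building block
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """One-level U-Net: encode, bottleneck, decode with a skip connection."""
    def __init__(self, in_ch=3, out_ch=1, width=64):
        super().__init__()
        self.enc = conv_block(in_ch, width)
        self.down = nn.MaxPool2d(2)
        self.bottleneck = conv_block(width, width * 2)
        self.up = nn.ConvTranspose2d(width * 2, width, kernel_size=2, stride=2)
        self.dec = conv_block(width * 2, width)   # width (skip) + width (upsampled)
        self.head = nn.Conv2d(width, out_ch, kernel_size=1)

    def forward(self, x):
        skip = self.enc(x)                         # full-resolution features
        x = self.bottleneck(self.down(skip))       # low-resolution features
        x = self.up(x)                             # back to full resolution
        x = self.dec(torch.cat([x, skip], dim=1))  # skip connection via concatenation
        return self.head(x)
```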
Hi @JuanFMontesinos ,
I see. We currently require all models in torchvision to have pre-trained weights, and ideally a code-base where we can train / evaluate them. This becomes especially important for some complex models, like detection, where the model alone is generally not enough to be usable and requires a number of helper functions.
@fmassa Hi, sorry for the late reply. Which task/dataset would you be interested in training U-Net for?
U-Nets are usually used for segmentation, so I'd say maybe the Pascal VOC segmentation task or Cityscapes? But there might be newer benchmarks out there that I'm not aware of.
@fmassa The VOC and Cityscapes datasets are large; there is a smaller dataset, CamVid, consisting of 701 labeled images. And if you want better performance from U-Net, I think using the original medical dataset is better. There is also a good project named pytorch-semseg that reimplements U-Net for semantic segmentation.
VOC and Cityscapes might be large, but there have been a number of publications using them, and they are widely used in the scientific literature. That's why I think providing pre-trained models for one of those tasks might be relevant.
Sooo, let me evaluate it after Christmas to see which dataset would be better.
Since we are adding MobileNet, it would be a good idea to add ShuffleNet as well given its improved performance over MobileNet.
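For reference, the operation that distinguishes ShuffleNet from MobileNet-style blocks is the channel shuffle applied after grouped convolutions. A minimal sketch of that operation (not the eventual torchvision implementation):

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Shuffle channels so information flows between the groups of a grouped conv."""
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w)  # split channels into groups
    x = x.transpose(1, 2).contiguous()        # interleave the groups
    return x.view(n, c, h, w)                 # flatten back to (N, C, H, W)

# example: shuffle a feature map produced by a grouped 1x1 conv
feats = torch.randn(2, 12, 8, 8)
shuffled = channel_shuffle(feats, groups=3)
```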
Is there a priority among this list of models? I was planning to train a couple of models on the ImageNet dataset from scratch and can contribute here.
Or should we refer to the models from @cadene?
@setuc I would really appreciate training ShuffleNet. This is a small model, so I assume it will take the least time to start with. Sincerely yours, Igor
Hi @setuc Sorry for the delay in replying.
I'd say that you can pick whichever you'd prefer, but maybe ShuffleNet would be indeed easier because it's a small model.
I think Inception V4 might be quite hard to get to the reported accuracies, so maybe just ShuffleNet would be a great start already!
One more question @fmassa: there are supposedly different versions of ImageNet. I am currently using the one from Kaggle. I hope that should be sufficient. I have downloaded the images and plan to start the runs over the weekend.
There are supposedly different versions of ImageNet.
Nearly everyone else is using ImageNet 2012 data, and most papers use that for comparisons.
@hendrycks I guess I was mistaken... the 2015 dataset is the same as the 2012 one. I have started the runs and will do some validation before I share the results. Another 24-48 hours to completion.
@setuc cool! Let me know how it goes, and which training script / hyperparameters you used to train it
@fmassa I have used the training script from https://github.com/pytorch/examples/tree/master/imagenet as mentioned in the requirements in the top post. All the hyperparameters remained the same except the batch size, which was changed to 1024. I wasn't sure whether we are free to play around with the learning rates (cyclical learning rate, etc.).
I am unable to reproduce the results for the network from the paper for 3 groups and no shuffle (paper error 34.5% vs. mine 39.811%). My results are Acc@1 60.189, Acc@5 82.601.
@setuc thanks for getting back to me with the results.
I believe we might need to adapt the learning rate / etc in order to reproduce the results for many of those papers.
If you change those, let me know which changes you made, so that we can keep track of everything and I can summarize it afterwards.
Restarting the training. Rewrote ShuffleNet v1 and v2 together with the cyclical learning rate. I think I have it right this time around. Started the training, expecting another 72-80 hours to report back.
Edit: The cyclical rates worked. At 120 epochs the results are encouraging. For Shufflenet v2, the Top-1 error is 41.31 compared to 39.70 from the paper.
Edit2: At 220 epochs, the Top-1 error for ShuffleNet v2 is 40.51 compared to 39.70 from the paper.
Edit3: At 272 epochs, the Top-1 error for ShuffleNet v2 is 40.22 compared to 39.70 from the paper.
Edit4: At 320 epochs, the Top-1 error for ShuffleNet v2 is 39.96 compared to 39.70 from the paper.
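In case it helps others reproduce this, the cyclical learning rate mentioned above can be expressed with PyTorch's built-in scheduler (available in recent versions). The base/max values and step size below are placeholders, not the exact settings used in these runs:

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the ShuffleNet model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=4e-5)

# one triangular cycle spans 2 * step_size_up batches; values are illustrative
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-3, max_lr=0.5,
    step_size_up=2000, mode="triangular")

for batch in range(10):   # CyclicLR is stepped per batch, unlike epoch-wise schedulers
    optimizer.step()
    scheduler.step()
```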
@fmassa Should I be doing all the groups / scales reported in the paper for v1 and v2?
@fmassa I have completed about 400 epochs with a Top-1 error of 39.85 compared to 39.70 from the paper. Should this be sufficient?
For your reference, I've reproduced shufflenet v1 & v2 at https://github.com/tensorpack/tensorpack/blob/master/examples/ImageNetModels/shufflenet.py . It follows the paper's schedule (240 epochs without cyclic LR trick) and gets the same accuracy.
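For comparison with the cyclical schedule above, the ShuffleNet papers describe a linearly decaying learning rate, which is the kind of schedule the tensorpack reproduction follows over its 240 epochs. A hedged sketch of such a schedule with LambdaLR (the initial learning rate is illustrative, not taken from that code):

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for ShuffleNet v2
optimizer = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.9, weight_decay=4e-5)

total_epochs = 240  # the schedule length mentioned above
# learning rate decays linearly from its initial value to 0 over training
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: 1.0 - epoch / total_epochs)

for epoch in range(total_epochs):
    # ... train one epoch ...
    scheduler.step()
```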
@setuc awesome! Could you check what @ppwwyyxx has sent to see if there is something else that you could do to get to the last few % so that we match the accuracies?
@fmassa I am going over the code line by line and comparing it against @ppwwyyxx's. I had written the code from scratch, so I am checking again to see if I missed anything.
@setuc thanks! Did you figure out where the difference was?
(I think it is unlikely the community outside FAIR is going to train various ImageNet models in a timely manner, especially big models such as ResNeXt.)
@hendrycks I was planning on getting ResNeXt models trained here
I have an implementation of MNASNet that I could contribute. Any interest from maintainers? It performs pretty well, and I was able to get close to paper numbers with it, at 1.0 depth multiplier, training with SGD+Nesterov. I think it's currently the best "efficient" model out there.
Hi @1e100
Sure, it would be awesome to have it! Could you send a PR with it, and also pointing to the training code and hyperparameters that you used to obtain the results?
Will do. My own training pipeline is far too complicated to be suitable for something like this, so I'll just implement a single-file fast.ai trainer instead, train with it to something close to paper numbers, and then send a PR. In the interest of expediency, I plan to only verify reachable accuracy for depth multiplier 1.0 under this experimental setup.
Let me know if you see any flaws in this plan. Conservative ETA is about 1 week, 90% of which will be GPU time.
In the interest of not duplicating code, though, it'd be good to know how far along #625 is. MNASNet is basically just a hyperparameter tweak over MobileNetV2 wrt kernel sizes, layer depths, and block depths. In fact I implemented both using the exact same module.
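To illustrate the "same module, different hyperparameters" point, here is a hedged sketch of a generic inverted-residual block parameterized by expansion factor and kernel size. It is illustrative only, not the code in #625 or the implementation being contributed here:

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Expand -> depthwise conv -> project, as used by both MobileNetV2 and MNASNet.

    MobileNetV2 uses 3x3 depthwise kernels throughout; MNASNet mixes 3x3 and 5x5
    kernels and different expansion factors per stage."""
    def __init__(self, in_ch, out_ch, stride=1, expansion=6, kernel_size=3):
        super().__init__()
        mid = in_ch * expansion
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 pointwise expansion
            nn.Conv2d(in_ch, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            # depthwise conv; kernel size is the main MNASNet-vs-MobileNetV2 difference
            nn.Conv2d(mid, mid, kernel_size, stride=stride,
                      padding=kernel_size // 2, groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            # 1x1 pointwise projection (linear, no activation)
            nn.Conv2d(mid, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```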
OK, after some experimentation I got it to train to the following accuracy numbers: loss=1.076, prec@1=73.512, prec@5=91.544. Still not quite paper numbers, but paper numbers seem achievable with more epochs. I'll be putting together a PR later tonight.
FYI: paper number is 74.0% top1.
Awesome, thanks @1e100 !
I'll check your code and integrate it into references/classification later this week.
@setuc thanks! Did you figure out where the difference was?
@fmassa I did the comparison and ran it a couple more times. Unfortunately, I don't quite match the paper numbers. The best Top-1 error for ShuffleNet v2 is 39.89 compared to 39.70 from the paper. Will that be sufficient for the pull request?
@setuc 39.89 vs 39.70 sounds close enough. That would be sufficient for sure.
@setuc Would you mind sharing your training scripts for ShuffleNet v2? I tried to use the ResNet training scripts but got very low accuracy.
@fmassa Hi,
Can I open a PR for VoVNet?
VoVNet was trained in the same manner as the other pytorch/vision models.
To briefly describe it, VoVNet is a backbone network that is more efficient than ResNet and DenseNet in terms of GPU computation and energy.
I implemented VoVNet classification models and maskrcnn-benchmark models.
classification models: https://github.com/stigma0617/VoVNet.pytorch
maskrcnn-benchmark models: https://github.com/stigma0617/maskrcnn-benchmark-vovnet/tree/vovnet
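For readers unfamiliar with VoVNet, its core idea is the One-Shot Aggregation (OSA) module: a chain of convolutions whose outputs are concatenated only once at the end, unlike DenseNet's per-layer concatenation. A rough sketch of such a module (not the code in the repositories above):

```python
import torch
import torch.nn as nn

class OSAModule(nn.Module):
    """One-Shot Aggregation: run a chain of 3x3 convs, concatenate the input and
    every intermediate output once, then fuse with a 1x1 conv."""
    def __init__(self, in_ch, stage_ch, out_ch, num_convs=5):
        super().__init__()
        self.convs = nn.ModuleList()
        ch = in_ch
        for _ in range(num_convs):
            self.convs.append(nn.Sequential(
                nn.Conv2d(ch, stage_ch, 3, padding=1, bias=False),
                nn.BatchNorm2d(stage_ch), nn.ReLU(inplace=True)))
            ch = stage_ch
        # single aggregation over the input and all intermediate outputs
        self.concat_conv = nn.Sequential(
            nn.Conv2d(in_ch + num_convs * stage_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        outputs = [x]
        for conv in self.convs:
            x = conv(x)
            outputs.append(x)
        return self.concat_conv(torch.cat(outputs, dim=1))
```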
Hi @stigma0617
I think for now it might be better to look into publishing it to torchhub, as it's a very recent paper?
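For reference, publishing through Torch Hub mostly means adding a hubconf.py at the root of the model repository. A minimal hedged sketch, where the vovnet39 entrypoint, the my_models import, and the weights URL are all placeholders:

```python
# hubconf.py at the root of the model repository
import torch

dependencies = ["torch"]

def vovnet39(pretrained=False, **kwargs):
    """Hypothetical entrypoint; a real repo would import its own model class."""
    from my_models import VoVNet  # placeholder import
    model = VoVNet(**kwargs)
    if pretrained:
        state = torch.hub.load_state_dict_from_url(
            "https://example.com/vovnet39.pth")  # placeholder URL
        model.load_state_dict(state)
    return model
```

Users would then load it with `torch.hub.load('owner/repo', 'vovnet39', pretrained=True)`.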
@fmassa Hi, may I ask if the ResNet-101 group-norm pretrained model for PyTorch is available now?
@erichhhhho not in torchvision, as IIRC it doesn't bring performance improvements over the batch norm version.
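For anyone who still wants the group-norm variant, recent torchvision ResNets accept a norm_layer argument, so a (non-pretrained) ResNet-101 with GroupNorm can be built roughly like this:

```python
import torch.nn as nn
from torchvision.models import resnet101

# replace every BatchNorm2d with 32-group GroupNorm;
# note that no pretrained weights are provided for this variant
model = resnet101(norm_layer=lambda channels: nn.GroupNorm(32, channels))
```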
Hi @fmassa
Would it be possible to add the ShuffleNet v2 x1.5 pretrained model, please? I would really appreciate it.
This is a master issue to track requests for adding new pre-trained models to torchvision.
Here is the (potentially incomplete) list I compiled:
@Cadene has already implemented a number of those models in his fantastic https://github.com/Cadene/pretrained-models.pytorch . I'll start from there and try to get models trained using pytorch/examples/imagenet, so that the models are reproducible.
Requirements:
- the model implementation goes in vision/models
- the training code should follow examples/imagenet.
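As a rough sketch of the interface a contributed model is expected to expose, assuming the pattern the existing torchvision models already follow (a constructor with a pretrained flag and downloadable weights):

```python
import torchvision.models as models

# existing torchvision models already expose this interface;
# a newly contributed model should behave the same way
model = models.resnet50(pretrained=True)  # downloads the ImageNet-trained weights
model.eval()
```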