pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Local feature CNNs in models? #413

Open ducha-aiki opened 6 years ago

ducha-aiki commented 6 years ago

Hi,

Are state-of-the-art local patch descriptors welcome in models? E.g. HardNet https://github.com/DagnyT/hardnet They are trained in pytorch + torchvision and are potentially useful for low-level vision tasks. However, I understand that they are less transferable than ImageNet-trained models.
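For context, a rough sketch of what a HardNet-style patch descriptor looks like in use: 32x32 grayscale patches go in, 128-D L2-normalized descriptors come out. The layer sizes below follow the HardNet paper, but the class name and wrapper are only illustrative, not an existing torchvision or hardnet API.

 # Illustrative sketch of a HardNet-style descriptor network (not an
 # existing torchvision API): 32x32 grayscale patches -> 128-D,
 # L2-normalized descriptors.
 import torch
 import torch.nn as nn
 import torch.nn.functional as F

 class PatchDescriptor(nn.Module):
     def __init__(self):
         super().__init__()
         self.features = nn.Sequential(
             nn.Conv2d(1, 32, 3, padding=1, bias=False), nn.BatchNorm2d(32), nn.ReLU(),
             nn.Conv2d(32, 32, 3, padding=1, bias=False), nn.BatchNorm2d(32), nn.ReLU(),
             nn.Conv2d(32, 64, 3, stride=2, padding=1, bias=False), nn.BatchNorm2d(64), nn.ReLU(),
             nn.Conv2d(64, 64, 3, padding=1, bias=False), nn.BatchNorm2d(64), nn.ReLU(),
             nn.Conv2d(64, 128, 3, stride=2, padding=1, bias=False), nn.BatchNorm2d(128), nn.ReLU(),
             nn.Conv2d(128, 128, 3, padding=1, bias=False), nn.BatchNorm2d(128), nn.ReLU(),
             nn.Conv2d(128, 128, 8, bias=False),  # 8x8 spatial map -> 1x1
         )

     def forward(self, patches):            # patches: (N, 1, 32, 32)
         x = self.features(patches).view(patches.size(0), -1)
         return F.normalize(x, p=2, dim=1)  # unit-length 128-D descriptors

 descriptors = PatchDescriptor()(torch.randn(16, 1, 32, 32))  # shape (16, 128)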

alykhantejani commented 6 years ago

If they were trained in pytorch from scratch and an example script can also be provided, I think this would be useful. It could also help with more classical vision tasks. However, @fmassa might have an opinion on this.

ducha-aiki commented 6 years ago

They were trained from scratch in pytorch. Regarding the example script, do you mean training or usage?

alykhantejani commented 6 years ago

training - to reproduce the pre-trained model

ducha-aiki commented 6 years ago

Sure, it is here: https://github.com/DagnyT/hardnet/blob/master/code/HardNet.py If all maintainers agree, are there guidelines on how to add such a model, given that the model weights need to be uploaded to pytorch.org somehow?

ducha-aiki commented 6 years ago

@fmassa Could you please let us know whether we should prepare a PR with HardNet or not?

fmassa commented 6 years ago

Hi @ducha-aiki Sorry for the delay in replying.

I think extending the scope of the models in torchvision is good, but I have a few remarks that I'd like to share:

- models like HardNet rely on custom classes (e.g. L2Norm) that are not part of torchvision, and we'd need to decide where such building blocks should live;
- torchvision's pre-trained models are currently organized around ImageNet classification, so we'd need to decide how to structure models that target other tasks.

I'm open to suggestions on both points.

ducha-aiki commented 6 years ago

I would start with the second point. I really like the idea of a task-based model zoo, and torchvision already has the groundwork for this: datasets. As an end user, I would really appreciate being able to do something like this:

 # hypothetical API sketch -- none of this exists in torchvision yet
 from datasets import MSCOCO
 from models import vgg19, segnet

 dataset = MSCOCO()
 model = segnet()
 model.initialize_from(vgg19)
 train(model, dataset)

And the same for each task; the tasks can be interconnected. The first component is the datasets, which are already there. The second component could be a typical training flow: loss function, inputs, outputs, lr schedule.

E.g. for classification it is log-loss minimization; for metric learning, some triplet loss, possibly with hard negative mining; for segmentation, dice loss; etc.
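As a concrete sketch of the metric-learning case, here is a minimal triplet-margin loss with in-batch hardest-negative mining, roughly the idea behind HardNet's loss; the function name and details are illustrative only, not a proposed torchvision API.

 # Sketch of a triplet margin loss with in-batch hardest-negative mining,
 # roughly the idea used to train HardNet-style descriptors.
 # Illustrative only, not a proposed torchvision API.
 import torch

 def hard_negative_triplet_loss(anchors, positives, margin=1.0):
     # anchors, positives: (N, D) L2-normalized descriptors, where
     # anchors[i] and positives[i] come from the same physical point.
     dist = torch.cdist(anchors, positives)   # (N, N) pairwise distances
     pos = dist.diag()                        # distances of matching pairs
     # mask out the matching pairs, then take the hardest (closest) negative
     neg = (dist + 10.0 * torch.eye(len(dist), device=dist.device)).min(dim=1).values
     return torch.clamp(margin + pos - neg, min=0).mean()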

Regarding the custom classes, I think they should either be merged into the main repo, if they are useful for more than a single task, or the model should be self-contained. E.g. L2Norm for HardNet doesn't satisfy this requirement, while input_norm does, as it is defined inside the model.
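To illustrate the distinction, a small sketch (not the actual HardNet code): a standalone L2Norm layer would have to live somewhere in torchvision to be reusable, whereas an input_norm-style helper defined inside the model keeps it self-contained.

 # Sketch of the two options above (illustrative, not actual HardNet code).
 import torch
 import torch.nn as nn

 # Option 1: a reusable layer -- would need to be merged into torchvision
 # if several models depend on it.
 class L2Norm(nn.Module):
     def forward(self, x, eps=1e-10):
         return x / (x.norm(p=2, dim=1, keepdim=True) + eps)

 # Option 2: self-contained -- the helper lives inside the model itself
 # (like HardNet's input_norm, which standardizes each input patch),
 # so nothing extra has to be added to torchvision.
 class SelfContainedDescriptor(nn.Module):
     def __init__(self, features):
         super().__init__()
         self.features = features

     def input_norm(self, x):
         flat = x.flatten(1)
         mean = flat.mean(dim=1).view(-1, 1, 1, 1)
         std = flat.std(dim=1).view(-1, 1, 1, 1) + 1e-7
         return (x - mean) / std

     def forward(self, x):
         return self.features(self.input_norm(x))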

fmassa commented 6 years ago

Yes, there is a lot of demand for extending torchvision to have pre-trained models from domains other than ImageNet classification, and that's something I'm actively working on now.

I'm not sure it's the responsibility of torchvision to provide the train method from your previous message, unless we extend the scope of torchvision to also contain training code for different classes of problems.

One thing we should also be careful about is how fast we extend torchvision by adding new models; we should have some guidelines for that, or a contrib folder which relaxes those constraints a bit. Maybe we could follow PyTorch's guidelines and only merge into master new layers that have been published and used by a few papers already?