jina-ai / executors

internal-only
Apache License 2.0

Create TimmImageEncoder #136

Closed tadejsv closed 3 years ago

tadejsv commented 3 years ago

timm is the largest library of image models. Not only that, it also has a unified and very simple interface that stays the same across models. For example,

from timm import create_model

# num_classes=0 will get you last layer features
model = create_model('my_model', num_classes=0, pretrained=True)

will work when my_model is replaced with resnet34 or vit_large_patch16_384 - two completely different model architectures.

This is extremely low-hanging fruit; I am shocked it has not been implemented yet (especially since some models we have already implemented - like BigTransfer - are just a small subset of what's available in this library).

vivek2301 commented 3 years ago

Hi @tadejsv,

Can you please assign this to me?

tadejsv commented 3 years ago

@vivek2301 sure, go ahead. Please check out this file (our new contributor guide, will be merged to master soon)

https://github.com/jina-ai/executors/blob/b6c33e689326d1ac7888ce32aaa27ef66047aa78/CONTRIBUTING.md

vivek2301 commented 3 years ago

Thanks. I'll follow the contributor guide.

Vidit-Ostwal-zz commented 3 years ago

Hi @tadejsv, @vivek2301, can I also work on this issue? Also, is the encoder already made, or does a new one need to be built?

tadejsv commented 3 years ago

Hi @Vidit-Ostwal-zz, for now @vivek2301 will be the one working on it, as I think this is a task suitable for a single person. If anything comes up, we'll let you know.

Vidit-Ostwal-zz commented 3 years ago

Cool, np.

tadejsv commented 3 years ago

@vivek2301 are you working on this currently? If not, please let us know, so that I can assign the task to someone else

vivek2301 commented 3 years ago

@tadejsv yes, I'm working on this. It took some time to understand Jina's framework and to go through the cookbook and contribution guidelines. I've already completed part of the code. I'm currently looking at the various models in timm, as I need to select the respective layers for the encoding. timm has the following modules: ['byoanet', 'byobnet', 'cait', 'coat', 'convit', 'cspnet', 'densenet', 'dla', 'dpn', 'efficientnet', 'ghostnet', 'gluon_resnet', 'gluon_xception', 'hardcorenas', 'hrnet', 'inception_resnet_v2', 'inception_v3', 'inception_v4', 'levit', 'mlp_mixer', 'mobilenetv3', 'nasnet', 'nfnet', 'pit', 'pnasnet', 'regnet', 'res2net', 'resnest', 'resnet', 'resnetv2', 'rexnet', 'selecsls', 'senet', 'sknet', 'swin_transformer', 'tnt', 'tresnet', 'twins', 'vgg', 'visformer', 'vision_transformer', 'vision_transformer_hybrid', 'vovnet', 'xception', 'xception_aligned']

Do I need to build the encoder for all the models or some subset of them?

tadejsv commented 3 years ago

@vivek2301 , great - then you can open a draft PR, and keep working on it. It doesn't have to be finished, but it will help us track the progress.

As for the models - I think the user can simply pass the model_name string, and timm builds the correct model automatically. This is nothing we need to implement ourselves; the executor should be model-agnostic.
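The model-agnostic dispatch described here can be sketched in plain Python. This is a hypothetical mini-registry to illustrate the pattern (the names MODEL_REGISTRY, register_model, and the toy builders are made up for this sketch; timm's real implementation is far more elaborate):

```python
# Minimal sketch of the factory pattern behind an interface like
# timm's create_model. All names here are hypothetical.

MODEL_REGISTRY = {}

def register_model(name):
    """Decorator that maps a model name string to its builder function."""
    def decorator(builder):
        MODEL_REGISTRY[name] = builder
        return builder
    return decorator

@register_model("resnet34")
def build_resnet34(num_classes=1000):
    # A real builder would construct the network; we return a stub.
    return {"arch": "resnet34", "num_classes": num_classes}

@register_model("vit_large_patch16_384")
def build_vit_large(num_classes=1000):
    return {"arch": "vit_large_patch16_384", "num_classes": num_classes}

def create_model(name, **kwargs):
    """The single entry point the encoder needs - it never has to know
    anything architecture-specific, only the model name string."""
    if name not in MODEL_REGISTRY:
        raise ValueError(f"Unknown model: {name}")
    return MODEL_REGISTRY[name](**kwargs)

model = create_model("resnet34", num_classes=0)
print(model)  # {'arch': 'resnet34', 'num_classes': 0}
```

Because every architecture is reached through the same create_model call, the executor only has to forward a model_name parameter - it stays model-agnostic.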

vivek2301 commented 3 years ago

Sure, I will open the PR. Yes, the models do not need to be built, but the specific layer to get the encoding from needs to be selected for each module, at least. I think it's the same for the PyTorch encoder: ImageTorchEncoder

I will try and complete this as soon as possible.

tadejsv commented 3 years ago

@vivek2301 I think this is not necessary - as I showed in the example above, timm has a unified interface for all models, and with the proper settings in create_model you get the pooled last-layer features from a normal call to the model object.
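The mechanism being pointed at can be illustrated with a toy stand-in (these classes are invented for the sketch and are not timm internals): when num_classes=0, the classifier head is effectively an identity, so a plain forward call returns the pooled backbone features directly - no per-architecture layer selection needed.

```python
# Toy illustration (hypothetical classes, not timm code) of why
# num_classes=0 makes a plain forward call return pooled features.

class ToyBackbone:
    def forward_features(self, x):
        # Pretend the backbone pools an input down to a 4-dim feature vector.
        mean = sum(x) / len(x)
        return [mean] * 4

class ToyHead:
    """Maps features to num_classes logits; with num_classes=0 it is identity."""
    def __init__(self, num_classes):
        self.num_classes = num_classes

    def __call__(self, feats):
        if self.num_classes == 0:
            return feats  # identity head: pooled features pass through unchanged
        return [0.0] * self.num_classes  # dummy logits for the classification case

class ToyModel:
    def __init__(self, num_classes=1000):
        self.backbone = ToyBackbone()
        self.head = ToyHead(num_classes)

    def __call__(self, x):
        return self.head(self.backbone.forward_features(x))

# With num_classes=0, calling the model yields the embedding itself:
embedding = ToyModel(num_classes=0)([1.0, 2.0, 3.0])
print(embedding)  # [2.0, 2.0, 2.0, 2.0]
```

Since the head swap happens inside model construction, an encoder built on this interface only ever calls model(x) and stores the result as the embedding.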

vivek2301 commented 3 years ago

@tadejsv Awesome, thanks for this. timm does not list num_classes as a create_model parameter in its documentation, so I missed it. You saved me a lot of time, thanks.

tadejsv commented 3 years ago

Please check out its documentation - it has a section on feature extraction that explains all of this.