ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0

Finetune from existing model? #93

Closed: connorgreenwell closed this issue 2 years ago

connorgreenwell commented 5 years ago

It's possible that I missed this in the documentation, but is there a way to initialize from an existing model? For example: initializing an image encoder with a ResNet trained on the ImageNet task.

Thanks, csg

w4nderlust commented 5 years ago

You can initialize from an existing Ludwig model with --load_model. As for using external pretrained models, at the moment pretrained embeddings are supported in all sequential encoders. Loading pretrained ResNet and VGG-16 models is coming next, stay tuned for it. Or if you want to contribute it, that would be great!
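For the programmatic route, a minimal sketch of reloading a saved Ludwig model through the Python API (the exact train() keyword arguments vary across Ludwig versions, so check the docs for yours):

from ludwig.api import LudwigModel

# load a previously trained Ludwig model from its results directory
# (the path here is illustrative)
model = LudwigModel.load('results/experiment_run/model')

# continue training on new data
model.train(dataset='new_data.csv')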

simonMoisselin commented 5 years ago

Hi @w4nderlust ,

I really need Ludwig to work with pre-trained models!

Any tips on how you would want to add (for example) a pretrained VGG-16 to your codebase?

I was thinking of using the same logic as the sequential encoders.

Thanks, Simon

w4nderlust commented 5 years ago

@entruv it is something that we definitely want to do, it's just that we haven't prioritized it yet :) Anyway, if it's urgent for you, I would go ahead and implement it yourself; it should be pretty simple. I would implement a VGGEncoder among the image encoders that looks something like this:

class VGGEncoder:
    def __init__(
            self,
            pretrained_model_path,
            **kwargs
    ):
        # load_vgg is the piece to implement: build the VGG graph and
        # restore its weights from the given checkpoint path
        self.vgg = load_vgg(pretrained_model_path)

    def __call__(
            self,
            input_image,
            regularizer,
            dropout,
            is_training
    ):
        # run the pretrained network on the input image tensor
        hidden = self.vgg(
            input_image,
            regularizer,
            dropout,
            is_training=is_training
        )
        # flatten the feature maps into the (hidden, hidden_size) pair
        # that Ludwig encoders are expected to return
        hidden, hidden_size = flatten(hidden)
        return hidden, hidden_size

As you noted, the bulk of the effort is in the load_vgg function that you would have to implement, but I believe it's not an enormous amount of work. Then, don't forget to add your encoder to the registry in image_feature.py:

image_encoder_registry = {
    'stacked_cnn': Stacked2DCNN,
    'resnet': ResNetEncoder,
    'vgg': VGGEncoder
}
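For illustration, one possible shape for that load_vgg function, sketched with tf.keras.applications (an assumption on my part, anticipating the TF2/Keras direction discussed later in this thread; Ludwig has no such function yet):

import tensorflow as tf

def load_vgg(pretrained_model_path=None, trainable=False):
    # build VGG16 without its classification head, so the output is the
    # final convolutional feature map rather than ImageNet logits;
    # weights accepts either a path to a weights file or 'imagenet'
    vgg = tf.keras.applications.VGG16(
        include_top=False,
        weights=pretrained_model_path or 'imagenet'
    )
    # freeze the pretrained weights unless finetuning is requested
    vgg.trainable = trainable
    return vgg

Note that a Keras model is called simply as vgg(input_image), so the call in the VGGEncoder sketch above would need adapting to that signature.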

If you do and you are willing to share it with the community we would really gladly accept your contribution!

jdunnmon commented 4 years ago

Has there been any progress on this? Would like to use Ludwig for a couple of projects, but would need access to the broader world of pretrained checkpoints, and it's not 100% clear to me how to do this without writing a new encoder for each architecture I want to use. Thanks!

jimthompson5802 commented 4 years ago

@w4nderlust If there is still interest in a VGGEncoder, I'm willing to take a stab at implementing it.

jimthompson5802 commented 4 years ago

A point of clarification for the current proposal: with regard to the VGGEncoder, I'm thinking of creating the VGG equivalent of the ResNetEncoder, i.e., the user will be able to specify a VGG16 or VGG19 network architecture via a parameter like vgg_size=[16 | 19].

Loading of a pre-trained VGG model will be a separate function.

w4nderlust commented 4 years ago

> A point of clarification for the current proposal: with regard to the VGGEncoder, I'm thinking of creating the VGG equivalent of the ResNetEncoder, i.e., the user will be able to specify a VGG16 or VGG19 network architecture via a parameter like vgg_size=[16 | 19].
>
> Loading of a pre-trained VGG model will be a separate function.

Makes sense, although I believe 2 encoders for pretrained ResNet and pretrained VGG would be more valuable. Ideally, though, we want both vgg and resnet in two variants: trained from scratch and pretrained.

jimthompson5802 commented 4 years ago

OK... I understand "2 encoders for pretrained ResNet and pretrained VGG would be more valuable." I'll shift the focus. A few questions:

My current thinking on the model definition for a pretrained ResNet would look something like this:

input_features:
  - name: image_column_name
    type: image
    encoder: pretrained_resnet
    resnet_size: 50
    pretrained_weights: ./my_resnet/pre-trained-weights

Pretrained VGG would look something like this:

input_features:
  - name: image_column_name
    type: image
    encoder: pretrained_vgg
    vgg_size: 16
    pretrained_weights: ./my_vgg/pre-trained-weights

w4nderlust commented 4 years ago

I guess one thing we could do in the future is to call the VGG and ResNet encoders that only load pretrained models vgg and resnet, and call the ones that don't load weights custom_vgg and custom_resnet, for clarity.
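Under that naming scheme, the registry in image_feature.py might end up looking like this (the Pretrained* class names are hypothetical, just to illustrate the split):

image_encoder_registry = {
    # encoders that load pretrained weights
    'resnet': PretrainedResNetEncoder,
    'vgg': PretrainedVGGEncoder,
    # encoders trained from scratch
    'custom_resnet': ResNetEncoder,
    'custom_vgg': VGGEncoder,
    'stacked_cnn': Stacked2DCNN
}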

Regarding how to structure the parameters, yes, what you are suggesting makes sense. If you want a reference on a similar implementation we did for BERT, you can check the BERT encoder here: https://github.com/uber/ludwig/blob/master/ludwig/models/modules/sequence_encoders.py#L1663

One additional caveat is that VGG and ResNet expect input image tensors to be of a specific size. For this reason, all examples, docstrings, and comments should point out that the user has to either provide images of the right size or specify the image preprocessing parameters accordingly (height, width, and resizing strategy). Another caveat is that the user may not want to collect the representations from the last layer but from earlier ones, so I would add an optional parameter with the name of the layer from which to obtain the representations; by default it would be the layer before the final transformation into logits. Finally, finetuning should be an additional parameter, and by default I would say no finetuning.
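Putting those caveats together, the encoder configuration might end up looking something like this (a sketch only; every parameter name here is a suggestion, not an implemented option):

input_features:
  - name: image_column_name
    type: image
    preprocessing:
      # VGG expects a fixed input size, so resizing has to match it
      height: 224
      width: 224
      resize_method: interpolate
    encoder: pretrained_vgg
    vgg_size: 16
    pretrained_weights: ./my_vgg/pre-trained-weights
    # take representations from an earlier layer instead of the last one
    output_layer: block5_pool
    # keep the pretrained weights frozen by default
    finetune: false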

Regarding the format, yes, the move to TF2 may change things; that's actually why I wanted to complete that migration first before addressing other feature requests. After TF2, one could imagine using Keras models for this, for instance. Otherwise, we can definitely find some place where the checkpoints are provided and use those instead, knowing that in the future we may want to switch.

Final consideration, an issue that also emerges in the BERT case and that we haven't solved yet: after I save my Ludwig model, the saved weights will also contain the weights of the VGG or ResNet encoder, so when the model is loaded for further tuning or for serving, there is no need to bring the original checkpoints along. At the moment, in the BERT case, you actually have to, as the checkpoint is a required parameter. One option here could be the following: if the weights are specified, fine; if they are not, the model is randomly initialized, since the saver that restores the weights will override those random weights anyway. But we have to notify the user that this is fine when loading a previously trained Ludwig model, while for a first training it is not, and the checkpoint location has to be specified. I'm open to alternative solutions here.
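As a minimal sketch of that option (all names here are hypothetical, just to pin down the logic):

import logging

def initialize_encoder_weights(encoder, pretrained_weights=None,
                               resuming_ludwig_model=False):
    # hypothetical helper illustrating the proposal above, not Ludwig code
    if pretrained_weights is not None:
        # first training: restore the external checkpoint explicitly
        encoder.load_weights(pretrained_weights)
    elif resuming_ludwig_model:
        # random initialization is fine here: the saved Ludwig weights,
        # which already include the encoder's weights, override it on restore
        pass
    else:
        logging.warning(
            'No pretrained weights specified for a first training: '
            'the encoder will be randomly initialized.'
        )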

justinxzhao commented 2 years ago

Ludwig supports reloading and continuing training.

Ludwig has also integrated with Huggingface for loading and fine-tuning pretrained models.

https://ludwig-ai.github.io/ludwig-docs/latest/configuration/features/text_features/#huggingface-encoders
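From the linked docs, using a HuggingFace pretrained model as a text encoder looks roughly like this (a sketch; exact option names depend on the Ludwig version):

input_features:
  - name: text_column_name
    type: text
    encoder:
      type: bert
      use_pretrained: true
      # set to true to finetune the pretrained weights during training
      trainable: false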