lucidrains / x-clip

A concise but complete implementation of CLIP with various experimental improvements from recent papers
MIT License

Allow other types of visual SSL when initiating CLIP #5

Closed. Froskekongen closed this issue 2 years ago

Froskekongen commented 2 years ago

In the following code, which is part of CLIP.__init__,

        if use_visual_ssl:
            if visual_ssl_type == 'simsiam':
                ssl_type = SimSiam
            elif visual_ssl_type == 'simclr':
                ssl_type = partial(SimCLR, temperature = simclr_temperature)
            else:
                raise ValueError(f'unknown visual_ssl_type')

            self.visual_ssl = ssl_type(
                self.visual_transformer,
                image_size = visual_image_size,
                hidden_layer = visual_ssl_hidden_layer
            )

the visual self-supervised learning (SSL) method is hardcoded: only SimSiam and SimCLR can be selected. I suggest letting CLIP accept the visual SSL module as an argument at instantiation, which would give it the same flexibility it already has for the image encoder and text encoder.

Example:

barlow = BarlowTwins(augmentation_fns)
clip = CLIP(..., visual_ssl=barlow)
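
To make the suggestion concrete, here is a minimal sketch of the injection pattern, assuming nothing about the actual x-clip API. Every class here (ToySimSiam, ToySimCLR, ToyBarlowTwins, ToyCLIP) is a hypothetical stand-in written for illustration, and the losses are placeholders. The idea is that a caller-supplied SSL module takes precedence, and the string-keyed selection remains as a fallback.

```python
from functools import partial


class ToySimSiam:
    """Stand-in for a built-in SSL wrapper; not the real x-clip class."""

    def __init__(self, net, image_size, hidden_layer):
        self.net = net
        self.image_size = image_size
        self.hidden_layer = hidden_layer

    def __call__(self, images):
        return 0.0  # placeholder loss; a real wrapper would augment and encode


class ToySimCLR(ToySimSiam):
    """Stand-in for a second built-in wrapper with one extra hyperparameter."""

    def __init__(self, net, image_size, hidden_layer, temperature=0.1):
        super().__init__(net, image_size, hidden_layer)
        self.temperature = temperature


class ToyBarlowTwins:
    """Hypothetical user-supplied SSL module with its own constructor arguments."""

    def __init__(self, net, augmentation_fns):
        self.net = net
        self.augmentation_fns = augmentation_fns

    def __call__(self, images):
        return 1.0  # placeholder loss


class ToyCLIP:
    """Toy model showing only the SSL wiring, not contrastive training."""

    def __init__(self, visual_encoder, visual_ssl=None, visual_ssl_type='simsiam',
                 image_size=256, hidden_layer=-1, simclr_temperature=0.1):
        self.visual_encoder = visual_encoder

        if visual_ssl is not None:
            # Injected module: use it as-is, the same way encoders are passed in.
            if not callable(visual_ssl):
                raise TypeError('visual_ssl must be callable on a batch of images')
            self.visual_ssl = visual_ssl
            return

        # Fallback: the string-keyed selection from the snippet above.
        builders = {
            'simsiam': ToySimSiam,
            'simclr': partial(ToySimCLR, temperature=simclr_temperature),
        }
        if visual_ssl_type not in builders:
            raise ValueError(f'unknown visual_ssl_type: {visual_ssl_type!r}')
        self.visual_ssl = builders[visual_ssl_type](
            visual_encoder,
            image_size=image_size,
            hidden_layer=hidden_layer,
        )

    def ssl_loss(self, images):
        return self.visual_ssl(images)


encoder = object()  # stands in for a vision transformer
barlow = ToyBarlowTwins(encoder, augmentation_fns=[])
clip = ToyCLIP(encoder, visual_ssl=barlow)
```

With this shape, the module owns its own hyperparameters (augmentations, temperature, and so on), so CLIP does not need a new keyword argument for each SSL method.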

lucidrains commented 2 years ago

@Froskekongen Hi Erlend! I took up your suggestion; see https://github.com/lucidrains/x-clip/tree/0.2.4#custom-vision-self-supervised-learning-module and let me know if that works for you

lucidrains commented 2 years ago

How is your experience with Barlow? Does it work?

Froskekongen commented 2 years ago

Thanks a lot!

BarlowTwins was just an example. Personally, I work with frameworks that are more akin to VICReg (https://arxiv.org/abs/2105.04906) and VIbCReg (https://arxiv.org/abs/2109.00783).

I am also investigating CLIP with modalities other than images and words, and with less data.

lucidrains commented 2 years ago

@Froskekongen awesome! hope this feature is fruitful for you then! :)