lucidrains / CoCa-pytorch

Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch
MIT License
1.04k stars 88 forks

convolution encoder better result than vit #14

Open Alexandr1111111 opened 1 year ago

Alexandr1111111 commented 1 year ago

Thank you for this work. In my experience, networks based on convolutional layers generalize much better than ViT.

import torch
from efficientnet_pytorch import EfficientNet


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Pretrained EfficientNet-B4 backbone; its classifier head is replaced
        # by a projection from 1792 features to a 1024-dim embedding.
        self.model = EfficientNet.from_pretrained('efficientnet-b4')
        self.model._fc = torch.nn.Linear(1792, 1024)
        # Expand the single embedding into 128 channels that act as the sequence.
        self.conv1D = torch.nn.Conv1d(1, 128, 3, padding='same')

    def forward(self, x):
        x = self.model(x)          # (batch, 1024)
        x = torch.unsqueeze(x, 1)  # (batch, 1, 1024)
        x = self.conv1D(x)         # (batch, 128, 1024)
        # return (batch, seq, dim)
        return x
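
To make the shape flow of the encoder above easier to check without downloading EfficientNet weights, here is a minimal sketch that swaps the backbone for a tiny stand-in CNN. `StandInBackbone` and `ConvTokenEncoder` are hypothetical names used only for illustration; the part taken from the snippet above is the `unsqueeze` followed by `Conv1d(1, 128, 3, padding='same')`, which turns one pooled embedding into 128 "tokens" of the same width.

```python
import torch
from torch import nn


class StandInBackbone(nn.Module):
    """Hypothetical tiny CNN standing in for EfficientNet-B4 (not the real backbone)."""

    def __init__(self, out_dim: int = 1024):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool -> (batch, 16, 1, 1)
        )
        self.fc = nn.Linear(16, out_dim)  # plays the role of the replaced `_fc`

    def forward(self, x):
        x = self.features(x).flatten(1)  # (batch, 16)
        return self.fc(x)                # (batch, out_dim)


class ConvTokenEncoder(nn.Module):
    """Wraps a backbone that returns one vector per image and expands it to tokens."""

    def __init__(self, backbone: nn.Module, num_tokens: int = 128):
        super().__init__()
        self.backbone = backbone
        self.to_tokens = nn.Conv1d(1, num_tokens, kernel_size=3, padding='same')

    def forward(self, x):
        x = self.backbone(x)      # (batch, dim)
        x = x.unsqueeze(1)        # (batch, 1, dim)
        return self.to_tokens(x)  # (batch, num_tokens, dim)


encoder = ConvTokenEncoder(StandInBackbone()).eval()
with torch.no_grad():
    tokens = encoder(torch.randn(2, 3, 64, 64))
print(tuple(tokens.shape))  # (2, 128, 1024)
```

The last dimension (1024) is the value that would presumably be passed as `image_dim` when handing such an encoder to `CoCa` through its `img_encoder` argument, as the repository README does with a ViT extractor.
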
JackWhite-rwx commented 1 year ago

Thanks! Could you give me a pretrained model?

Alexandr1111111 commented 1 year ago

Yes! I will check it on the validation dataset (train, test, validation) this week. Is that okay with you? Good luck!

JackWhite-rwx commented 1 year ago

Thank you very much!

Alexandr1111111 commented 1 year ago

Jack, please read the readme file. The model is overfitted, but it shows good results. ViT did not recognize most of the validation dataset. This network recognizes almost everything, but it writes a lot of associations that are not in the photo. This happened after the model overfit. I hope this helps you.

Files attached to the email on Yandex Disk: CoCaPretrain.zip (315397104)

JackWhite-rwx commented 1 year ago

Thank you again, but I can't find the 'CoCaPretrain.zip' file in the attachment. Could you tell me where to download it?

Alexandr1111111 commented 1 year ago

Does your mail server not allow you to download attached files? Could you open the email via Gmail or Yandex instead? Or give me a Gmail address. If that is not possible, I will put the file on Google Drive and give you a link. Which option suits you?

JackWhite-rwx commented 1 year ago

Yes, I cannot receive the file at my QQ email. Could you put it on Google Drive and give me a link, or send the file to @.***? Thank you very much.

Alexandr1111111 commented 1 year ago

Check this: https://drive.google.com/file/d/1pLXWzpPOzBfGBC3YuleQTYZrv-gO3_zo/view?usp=sharing

JackWhite-rwx commented 1 year ago

Very good, I can download the file now. Thank you again.

Alexandr1111111 commented 1 year ago

Please let us know, if possible, how satisfied you are with the results of CoCa + EfficientNet_b4. Thanks.

Alexandr1111111 commented 1 year ago

Jack, sorry, could you give your assessment of CoCa + EfficientNet?

JackWhite-rwx commented 1 year ago

Of course, I will give you feedback when I have finished testing on few-shot classification and other low-level tasks. Sorry for the delay in replying to your email.

Alexandr1111111 commented 1 year ago

I am now training another network based on the transformer architecture, and it makes the same mistakes. I believe the inaccuracy of the CoCa network lies not in the architecture but in the dataset. On some topics the network works perfectly. On others it produces a lot of excess text and a lot of associations.

Take a look at this image: a sunrise or sunset on a river bank. The network writes that a man is surfing. The training dataset contained no descriptions of nature, sunrises, or sunsets. The network saw this kind of background when a man was surfing. This is what I call an association, and such examples can be seen throughout the dataset. To get the desired result, the dataset needs to be built properly, topic by topic. This dataset was assembled from random images.

a person holding a surfboard standing on a beach .
a person holding a surfboard near the waves in the ocean .
a shirtless man standing on the beach holding a red surfboard
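
One rough way to check whether a topic is represented in a caption dataset at all is to count how many captions mention it. The sketch below does this for the three captions quoted above. The keyword sets in `topics` and the helper `topic_coverage` are assumptions made up for this illustration, and a real check would run over the full caption file rather than three lines.

```python
# The three captions quoted above; a real check would load the full caption file.
captions = [
    "a person holding a surfboard standing on a beach .",
    "a person holding a surfboard near the waves in the ocean .",
    "a shirtless man standing on the beach holding a red surfboard",
]

# Hypothetical topic keywords, chosen only for this illustration.
topics = {
    "surfing": {"surfboard", "surfing", "surfer"},
    "scenery": {"sunrise", "sunset", "river"},
}


def topic_coverage(captions, topics):
    """Count how many captions mention at least one keyword of each topic."""
    counts = {name: 0 for name in topics}
    for caption in captions:
        words = set(caption.lower().split())
        for name, keywords in topics.items():
            if words & keywords:  # any keyword of this topic appears in the caption
                counts[name] += 1
    return counts


coverage = topic_coverage(captions, topics)
print(coverage)  # {'surfing': 3, 'scenery': 0}
```

A topic whose count stays at or near zero, like "scenery" here, is one the model has never been given words for, so a beach or river background can only be described through whatever captions it co-occurred with during training.
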