liuzhengzhe / DreamStone-ISS

This is the project page for DreamStone (TPAMI) and ISS (ICLR 2023 spotlight).
https://liuzhengzhe.github.io/DreamStone.github.io/

About the CLIP backbone #6

Open yangyangyang127 opened 11 months ago

yangyangyang127 commented 11 months ago

Dear Sir,

In my project, I need to change the CLIP backbone from ViT-B/32 (used in ISS) to ViT-L. During stage-1 training, the loss is very high; I wonder whether this was also the case during your training. The loss is shown below:

[image: stage-1 training loss curve]

In addition, how long will it take to finish the stage-1 training?

Thanks for your attention.
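One practical point when swapping backbones: ViT-B/32 and ViT-L/14 project into joint embedding spaces of different widths (512 vs. 768 in the official openai/CLIP release), so a mapper built for 512-dim features must be resized, and loss magnitudes are not directly comparable across the two. A minimal sketch of the dimension bookkeeping (backbone names follow the openai/CLIP model registry):

```python
# Joint image/text embedding widths of common CLIP backbones
# (per the official openai/CLIP model release).
CLIP_EMBED_DIM = {
    "ViT-B/32": 512,
    "ViT-B/16": 512,
    "ViT-L/14": 768,
}


def mapper_input_dim(backbone: str) -> int:
    """Width the stage-1 mapper's input layer must accept for a backbone."""
    return CLIP_EMBED_DIM[backbone]


# Switching ViT-B/32 -> ViT-L/14 widens the mapper input from 512 to 768.
print(mapper_input_dim("ViT-B/32"), "->", mapper_input_dim("ViT-L/14"))
```

Because the ViT-L embedding space is different (not just wider), a higher raw loss than the released ViT-B/32 run is not by itself a sign of a bug.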

liuzhengzhe commented 11 months ago

Hi, you could try fine-tuning our released stage-1 model to see what loss range is reasonable.

I trained it for around one day.

Zhengzhe


yangyangyang127 commented 11 months ago

Thanks for your timely response.

I still have a question about the decoder/generator, i.e. the M() function in the paper. Did you try reducing the number of fully connected layers, or adding batch norm after each layer? Batch norm may speed up convergence.

Thanks for your help.
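For concreteness, the variant being asked about could be sketched as below; this is a hypothetical reduced mapper, not the released architecture, with layer widths chosen arbitrarily for illustration:

```python
import torch
import torch.nn as nn


class SmallMapper(nn.Module):
    """Hypothetical reduced mapper: fewer FC layers, BatchNorm1d after the
    hidden layer to test whether normalization speeds up convergence."""

    def __init__(self, in_dim: int = 512, hidden: int = 1024, out_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.BatchNorm1d(hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


mapper = SmallMapper()
y = mapper(torch.randn(4, 512))  # batch of 4 CLIP-sized feature vectors
```

Note that BatchNorm1d requires batch size > 1 in training mode, which matters if stage-1 is trained with very small batches.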

liuzhengzhe commented 11 months ago

I have tried a very small mapper, and its performance is comparable to the original result. See Figure 27 in https://arxiv.org/pdf/2209.04145.pdf

Zhengzhe


yangyangyang127 commented 11 months ago

OK, thank you, sir.

Thanks for your contribution to this area. Your work and responses are very helpful to me.

yangyangyang127 commented 11 months ago

Dear Sir,

  1. Could you release the code used to calculate the quantitative results in Table 1?
  2. If you won't release the FID/FPD code, could you say whether the scores are averaged over the categories?

liuzhengzhe commented 11 months ago

Hi,

  1. FID: we adopt this implementation. https://github.com/mseitzer/pytorch-fid
  2. FPD: https://drive.google.com/file/d/1vniFpLFZwDfwMT3Ce2KXNQB8Bdv65iUG/view

Zhengzhe
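For reference, the linked pytorch-fid package is typically invoked from the command line on two directories of images; a minimal sketch (the two paths are placeholders):

```shell
# Install the FID implementation referenced above.
pip install pytorch-fid

# Compute FID between ground-truth renderings and generated samples.
python -m pytorch_fid path/to/real_images path/to/generated_images
```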
