zsyOAOA / ResShift

ResShift: Efficient Diffusion Model for Image Super-resolution by Residual Shifting (NeurIPS 2023 Spotlight)

ResShift weights without VQGAN #42

Open BadeiAlrahel opened 5 months ago

BadeiAlrahel commented 5 months ago

Thanks for the good work! I assessed the quality of the VQGAN on my data and it was really poor, which in turn degraded the results when I used your model. So I would like to drop the autoencoder entirely, and I was wondering whether you have released model weights trained without any autoencoder, since Figure 2 of the official paper indicates that using an autoencoder is optional. I would really appreciate it; otherwise I would need to train the model from scratch...

zsyOAOA commented 5 months ago

Sorry, I have tried training the model without an autoencoder; it worked well but required a very long training time. We don't have enough resources to train such a model. @BadeiAlrahel

BadeiAlrahel commented 5 months ago

Thank you for the fast response! What exactly do you mean by a lot of time and resources? Didn't you train on an A100? If so, how long do you think it would have taken on an A100 to fully train 500k iterations without the VQGAN?

zsyOAOA commented 5 months ago

I can't remember the exact time for training 500k iterations. The backbone contains several Swin Transformer blocks, and these are very slow when trained directly in image space, since the attention cost grows with the spatial resolution. I suggest you first fine-tune the autoencoder and then train ResShift.
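As a rough illustration (the downsampling factor of 4 below is just an example; check the autoencoder config for the actual value), the saving comes from the much smaller latent resolution:

```python
# Rough sketch of why Swin blocks are much cheaper in latent space: the
# number of attention windows scales with the spatial area, so an
# autoencoder that downsamples by a factor f cuts the work by about f**2.
def spatial_positions(h: int, w: int, f: int) -> int:
    """Number of spatial positions the transformer sees at downsampling factor f."""
    return (h // f) * (w // f)

pixel_space = spatial_positions(256, 256, 1)   # training directly on images
latent_space = spatial_positions(256, 256, 4)  # training on f4 latents
print(pixel_space // latent_space)             # 16x fewer positions
```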

BadeiAlrahel commented 5 months ago

Thank you for the rapid response! I was actually trying to fine-tune the VQGAN included in your released weights, but I could not find the code that computes its loss function in your repository. So I went to the LDM GitHub just to get their loss function, LPIPSWithDiscriminator, but their config file for the VQGAN with embed_dim=3 does not use the same parameters as yours... I ran into many errors when I used their training code together with that loss function. Could you give me some insight into how to fine-tune your VQGAN on my data?

zsyOAOA commented 5 months ago

The VQGAN model is trained with this repo. You should modify the config file yourself.

BadeiAlrahel commented 5 months ago

Yeah, I know that; I have of course looked at that repository, but the problem is the config files: none of them match the one used by ResShift (yours has embed_dim != 3, z_channels != 3, and attn_resolutions: []). I know that a few changes in your code to adapt ResShift to a different z_channels and embed_dim would do the job, but I am not sure what results that would produce... How did you actually train your VQGAN? Did you use the taming-transformers GitHub?
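In the meantime, here is roughly how I am trying to align the two configs; the file paths and the autoencoder.params layout are assumptions based on my checkout, so this is only a sketch:

```python
# Sketch: start from a stock LDM autoencoder config and overwrite the
# fields that ResShift's YAML defines differently before fine-tuning.
from omegaconf import OmegaConf

ldm_cfg = OmegaConf.load("ldm_vqgan_config.yaml")      # placeholder: config from the LDM repo
resshift_cfg = OmegaConf.load("resshift_config.yaml")  # placeholder: config from this repo

# Copy over the autoencoder hyper-parameters ResShift actually uses.
ae = resshift_cfg.autoencoder.params
ldm_cfg.model.params.embed_dim = ae.embed_dim
ldm_cfg.model.params.ddconfig.z_channels = ae.ddconfig.z_channels
ldm_cfg.model.params.ddconfig.attn_resolutions = ae.ddconfig.attn_resolutions

OmegaConf.save(ldm_cfg, "vqgan_finetune.yaml")
```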

zsyOAOA commented 5 months ago

No, the VQGAN model employed in ResShift is borrowed directly from LDM; I have not trained or fine-tuned it.

UserYuXuan commented 3 months ago

> No, the VQGAN model employed in ResShift is borrowed directly from LDM; I have not trained or fine-tuned it.

Can the 512-size autoencoder be used only on faces, or can it be applied to other types of data as well?

zsyOAOA commented 3 months ago

For face restoration, I trained the VQGAN myself. The checkpoint is accessible here. @UserYuXuan

UserYuXuan commented 3 months ago

> For face restoration, I trained the VQGAN myself. The checkpoint is accessible here. @UserYuXuan

Thank you very much for your prompt reply! I tried to replace the autoencoder in this model with the f8 VQGAN and KL autoencoders from LDM, but even when I change the configuration in the YML file to the LDM parameters, I still get this error:

```
RuntimeError: Error(s) in loading state_dict for VQModelTorch:
    Missing key(s) in state_dict: "encoder.conv_in.weight", "encoder.conv_in.bias", "encoder.down.0.block.0.norm1.weight", ...
    Unexpected key(s) in state_dict: "epoch", "global_step", "pytorch-lightning_version", "state_dict", "callbacks", "optimizer_states", "lr_schedulers".
```
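Looking at the error, the unexpected keys (epoch, global_step, pytorch-lightning_version, ...) are the top-level entries of a PyTorch Lightning checkpoint, so it seems the raw file is being passed to load_state_dict instead of the weights stored under its "state_dict" entry. A minimal sketch of the unwrapping (the "first_stage_model." prefix handling is an assumption that only applies to checkpoints exported from a full LDM):

```python
import torch

def load_ldm_autoencoder_weights(model, ckpt_path: str):
    """Load an LDM autoencoder checkpoint into `model` (e.g. VQModelTorch).

    LDM checkpoints are PyTorch Lightning files: the network weights sit
    under the "state_dict" key, which is why passing the raw file to
    load_state_dict raises 'Unexpected key(s): epoch, global_step, ...'.
    """
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state_dict = ckpt.get("state_dict", ckpt)  # unwrap the Lightning container
    # Checkpoints exported from a full latent-diffusion model keep the
    # autoencoder under a "first_stage_model." prefix; strip it if present.
    prefix = "first_stage_model."
    state_dict = {k[len(prefix):] if k.startswith(prefix) else k: v
                  for k, v in state_dict.items()}
    # strict=False reports remaining mismatches instead of raising, which
    # shows whether the configured architecture actually matches the weights.
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    return missing, unexpected
```

Note that an f8 autoencoder has a different architecture than an f4 one, so even with the state_dict unwrapped, the keys will only line up if the YML config builds the matching architecture.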

Agent-INF commented 2 months ago

> The VQGAN model is trained with this repo. You should modify the config file yourself.

Dear author, which repository are you referring to? The link you provided points to a Bing search page, not a GitHub repository.

zsyOAOA commented 2 months ago

> > The VQGAN model is trained with this repo. You should modify the config file yourself.
>
> Dear author, which repository are you referring to? The link you provided points to a Bing search page, not a GitHub repository.

Sorry, I have corrected the repo link.