yeungchenwa / FontDiffuser

[AAAI2024] FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
https://yeungchenwa.github.io/fontdiffuser-homepage/

Reproducing experiments: Removing MCA, RSI, SCR #61

Open mashito707 opened 1 month ago

mashito707 commented 1 month ago

Thank you for the detailed work on this project! I'm attempting to reproduce the experiments described in Table 3 (Effectiveness of different modules) by progressively removing the MCA, RSI, and SCR modules. I have a few questions about the implementation:

Removing MCA and RSI:

I noticed that the build_unet function uses the following configurations for down and up block types:

down_block_types=(
    "DownBlock2D",
    "MCADownBlock2D",
    "MCADownBlock2D",
    "DownBlock2D",
),
up_block_types=(
    "UpBlock2D",
    "StyleRSIUpBlock2D",
    "StyleRSIUpBlock2D",
    "UpBlock2D",
),

To remove MCA and RSI, is it correct to simply replace StyleRSIUpBlock2D with UpBlock2D and MCADownBlock2D with DownBlock2D? If these custom modules are removed, will the content and style features still be processed correctly by the model, or is there any additional code adjustment required to handle this change?
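
For concreteness, the substitution being asked about could be sketched as below. This is only an illustration of the naive swap: `PLAIN_EQUIVALENT` and `strip_custom_blocks` are hypothetical names, not part of the FontDiffuser codebase, and the sketch only rewrites the configuration tuples. It says nothing about whether the resulting UNet would still receive and use the content and style features correctly.

```python
# Hypothetical helper for illustration only; not part of FontDiffuser.
# Maps each custom block name to the plain block it would be swapped for.
PLAIN_EQUIVALENT = {
    "MCADownBlock2D": "DownBlock2D",
    "StyleRSIUpBlock2D": "UpBlock2D",
}


def strip_custom_blocks(block_types):
    """Return a tuple with MCA/RSI block names replaced by plain block names."""
    return tuple(PLAIN_EQUIVALENT.get(name, name) for name in block_types)


# Block types as quoted from build_unet above.
down_block_types = (
    "DownBlock2D",
    "MCADownBlock2D",
    "MCADownBlock2D",
    "DownBlock2D",
)
up_block_types = (
    "UpBlock2D",
    "StyleRSIUpBlock2D",
    "StyleRSIUpBlock2D",
    "UpBlock2D",
)

# The ablated configuration under the naive swap.
ablated_down = strip_custom_blocks(down_block_types)
ablated_up = strip_custom_blocks(up_block_types)
```

Keeping the mapping in one dictionary makes it easy to ablate only MCA or only RSI by deleting one entry, which matches the progressive removal in Table 3.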

Skipping SCR:

If I don't want to use the SCR module (which requires phase-2 training for fine-tuning), can I keep using the model obtained from phase-1? Is that model runnable without the additional fine-tuning step (i.e. ready for sample.py), or would skipping phase-2 cause compatibility issues?

Thanks again for your time and assistance. I appreciate any guidance on these points!

yeungchenwa commented 1 week ago

Hi @mashito707, sorry for my late reply. You cannot simply remove the MCA and RSI blocks in the way you describe. Also, if your custom model is trained using phase-1 only, it still performs well on font generation.