yizhiwang96 / deepvecfont

[SIGGRAPH Asia 2021] DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning
MIT License
182 stars 31 forks

Few-shot font generation via different `ref_nshot` #34

Open rutopio opened 1 year ago

rutopio commented 1 year ago

Hi there,

The pre-trained model checkpoints were trained and tested with ref_nshot=4.

However, it seems that if I want to use a different ref_nshot value (for example, 6, for ABCabc), I need to train the model from scratch rather than use the pre-trained model directly; otherwise a dimension mismatch error is raised during testing:

RuntimeError: Error(s) in loading state_dict for ModalityFusion:
size mismatch for seq_fc.weight: copying a param with shape torch.Size([512, 4096]) from checkpoint, the shape in current model is torch.Size([512, 6144]).

Is there a more efficient way to test performance with different ref_nshot values, such as training only ModalityFusion, rather than retraining the whole model every time?

Thank you.

yizhiwang96 commented 1 year ago

Hi, currently for the image synthesis branch, ref_nshot can be set to any number. For the sequence branch, however, our model uses a fully connected layer whose input dimension is ref_nshot * base_channel, so ref_nshot must stay fixed. One possible approach you can try is to replace the fully connected layer with a mean operation that averages the features from all reference glyph sequences.
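
As a rough illustration of the mean-operation idea, here is a minimal PyTorch sketch. It is not the repository's actual ModalityFusion code: the class name `MeanPoolFusion`, the tensor layout `(batch, ref_nshot, base_channel)`, and the values `base_channel=1024` and `out_dim=512` are assumptions. The value 1024 is only inferred from the shapes in the error message (4096 = 4 × 1024, 6144 = 6 × 1024).

```python
import torch
import torch.nn as nn


class MeanPoolFusion(nn.Module):
    """Hypothetical stand-in for the sequence part of ModalityFusion."""

    def __init__(self, base_channel: int = 1024, out_dim: int = 512):
        super().__init__()
        # The input width is base_channel, not ref_nshot * base_channel,
        # so the layer no longer depends on the number of reference glyphs.
        self.seq_fc = nn.Linear(base_channel, out_dim)

    def forward(self, seq_feat: torch.Tensor) -> torch.Tensor:
        # seq_feat is assumed to be (batch, ref_nshot, base_channel).
        pooled = seq_feat.mean(dim=1)  # average over the reference glyphs
        return self.seq_fc(pooled)     # (batch, out_dim)


fusion = MeanPoolFusion()
out4 = fusion(torch.randn(2, 4, 1024))  # ref_nshot = 4
out6 = fusion(torch.randn(2, 6, 1024))  # ref_nshot = 6, same module
print(out4.shape, out6.shape)
```

Note that a `seq_fc` layer shaped like this has a `[512, 1024]` weight, so it cannot reuse the `[512, 4096]` weight stored in the released checkpoint and would need to be trained or fine-tuned. `load_state_dict` raises on shape mismatches even with `strict=False`, so the `seq_fc` entries would have to be removed from the checkpoint dict before loading the remaining weights.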