ligoudaner377 / font_translator_gan


error when I run evaluate.sh #10

Closed manhvela closed 2 years ago

manhvela commented 2 years ago

Hello again,

I trained and tested with my own dataset and when I execute evaluate.sh I get the following error:

```
loading the model from evaluator/checkpoints/latest_content_resnet.pth
Traceback (most recent call last):
  File "evaluate.py", line 17, in <module>
    evaluator = Evaluator(opt, num_classes=training_data.num_classes, text2label=training_data.text2label)
  File "/home/manhvela/Desktop/greek_model/font_translator_gan/evaluator/evaluator.py", line 21, in __init__
    self.criterionFID = FID(opt.evaluate_mode, num_classes, gpu_ids=opt.gpu_ids)
  File "/home/manhvela/Desktop/greek_model/font_translator_gan/evaluator/fid.py", line 8, in __init__
    self.classifier = Classifier(mode, num_classes, gpu_ids=gpu_ids, isTrain=False)
  File "/home/manhvela/Desktop/greek_model/font_translator_gan/evaluator/classifier.py", line 31, in __init__
    self.load_networks('latest')
  File "/home/manhvela/Desktop/greek_model/font_translator_gan/evaluator/classifier.py", line 87, in load_networks
    net.load_state_dict(state_dict)
  File "/home/manhvela/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ResNet:
    size mismatch for fc.weight: copying a param with shape torch.Size([1074, 2048]) from checkpoint, the shape in current model is torch.Size([118, 2048]).
    size mismatch for fc.bias: copying a param with shape torch.Size([1074]) from checkpoint, the shape in current model is torch.Size([118]).
```

How could I fix this?

ligoudaner377 commented 2 years ago

Hi @manhvela, evaluate.py is a rough script with a lot of hard-coded values, sorry about that. When you invoke evaluate.py, a neural network called the "classifier" is used to evaluate the generated images. So if you change the dataset, you need to retrain the classifier.
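The size mismatch in the traceback can be pictured with a tiny, PyTorch-free sketch (the shapes are copied from the error message; the helper name is made up):

```python
# The saved classifier's final fully-connected layer was sized for the old
# dataset's classes (1074); a classifier built for the new dataset expects
# 118 classes, so the checkpoint's weights cannot be copied across.
checkpoint_fc_shape = (1074, 2048)  # fc.weight shape stored in latest_content_resnet.pth
model_fc_shape = (118, 2048)        # fc.weight shape of the freshly built model

def can_load(saved_shape, current_shape):
    # load_state_dict requires every parameter's shape to match exactly
    return saved_shape == current_shape

print(can_load(checkpoint_fc_shape, model_fc_shape))  # False
```

This is why retraining the classifier (so its checkpoint matches the new number of classes) resolves the error.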

Try deleting all the .pth files in font_translator/evaluator/checkpoints/ and running evaluate.sh again.
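As a sketch, assuming the checkpoint directory layout shown in the traceback (adjust the path to your checkout):

```shell
# Remove the stale classifier checkpoints so evaluate.py retrains
# the classifier for the new dataset on its next run.
CKPT_DIR=evaluator/checkpoints
rm -f "$CKPT_DIR"/*.pth
# bash evaluate.sh   # re-run: the classifier is rebuilt from scratch
```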

Hope that works for you!

manhvela commented 2 years ago

It works! Thanks for the fast reply! Could you explain the difference between phase="test_unknown_content" and phase="test_unknown_style"? They both use --evaluate_mode style and --evaluate_mode content.

Also, is there a way to fine-tune your model? I mean, is there a ready-made script?

ligoudaner377 commented 2 years ago

Nice to hear that it works!

For your first question: test.sh supports two different test modes, `--phase "test_unknown_content"` and `--phase "test_unknown_style"`.

The generation behavior of the proposed model is x = f(s, c), where x is the generated image, s is the set of style images, c is the content image, and f() is our model.
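Schematically, the x = f(s, c) interface looks like this toy sketch (images are stood in by strings, and all names here are hypothetical, not the repo's actual API):

```python
# A real model extracts style from the reference images s and
# re-renders the content glyph c in that style.
def f(style_images, content_image):
    style = style_images[0].split("-")[0]  # e.g. "serif" from "serif-A"
    glyph = content_image.split("-")[1]    # e.g. "alpha" from "plain-alpha"
    return f"{style}-{glyph}"              # generated image: content in the new style

x = f(["serif-A", "serif-B"], "plain-alpha")
print(x)  # serif-alpha
```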

"test_unknown_content" means that during testing, the style images s are already seen by the model during training, but the model never saw the content images "test_unknown_style" is the same, although the model has already seen the content images during training, it never saw the style images.

And evaluate.sh supports four kinds of evaluation:

- `--phase "test_unknown_content" --evaluate_mode "style"`
- `--phase "test_unknown_content" --evaluate_mode "content"`
- `--phase "test_unknown_style" --evaluate_mode "style"`
- `--phase "test_unknown_style" --evaluate_mode "content"`

--evaluate_mode "style" means we evaluate whether the generated image x shares the same style as the input style images s. --evaluate_mode "content" means we evaluate whether x has the same content as the input content image c.
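The four settings are simply the cross product of the two phases and two evaluation modes, which a short Python sketch makes explicit:

```python
from itertools import product

# Cross product of test phase and evaluation mode, as described above.
phases = ["test_unknown_content", "test_unknown_style"]
modes = ["style", "content"]
settings = [f'--phase "{p}" --evaluate_mode "{m}"' for p, m in product(phases, modes)]
for s in settings:
    print(s)
# prints the four flag combinations listed above
```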

By the way, we provide a script.sh that packs training, testing, and evaluation together.

For your second question: unfortunately, the answer is no. The current code doesn't support fine-tuning; it would take some effort to enable it.

manhvela commented 2 years ago


I see, I understand now. Thank you.

One last question if I may. I understand that Chinese has thousands of characters, so content generalization is important there. My dataset is for Greek. Given that the Greek alphabet consists of only 24 letters (48 with capitals), is there a reason to use the content discriminator? And is there a reason to test and evaluate it?

ligoudaner377 commented 2 years ago

For your first question: the purpose of the content discriminator is to check the content similarity between the generated image and the input content image at a perceptual level during training (the L1 loss can only measure pixel-level differences). It might still be useful even though your dataset has only 48 categories, but verifying that would require an experiment (comparing results with and without the content discriminator).
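To make the pixel-level limitation concrete, here is a toy sketch of a plain L1 difference on tiny 2x2 "images" (nested lists standing in for tensors): it only sums per-pixel gaps and carries no notion of whether the two glyphs depict the same character, which is exactly what a perceptual check adds.

```python
# Pixel-level L1 distance: sums absolute per-pixel differences and
# nothing more, with no awareness of glyph identity or structure.
def l1(a, b):
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

img_a = [[0.0, 1.0], [1.0, 0.0]]
img_b = [[0.1, 0.9], [1.0, 0.2]]
print(round(l1(img_a, img_b), 6))  # 0.4
```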

For your second question: do you mean whether it is necessary to check content similarity during evaluate.sh? If so, I think it is. For instance, during testing you get a generated image x = f(s, c), and it is quite possible that x and s share the same style while the content of x and c does not match.

manhvela commented 2 years ago


Ah, I understand now. Thank you very much for all the answers, you're a great scientist.

Peace!

ligoudaner377 commented 2 years ago

Haha thanks for your comment. I'm just a student. Good luck with your project!