yizhiwang96 / deepvecfont

[SIGGRAPH Asia 2021] DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning
MIT License
182 stars 31 forks

An error occurs when running train_nr.py #11

Closed ssotobayashi-m closed 2 years ago

ssotobayashi-m commented 2 years ago

In train_nr.py#L35 you pass an argument named n_blocks, but NeuralRasterizer.__init__ in neural_rasterizer.py has no parameter with that name. Should the value be passed to n_upsampling instead?

yizhiwang96 commented 2 years ago

> In train_nr.py#L35 you pass an argument named n_blocks, but NeuralRasterizer.__init__ in neural_rasterizer.py has no parameter with that name. Should the value be passed to n_upsampling instead?

Thanks for the feedback. The argument n_blocks should be deleted; see my new updates. The image resolution is fixed at 64, so n_blocks = n_upsampling = 6.
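For anyone hitting the same error before pulling the update, here is a minimal sketch of what the corrected call could look like; the import path and the other constructor arguments are assumptions on my part, only n_upsampling is confirmed in this thread.

```python
# Hypothetical sketch; the actual NeuralRasterizer signature in the repo may differ.
from neural_rasterizer import NeuralRasterizer  # module path is an assumption

neural_rasterizer = NeuralRasterizer(
    img_size=64,      # output resolution fixed at 64 in this setup (assumed argument name)
    n_upsampling=6,   # 2**6 = 64, matching the fixed 64x64 output
    # n_blocks=6,     # removed: NeuralRasterizer.__init__ has no such parameter
)
```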

ssotobayashi-m commented 2 years ago

I understand. Thanks for the explanation and the code fix! The problem is resolved, so I'm closing this issue.

yizhiwang96 commented 2 years ago

> I understand. Thanks for the explanation and the code fix! The problem is resolved, so I'm closing this issue.

It has been months since I last trained the neural rasterizer. When I rebuilt the environment and trained it a few hours ago, I encountered this warning: WARNING:root:NaN or Inf found in input tensor. Have you encountered the same issue? I'm trying to fix the problem.

yizhiwang96 commented 2 years ago

> I understand. Thanks for the explanation and the code fix! The problem is resolved, so I'm closing this issue.

> It has been months since I last trained the neural rasterizer. When I rebuilt the environment and trained it a few hours ago, I encountered this warning: WARNING:root:NaN or Inf found in input tensor. Have you encountered the same issue? I'm trying to fix the problem.

Interestingly, the error occurs with PyTorch 1.10 but not with PyTorch 1.9.0. I'm not sure why and am still investigating (it could be related to the hyper-parameter settings).
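Not part of the repository, but a generic way to pin down where such non-finite values first appear is PyTorch's anomaly detection together with an explicit finiteness check on each loss term; a minimal sketch (the loss variable names are placeholders taken from this thread, not necessarily the real ones):

```python
import torch

# Produce a traceback pointing at the operation that created the NaN/Inf,
# instead of only seeing the warning when the value is logged.
torch.autograd.set_detect_anomaly(True)

def check_finite(name: str, value: torch.Tensor) -> None:
    """Fail fast as soon as a loss term becomes NaN or Inf."""
    if not torch.isfinite(value).all():
        raise RuntimeError(f"{name} is not finite: {value}")

# Inside the training loop, before backward():
# check_finite("rec_loss", rec_loss)
# check_finite("cx_loss", vggcx_loss["cx_loss"])
# loss.backward()
```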

ssotobayashi-m commented 2 years ago

I trained the neural rasterizer for 10000 epochs on a dataset that may differ from what it is normally intended for, and according to the logs it finished successfully in about 14 hours. I'm currently using torch 1.10.0 and also plan to train on other datasets with this version, so I will report the results. If I hit the error you mentioned, I will try 1.9.0.

yizhiwang96 commented 2 years ago

> I trained the neural rasterizer for 10000 epochs on a dataset that may differ from what it is normally intended for, and according to the logs it finished successfully in about 14 hours. I'm currently using torch 1.10.0 and also plan to train on other datasets with this version, so I will report the results. If I hit the error you mentioned, I will try 1.9.0.

What kind of fonts are you experimenting on? In my experience, for different kinds of datasets you need to manually set the weights of the different losses to make the model converge more easily. Try adjusting the weights so that the values of the different losses have roughly the same magnitude.
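As a concrete illustration of that advice (the numbers are made up; the weight names mirror opts.l1_loss_w / opts.cx_loss_w from this thread):

```python
def combine_losses(rec_loss, cx_loss, l1_loss_w=1.0, cx_loss_w=0.01):
    """Weighted sum of the two image losses.

    Pick the weights so both weighted terms land on a similar scale:
    e.g. if a typical batch gives rec_loss ~ 0.05 and cx_loss ~ 5.0,
    then cx_loss_w = 0.01 brings the second term down to ~0.05 as well.
    """
    return l1_loss_w * rec_loss + cx_loss_w * cx_loss

# Plain floats standing in for tensor values:
print(combine_losses(0.05, 5.0))  # ~0.1, both terms contribute comparably
```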

ssotobayashi-m commented 2 years ago

> What kind of fonts are you experimenting on?

The conclusion of the paper says "Moreover, how to extend our method to more challenging font synthesis tasks for other writing systems (e.g., Chinese) is also an interesting research direction", so I tried DeepVecFont with fonts containing Chinese characters (currently using --max_seq_len 301).

> Try adjusting the weights so that the values of the different losses have roughly the same magnitude.

I see. Thank you for your advice.

For Chinese characters I increased the padding to 300 in create_example of svg_utils.py, but some characters end up with quite a lot of padding. So it might be a good idea to choose a character set that doesn't require too much padding, i.e., to reduce the variance in the total number of segments (lines, curves) per character. It may also be important to choose fonts with a similar topological structure, so that the Bézier sequences do not change drastically and adversely affect the LSTM's predictions. A rough way to do that selection is sketched below.
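A small, hypothetical helper for that selection, sketched with fontTools (the font path, character list, and threshold are placeholders; this is not part of DeepVecFont):

```python
from fontTools.ttLib import TTFont
from fontTools.pens.recordingPen import RecordingPen

def segment_counts(font_path, chars):
    """Return {char: number of line/curve segments} for each character found in the font."""
    font = TTFont(font_path)
    cmap = font.getBestCmap()
    glyph_set = font.getGlyphSet()
    counts = {}
    for ch in chars:
        glyph_name = cmap.get(ord(ch))
        if glyph_name is None:
            continue  # character not covered by this font
        pen = RecordingPen()
        glyph_set[glyph_name].draw(pen)
        counts[ch] = sum(1 for op, _ in pen.value
                         if op in ("lineTo", "curveTo", "qCurveTo"))
    return counts

# Keep only characters whose segment count fits the padding budget (300 here):
# counts = segment_counts("some_font.ttf", candidate_chars)
# selected = [c for c, n in counts.items() if n <= 300]
```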

ssotobayashi-m commented 2 years ago

> Try adjusting the weights so that the values of the different losses have roughly the same magnitude.

The default value of opts.l1_loss_w is 1, so there is no problem at the moment. However, if the weight is ever changed, the following would be more consistent; opts.l1_loss_w is already applied when the losses are written to TensorBoard.

@@ -90,7 +90,7 @@ def train_nr_model(opts):
             message = (
                 f"Epoch: {epoch}/{opts.n_epochs}, Batch: {idx}/{len(train_loader)}, "
                 f"Loss: {loss.item():.6f}, "
-                f"img_l1_loss: {rec_loss.item():.6f}, "
+                f"img_l1_loss: {opts.l1_loss_w * rec_loss.item():.6f}, "
                 f"img_cx_loss: {opts.cx_loss_w * vggcx_loss['cx_loss']:.6f}, "
             )
             logfile.write(message + '\n')
yizhiwang96 commented 2 years ago

> What kind of fonts are you experimenting on?

> The conclusion of the paper says "Moreover, how to extend our method to more challenging font synthesis tasks for other writing systems (e.g., Chinese) is also an interesting research direction", so I tried DeepVecFont with fonts containing Chinese characters (currently using --max_seq_len 301).

> Try adjusting the weights so that the values of the different losses have roughly the same magnitude.

> I see. Thank you for your advice.

> For Chinese characters I increased the padding to 300 in create_example of svg_utils.py, but some characters end up with quite a lot of padding. So it might be a good idea to choose a character set that doesn't require too much padding, i.e., to reduce the variance in the total number of segments (lines, curves) per character. It may also be important to choose fonts with a similar topological structure, so that the Bézier sequences do not change drastically and adversely affect the LSTM's predictions.

Long sequences are a big challenge for LSTMs. Processing the different paths in parallel, as DeepSVG does, could help.