wangqiang9 / SketchKnitter

PyTorch implementation of SketchKnitter: Vectorized Sketch Generation with Diffusion Models (ICLR 2023, Spotlight).
MIT License

Some questions about model training and sampling #10

Closed chaoshuoZhang closed 1 year ago

chaoshuoZhang commented 1 year ago

With num_res_blocks=4 and num_heads=8, my loss is around 0.08 and no longer decreasing, so I'm wondering if I can stop training, but the FID is 30. I used the apple category of the QuickDraw dataset. What can I do to lower my FID?

| metric | value |
| --- | --- |
| grad_norm | 0.114 |
| loss | 0.0757 |
| loss_q0 | 0.227 |
| loss_q1 | 0.0749 |
| loss_q2 | 0.0142 |
| loss_q3 | 0.00181 |
| mse | 0.0747 |
| mse_q0 | 0.226 |
| mse_q1 | 0.074 |
| mse_q2 | 0.0132 |
| mse_q3 | 0.000843 |
| pen_state | 0.0965 |
| pen_state_q0 | 0.0964 |
| pen_state_q1 | 0.0966 |
| pen_state_q2 | 0.0966 |
| pen_state_q3 | 0.0965 |

wangqiang9 commented 1 year ago

First of all, thank you for your interest in our work! I would like you to confirm a few things:

1. Your detailed training parameters, especially those related to epochs and pen breaks.
2. Your detailed inference parameters.
3. Did you use the pixel format or the sequence format when calculating FID?

A loss of 0.08 does not mean that your model has reached its optimal state; you are mistaken about this.

chaoshuoZhang commented 1 year ago

> First of all, thank you for your interest in our work! I would like you to confirm a few things:
>
> 1. Your detailed training parameters, especially those related to epochs and pen breaks.
> 2. Your detailed inference parameters.
> 3. Did you use the pixel format or the sequence format when calculating FID?
>
> A loss of 0.08 does not mean that your model has reached its optimal state; you are mistaken about this.

I trained for about 340 epochs. I evaluated both the model at 200 epochs and the final model, and found that the final model did not improve further. I used the pixel format.

chaoshuoZhang commented 1 year ago

- Inception Score: 1.1382626295089722
- FID: 21.036152697323196
- sFID: 38.442634954765836
- Precision: 0.1212
- Recall: 0.834

I calculated it using the sequence format.

wangqiang9 commented 1 year ago

So, why did your FID improve this time? And how many generated images did you use for the measurement? This is crucial for the outcome. I am happy to help you improve your model and share my experience; I hope you can provide more information.

chaoshuoZhang commented 1 year ago

Thanks for your guidance. Previously I converted the samples into 256×256 images; just now I tried converting them into 56×56 images to calculate the FID, and I also calculated the FID in sequence form, and found that both are about 20. I sampled 2,500 images. I think 2,500 images can reflect the quality of the model's generations to some extent, and when more convincing numbers are needed we can sample a much larger amount of data.
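
For anyone following along, here is a minimal sketch of how a generated stroke-3 sequence (Δx, Δy, pen-lift) might be rasterized into a fixed-size image before running a pixel-space FID tool. This is illustrative only, not the repository's rendering code; the 256×256 and 56×56 resolutions come from the discussion above, while the function name and the normalization are assumptions.

```python
import numpy as np
from PIL import Image, ImageDraw

def render_stroke3(seq, size=256, margin=10, line_width=2):
    """Rasterize a stroke-3 sequence [(dx, dy, pen_lifted), ...] into a grayscale image.

    Illustrative only; the normalization here is an assumption, not SketchKnitter's code.
    """
    seq = np.asarray(seq, dtype=np.float32)
    xy = np.cumsum(seq[:, :2], axis=0)          # offsets -> absolute coordinates
    xy -= xy.min(axis=0)                        # shift into the positive quadrant
    scale = (size - 2 * margin) / max(float(xy.max()), 1e-6)
    xy = xy * scale + margin                    # fit into the drawing area

    img = Image.new("L", (size, size), color=255)
    draw = ImageDraw.Draw(img)
    start = 0
    for i, pen_lifted in enumerate(seq[:, 2]):
        # A pen lift (or the last point) closes the current stroke.
        if pen_lifted > 0.5 or i == len(seq) - 1:
            points = [tuple(p) for p in xy[start:i + 1]]
            if len(points) > 1:
                draw.line(points, fill=0, width=line_width)
            start = i + 1
    return img

# e.g. render_stroke3(generated_sequence, size=256).save("sample_0000.png"),
# or size=56 for the low-resolution comparison mentioned above.
```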

chaoshuoZhang commented 1 year ago

And I want to know how many epochs the model in your paper was trained for.

wangqiang9 commented 1 year ago

Thank you for the information you provided. My view is that, firstly, the indicators in our paper are calculated over groups of categories (simple, medium, complex, etc.), and each group contains several categories, so trying to match a single category such as apple against the numbers in the paper is incorrect. Secondly, we suggest using more result images; in my opinion, 2,500 is far too few to reach the optimal indicator, and the indicators for individual categories will generally be below 10. In addition, other parameters such as the pen state and the number of inference steps also affect the optimal indicator.
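
For illustration, a per-category FID averaged over a category group could look like the following. This uses the third-party pytorch-fid package rather than the evaluation script used in the paper, and the directory layout and category grouping are placeholders.

```python
# pip install pytorch-fid  (third-party tool, not the paper's evaluation code)
from pytorch_fid.fid_score import calculate_fid_given_paths

# Hypothetical grouping; the paper's actual category splits may differ.
simple_group = ["apple", "moon", "book"]

fids = []
for cat in simple_group:
    fid = calculate_fid_given_paths(
        [f"data/real/{cat}", f"data/generated/{cat}"],  # rendered reference vs. generated images
        batch_size=50,
        device="cuda",
        dims=2048,  # pool3 features of the standard InceptionV3
    )
    print(f"{cat}: FID = {fid:.2f}")
    fids.append(fid)

print(f"group average FID = {sum(fids) / len(fids):.2f}")
```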

wangqiang9 commented 1 year ago

And you don't seem to have answered why your FID improved by about 30%.

chaoshuoZhang commented 1 year ago

I originally calculated it in pixel form, but then I tried using the sequence format and found that the result improved by about 30%. Following your advice, I decided to try sampling more data, but I don't think the results will be much better.

chaoshuoZhang commented 1 year ago

- Inception Score: 1.2403502464294434
- FID: 1.5497880671044157
- sFID: 3.6766540745975504
- Precision: 0.804
- Recall: 0.846

Is this a normal result for a single dataset?

wangqiang9 commented 1 year ago

> - Inception Score: 1.2403502464294434
> - FID: 1.5497880671044157
> - sFID: 3.6766540745975504
> - Precision: 0.804
> - Recall: 0.846
>
> Is this a normal result for a single dataset?

This FID result seems to have significantly improved. Can you share your debugging experience from the past few days with future followers?

wangqiang9 commented 1 year ago

Looking forward to your sharing your experimental experience; I will acknowledge your contribution in the README.md.

chaoshuoZhang commented 1 year ago

I want to confirm how the FID in your paper is calculated. Is it computed with OpenAI's code? And is the npz a sketch sequence or pixel images?

wangqiang9 commented 1 year ago

Yes, our code is based on OpenAI's, as mentioned in the acknowledgments.
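
For reference, OpenAI-style evaluators typically read .npz batches containing a uint8 array of shape [N, H, W, 3]. A minimal sketch of packing rendered sketches into that layout is shown below; the names `generated_sequences` and `render_stroke3` refer to the illustrative rendering sketch earlier in this thread and are placeholders, not repository code.

```python
import numpy as np

# Render each generated stroke-3 sequence and stack the results into the
# uint8 [N, H, W, 3] layout that OpenAI-style FID evaluators typically expect.
images = []
for seq in generated_sequences:                      # placeholder iterable of sequences
    img = render_stroke3(seq, size=256).convert("RGB")
    images.append(np.asarray(img, dtype=np.uint8))

batch = np.stack(images, axis=0)                     # uint8 array of shape [N, 256, 256, 3]
np.savez("sample_batch.npz", batch)                  # stored under the default key arr_0
```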

chaoshuoZhang commented 1 year ago

Thanks. Can you explain whether you calculate it directly from the sequences generated by the model, or calculate the FID in pixel space after rendering the generated data into images?