Closed Bohan7 closed 1 year ago
Hello, thank you for your interest in our work. How did you divide your dataset into training and testing sets? How many epochs did you train for? How many images did you select for the quantitative experiment? Please let me know these settings so I can make a judgment.
I have some suggestions. Since you train on only a single category, it seems you need to readjust the parameters. You should already have mastered the visualization methods I provided; you can tune parameters during training by visualizing the generated results. Parameters with a significant and sensitive impact on the generated results include `diffusion_steps`, `num_res_blocks`, `batch_size`, `class_cond`, `lr`, `pen_break`, et al.
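Since several of these parameters interact, it can help to sweep a few combinations rather than tune them one at a time. A minimal sketch of generating such a sweep (the entry-point name `train.py` and the candidate values are my own illustrative assumptions, not recommended settings from the repository):

```python
from itertools import product

# Candidate values are illustrative guesses, not recommended settings.
grid = {
    "diffusion_steps": [100, 1000],
    "num_res_blocks": [2, 3],
    "batch_size": [64, 128],
    "lr": [1e-4, 2e-5],
}

def sweep_commands(grid, script="train.py"):
    """Yield one command line per combination in the grid."""
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        flags = " ".join(f"--{k} {v}" for k, v in zip(keys, values))
        yield f"python {script} {flags}"

commands = list(sweep_commands(grid))
print(len(commands))  # 2*2*2*2 = 16 combinations
```

Each emitted command can then be launched (and its samples visualized) independently, which keeps runs comparable.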
In addition, quantitative results need to be evaluated on as many sketches as possible (a recommended value is over 100,000) to better reflect the distribution of the data.
Another key point: if your dataset is simple, you should appropriately reduce `image_size`, for example to a smaller empirical value. In the future, I will open-source a script to help calculate the average length and select a more suitable truncation value.
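The script mentioned above has not been released yet, but the calculation it describes can be sketched roughly as follows, assuming sketches are stored as stroke-3 style point sequences where only the sequence lengths matter (the function name and the percentile-based rule are my assumptions, not the author's method):

```python
import math

def truncation_length(sketches, percentile=0.95):
    """Report the average sketch length and a truncation value
    that covers `percentile` of the sketches.

    `sketches` is a list of point sequences (e.g. stroke-3 rows);
    only len(sequence) is used here.
    """
    lengths = sorted(len(s) for s in sketches)
    avg = sum(lengths) / len(lengths)
    # Index of the chosen percentile (ceil so we never undershoot).
    idx = min(len(lengths) - 1, math.ceil(percentile * len(lengths)) - 1)
    return avg, lengths[idx]

# Toy data: 100 fake sketches with lengths 1..100.
fake = [[(0.0, 0.0, 0)] * n for n in range(1, 101)]
avg_len, trunc = truncation_length(fake)
print(avg_len, trunc)  # 50.5 95
```

Truncating at a high percentile rather than the maximum keeps sequence lengths small while discarding only a few outlier sketches.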
Thank you for your suggestions! I used the airplane sketches from the QuickDraw dataset. In this dataset, 75K samples (70K training, 2.5K validation, 2.5K test) have been randomly selected from each category. I selected 2.5K images for the quantitative experiment.
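For reference, a 70K/2.5K/2.5K split like the one described above can be reproduced with a seeded shuffle over sample indices (a minimal sketch; the seed and the index-based approach are my own choices, not the dataset's official split):

```python
import random

def split_indices(n=75_000, n_train=70_000, n_val=2_500, seed=0):
    """Shuffle indices 0..n-1 and carve out train/val/test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed -> reproducible split
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

train, val, test = split_indices()
print(len(train), len(val), len(test))  # 70000 2500 2500
```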
I have the following questions. Firstly, I noticed that in your code the output dimension for the pen state is 2; shouldn't it be 1? Secondly, for tuning the parameters, could you please provide your recommended values? Finally, there are only 75K samples in total in each class of the QuickDraw dataset, so I don't think I can evaluate 100,000 generated sketches.
Thank you for your feedback. Firstly, `pen_state` is a probability value, and I set it to 0.1 by default in the code. Please refer to our paper for the detailed principles behind this. Of course, this value is closely related to the dataset; my experience is that it is inversely proportional to the complexity of the data you train on.
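As I read this reply, the two-dimensional pen output is converted into a probability and thresholded (0.1 by default) into a binary pen-lift decision. A minimal sketch of that idea (the function name `binarize_pen` and the softmax-over-two-logits reading are my assumptions, not the repository's exact code):

```python
import math

def binarize_pen(logits, threshold=0.1):
    """Turn a 2-dim pen-state output into a 0/1 pen-lift flag.

    `logits` is a (lift, draw) pair; we softmax it into a lift
    probability and compare against `threshold`.
    """
    lift, draw = logits
    p_lift = math.exp(lift) / (math.exp(lift) + math.exp(draw))
    return 1 if p_lift > threshold else 0

print(binarize_pen((0.0, 0.0)))   # p_lift = 0.5 > 0.1 -> 1
print(binarize_pen((-5.0, 5.0)))  # p_lift is tiny       -> 0
```

A low threshold like 0.1 makes pen lifts easier to trigger, which matches the remark that simpler data (fewer, longer strokes) can tolerate a higher value.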
Secondly, if testing multiple categories, the total data volume is generally much greater than 100k, so you can easily select the data you need to test.
Finally, we welcome you to share more experiences. The open-source community has far less experience with sketch generation than with image generation, and we hope more people will share their interesting findings.
Thank you for your reply. Could you please upload your pretrained models?
If it's difficult for you to upload your pretrained models, could you possibly provide me with the specific hyperparameters (such as `num_res_blocks`, `class_cond`, and `lr`) that were used to train the 'Moderate' model, as mentioned in Table 1 of your paper? I would greatly appreciate your assistance.
In the future, we plan to release unconditional models, conditional models, and some useful script tools for estimating pen numbers for sketches. Please stay tuned to our work.
Thank you for sharing your compelling and meticulously constructed work. I attempted to reproduce the results of SketchKnitter using only the airplane class from the QuickDraw dataset. However, the quality of the generated airplane sketches did not align with my expectations, and the quantitative results were not as favorable as I had anticipated. I employed the code and dataset from your GitHub repository, executing the following commands to train SketchKnitter:
I also sampled the sketches and evaluated the quality using your code. The results are as follows:
I suspect there may be some issues with my approach. However, I haven't made any modifications to your original code. Could you please provide any guidance or suggestions on potential adjustments I could make to improve the performance? I appreciate your time and help in this matter. Have a wonderful day, and I look forward to your response.