irfanICMLL / colorization

reading note

rebuttal #9

Open irfanICMLL opened 7 years ago

irfanICMLL commented 7 years ago

Reviewer 1#:
Q1: The paper missed a few relevant references on sketch2photo and manga colorization. I recommend the paper discuss the connection and difference between the proposed method and these prior works.
A1: Thanks a lot for the reviewer's recommendation of additional references. We will add these discussions to the "Related Works" section of the revised version. "Manga colorization" is based on texture-based segmentation; it can only fill regions with colors supplied by the user. Auto-painter, by contrast, is a generative model that learns the color scheme itself: it can take both the user's input and the original sketch into account, and it can also paint the sketch without any user input. "Sketch2photo: Internet image montage" differs from our work mainly in application field and method: it focuses on synthesizing natural photos based on image retrieval and segmentation, whereas we generate vividly colored virtual cartoon characters from the user's arbitrary creations. "Generative visual manipulation on the natural image manifold" learns the manifold of natural images conditioned on the user's input instead of generating images from scratch; it already has a natural image as input, whereas we have only a sketch.

Q2: If I understand it correctly, the current Table 1 is only for Minions. It would be great if the paper could also report "like vs. dislike" for the Japanese anime dataset.
A2: To be honest, we have already finished the test with seven participants, but, limited by the page length, we chose to show the evaluation results on Minions. We will add the "like vs. dislike" test (auto-painter vs. pix2pix) for the Japanese anime dataset, recruit more volunteers, and report the results in the revised version. The preliminary results from the seven participants on the Japanese anime dataset are as follows:

Like           1   2   3   4   5   6   7  Total
Pix2pix       16  13  25  12  29  17  26    138
Auto-painter  40  43  31  44  27  39  30    254

Volunteers were asked to choose which of the two pictures (pix2pix vs. auto-painter) they preferred.
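For reference, aggregating the votes of all seven volunteers (my own arithmetic, not part of the submitted table) gives the overall preference rate for Auto-painter:

```latex
\[
  \frac{254}{138 + 254} \;=\; \frac{254}{392} \;\approx\; 64.8\%
\]
```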

Q3: In Sec. 3.1 "Network Structure" and Sec. 5, the paper should credit 'pix2pix' [10] for introducing the U-Net architecture for image-to-image translation. This article should not describe it as a contribution.
A3: We will reorganize our contributions in the revised version; the ambiguous wording in Section 5 caused the reviewer's misunderstanding. The main contribution of this work is that we focus on an interesting application and show high-resolution (512*512) results. The model can also accept user input, which gives it potential as a digital-entertainment application.

Reviewer 2#:
Q1: The paper is using some existing techniques (U-Net, pix2pix) with some improvements to tackle a narrower task, cartoon image generation. It is good since the methods and the experiments are valid. But the weak points are: 1) it is unclear if the methods can improve image generation in broader applications; 2) the technical contribution is not much. It would be very helpful if the authors applied their modified method to other applications, such as changing day images into night ones.
A1: Thanks for the reviewer's positive feedback and comments.
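As background for readers of this thread, here is a minimal sketch of the kind of U-Net generator with skip connections that pix2pix introduced and that the auto-painter builds on. This is an illustration only, not our released code; the layer count and channel widths are placeholders, assuming 512*512 RGB input.

```python
import torch
import torch.nn as nn

def down(cin, cout):
    # stride-2 convolution: halves the spatial resolution
    return nn.Sequential(nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.LeakyReLU(0.2))

def up(cin, cout):
    # transposed convolution: doubles the spatial resolution
    return nn.Sequential(nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU())

class TinyUNet(nn.Module):
    """Three-level encoder-decoder; decoder inputs include the skip channels."""
    def __init__(self):
        super().__init__()
        self.d1, self.d2, self.d3 = down(3, 64), down(64, 128), down(128, 256)
        self.u3, self.u2 = up(256, 128), up(128 + 128, 64)
        self.u1 = nn.ConvTranspose2d(64 + 64, 3, 4, stride=2, padding=1)

    def forward(self, x):
        e1 = self.d1(x)                                         # 512 -> 256
        e2 = self.d2(e1)                                        # 256 -> 128
        e3 = self.d3(e2)                                        # 128 -> 64
        y = self.u3(e3)                                         # 64 -> 128
        y = self.u2(torch.cat([y, e2], dim=1))                  # skip connection, -> 256
        return torch.tanh(self.u1(torch.cat([y, e1], dim=1)))   # skip connection, -> 512

out = TinyUNet()(torch.randn(1, 3, 512, 512))                   # shape: (1, 3, 512, 512)
```

The skip connections carry the fine sketch lines from the encoder directly to the decoder, which is what makes the architecture suitable for sparse line-art input at high resolution.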

irfanICMLL commented 7 years ago

BMVC 2017: The British Machine Vision Conference, September 4-7, 2017, London, UK

Reviews For Paper 

Paper ID: 947. Title: Auto-painter: Cartoon Image Generation from Sketch by Using Conditional Generative Adversarial Networks


Masked Reviewer ID: Assigned_Reviewer_37 Review:
Paper Summary. Please summarize the paper in your own words.
This paper proposes auto-painter, an end-to-end deep learning system for automatic colorization. The system augments the previous 'pix2pix' model [10] by introducing two additional terms: (1) a TV loss to avoid color mutation, and (2) a feature loss that measures the difference in VGG feature space. The evaluation experiments show that the proposed system outperforms the pix2pix model. The paper also includes a comprehensive ablation study, indicating the advantage of the full method compared to pix2pix+TV and pix2pix+feature loss.

Paper Strengths. Describe positive aspects of the paper. What is of interest to the community? Consider criteria such as novelty, clarity, technical achievements (theoretical or practical), and thoroughness of validation. If accepted, what aspects would be cite-worthy?
- The paper is well-written and easy to follow.

Scribble-based cartoon colorization: Qu, Yingge, Tien-Tsin Wong, and Pheng-Ann Heng. "Manga colorization." ACM Transactions on Graphics (TOG). Vol. 25. No. 3. ACM, 2006.

Non-parametric sketch2photo: Chen, Tao, et al. "Sketch2photo: Internet image montage." ACM Transactions on Graphics (TOG) 28.5 (2009): 124.

GAN-based sketch2photo: Zhu, Jun-Yan, et al. "Generative visual manipulation on the natural image manifold." European Conference on Computer Vision. Springer International Publishing, 2016.

A user study has been performed to evaluate the algorithm in comparison to the three baselines. Further, many different examples of generated images are shown in the paper.

Controlling the color with color blocks is useful for more manual colorization by an artist, or for recoloring existing cartoons.

Paper Weaknesses. Discuss negative aspects such as lack of novelty or clarity, technical errors, missing validation or proofs, etc. Please justify your comments with details. If you think the paper is not novel, explain why and provide specific references. Be kind.
I am not sure about the choice of loss functions here. The data term L_p uses an L1 loss while the TV loss uses an L2 loss. I would find the opposite choice more natural: an L1 TV regularizer would allow for color discontinuities, which are actually desired in a cartoon setting, instead of the smoothness of L2. The opposite effect applies to L_p.
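For reference, the two terms in question are commonly written as follows (a sketch using standard definitions; the paper's exact formulation may differ):

```latex
\[
  \mathcal{L}_p = \bigl\lVert y - G(x) \bigr\rVert_1, \qquad
  \mathcal{L}_{tv} = \sum_{i,j} \Bigl[ \bigl(G(x)_{i+1,j} - G(x)_{i,j}\bigr)^2
                                     + \bigl(G(x)_{i,j+1} - G(x)_{i,j}\bigr)^2 \Bigr]
\]
```

An L1 (anisotropic) total-variation term would replace the squared neighbor differences with absolute differences, which tolerates sharp color edges instead of smoothing them away.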

The number of images in the user study seems to be 40. How were these images chosen? In my opinion, they should have been selected randomly from the test set to allow a fair comparison between the different algorithms. I cannot find any information about this in the paper. Also, the qualitative samples in the figures should be randomly selected and not cherry-picked, so that the reader can get a better understanding of the image quality, since no objective quantitative evaluation is possible.

In terms of contribution, this paper combines pix2pix [10] with two losses described in [11]. There is no comparison to other work except the baseline pix2pix, although there are several (colorization) techniques already out there, for example [9, 10, 21].

Why was the depth of the features in L_f chosen to be 4? It would be interesting to see the results for different j. Also, I am not sure why the metric here is L2 while for L_p it is L1. If j were 0, the images would be compared directly; in that case it would be natural to expect L_p = L_f, which is not the case.
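Concretely, with \phi_j denoting the network activations at depth j, the feature loss under discussion usually takes the form below (standard definition, assumed here; the paper's exact formulation may differ):

```latex
\[
  \mathcal{L}_f = \bigl\lVert \phi_j\bigl(y\bigr) - \phi_j\bigl(G(x)\bigr) \bigr\rVert_2^2
\]
```

For j = 0, \phi_0 is the identity, so L_f reduces to a squared L2 pixel difference, which is why it would not coincide with the L1 data term L_p.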

Minor comments:

irfanICMLL commented 7 years ago

Reviewer 1#:
Q1: The paper missed a few relevant references on sketch2photo and manga colorization. I recommend the paper discuss the connection and difference between the proposed method and these prior works.
A1: Thanks a lot for the reviewer's recommendation of additional references. We will add these discussions to the "Related Works" section of the revised version. "Manga colorization" is based on texture-based segmentation; it can only fill regions with colors supplied by the user. Auto-painter, by contrast, is a generative model that learns the color scheme itself: it can take both the user's input and the original sketch into account, and it can also paint the sketch without any user input. "Sketch2photo: Internet image montage" differs from our work mainly in application field and method: it focuses on synthesizing natural photos based on image retrieval and segmentation, whereas we generate vividly colored virtual cartoon characters from the user's arbitrary creations. "Generative visual manipulation on the natural image manifold" learns the manifold of natural images conditioned on the user's input instead of generating images from scratch; it already has a natural image as input, whereas we have only a sketch.
Q2: If I understand it correctly, the current Table 1 is only for Minions. It would be great if the paper could also report "like vs. dislike" for the Japanese anime dataset.
A2: To be honest, we have already finished the test with seven participants, but, limited by the page length, we chose to show the evaluation results on Minions. We will add the "like vs. dislike" test (auto-painter vs. pix2pix) for the Japanese anime dataset, recruit more volunteers, and report the results in the revised version. The preliminary results from the seven participants on the Japanese anime dataset are as follows (volunteers were asked to choose which of the two pictures, pix2pix vs. auto-painter, they preferred):

Like           1   2   3   4   5   6   7  Total
Pix2pix       16  13  25  12  29  17  26    138
Auto-painter  40  43  31  44  27  39  30    254

Q3: In Sec. 3.1 "Network Structure" and Sec. 5, the paper should credit 'pix2pix' [10] for introducing the U-Net architecture for image-to-image translation. This article should not describe it as a contribution.
A3: We will reorganize our contributions in the revised version; the ambiguous wording in Section 5 caused the reviewer's misunderstanding. The main contribution of this work is that we focus on an interesting application and show high-resolution (512*512) results. The model can also accept user input, which gives it potential as a digital-entertainment application.

Reviewer 2#:
Q1: The paper is using some existing techniques (U-Net, pix2pix) with some improvements to tackle a narrower task, cartoon image generation. It is good since the methods and the experiments are valid.
A1: Thanks for the reviewer's positive feedback and comments.
Q2: But the weak points are: 1) it is unclear if the methods can improve image generation in broader applications; 2) the technical contribution is not much. It would be very helpful if the authors applied their modified method to other applications, such as changing day images into night ones.
A2: BMVC calls for papers covering theory and/or application. Our main contribution is using the U-Net structure in this interesting application, cartoon image generation from sketches. Taking different loss terms into consideration and adjusting the network to adapt to the sparse sketch input are meaningful steps toward solving this problem. What's more, our resolution is 512*512, which to our knowledge is much higher than in other works. We have also built an interactive demo and will release pre-trained models for different cartoon types for researchers who are interested in the auto-painter.

irfanICMLL commented 7 years ago

To R1: Thanks a lot for the reviewer's recommendation of references. We will include these references in our revised version. The connections and differences to our work will also be included, as follows. (1) Manga colorization is based on texture segmentation to divide an image into a few regions, and it fills the regions with colors indicated by the user. This is similar to Auto-painter in that it allows user interaction; however, as a generative model, Auto-painter can learn the color scheme from training images automatically. (2) Sketch2photo uses a simple freehand sketch annotated with text labels to generate a realistic image. The text labels (usually object names) are used to retrieve similar objects from the Internet, and the generated image is seamlessly stitched from the retrieved photos in agreement with the given sketch. Auto-painter learns from sketches without any predefined text labels, and we can also generate cartoon characters based on the user's arbitrary creations. (3) GAN-based sketch2photo learns the manifold of natural images conditioned on the user's input instead of generating images from scratch. Users are allowed to amend the input pictures; it is not about learning to paint colors onto a given sketch. (4) About the 'like-dislike' experiment: we have actually finished the test on the Japanese anime dataset but failed to include the results in the manuscript due to the page limit. We will add the results in the revised version, as follows:

Like           1   2   3   4   5   6   7  Total
Pix2pix       16  13  25  12  29  17  26    138
Auto-painter  40  43  31  44  27  39  30    254

(5) We will reorganize our contributions in the revised version; the ambiguous wording in Section 5 caused the reviewer's misunderstanding. The main contribution of this work is that we propose a GAN-based solution for high-resolution sketch painting generation. It has the potential to be a successful application in digital entertainment, like Google's AutoDraw.

irfanICMLL commented 7 years ago

Q1: The number of images in the user study seems to be 40. How were these images chosen?
A1: We randomly sampled them from the test set. We chose 40 so that every test can be finished within 10 minutes; otherwise the test becomes boring and the volunteers cannot concentrate on it. We can provide more results and a demo link or video in the revised version.
Q2: Further, comparing to other existing approaches [9, 10, 21] would be desired, especially since the method uses a non-public, non-benchmark dataset.
A2: [9] focuses on transferring grayscale pictures into real photos, which is different from sketches, because grayscale pictures carry much more information. We have already compared against [10]. [21] did not release a pre-trained model or code, so we have not compared against it. We will release our dataset, pre-trained model, and code for others to compare with our method.
Q3: Many arbitrary choices were made (L2 vs. L1, j = 4, w_f = 0.01, w_tv = 0.0001) which are not evaluated or motivated.
A3: Our experiments show that L2 in L_f and L_tv performs better than L1 with the same parameter settings, because L_f and L_tv operate on higher-level features. As [10] shows that L2 in L_p causes more blur, we follow their work and choose L1 for L_p. We also trained the model with j=3, but the results were not as good as with j=4. Parameter choice is always a difficult problem in deep learning. In this paper we run contrast tests with different loss terms, but we do not go through all combinations of the loss weights. We do not focus on choosing the best w_f or w_tv; we simply show that with a group of reasonable parameters, the loss terms improve the results.
Q4: Minor comments.
A4: Thanks for the suggestions, and we apologize for our carelessness. We will correct them.
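For concreteness, here is a minimal PyTorch-style sketch of the combined objective described above. The weights w_f = 0.01 and w_tv = 0.0001 are the values quoted in the review; the VGG layer standing in for "depth j = 4" (relu4_3 here), the omitted input normalization, and the function name are assumptions for illustration, not our released code.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Frozen VGG16 feature extractor up to relu4_3 (downloads ImageNet weights).
vgg_feats = models.vgg16(pretrained=True).features[:23].eval()
for p in vgg_feats.parameters():
    p.requires_grad_(False)

def auto_painter_style_loss(fake, real, w_f=0.01, w_tv=0.0001):
    # L_p: L1 pixel loss between the generated and ground-truth color images
    l_p = F.l1_loss(fake, real)
    # L_f: L2 distance in VGG feature space (feature/perceptual loss)
    l_f = F.mse_loss(vgg_feats(fake), vgg_feats(real))
    # L_tv: L2 total variation on the generated image, penalizing color mutation
    l_tv = ((fake[:, :, 1:, :] - fake[:, :, :-1, :]) ** 2).mean() + \
           ((fake[:, :, :, 1:] - fake[:, :, :, :-1]) ** 2).mean()
    # The adversarial (GAN) term of the generator loss is added separately.
    return l_p + w_f * l_f + w_tv * l_tv

loss = auto_painter_style_loss(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))
```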

irfanICMLL commented 7 years ago

To R1: Thanks a lot for the reviewer's recommendation of references. We will include these references in our revised version. The connections and differences to our work will also be included, as follows. (1) Manga colorization is based on texture segmentation to divide an image into a few regions, and it fills the regions with colors indicated by the user. This is similar to Auto-painter in that it allows user interaction; however, as a generative model, Auto-painter can learn the color scheme from training images automatically. (2) Sketch2photo uses a simple freehand sketch annotated with text labels to generate a realistic image. The text labels (usually object names) are used to retrieve similar objects from the Internet, and the generated image is seamlessly stitched from the retrieved photos in agreement with the given sketch. Auto-painter learns from sketches without any predefined text labels, and we can also generate cartoon characters based on the user's arbitrary creations. (3) GAN-based sketch2photo learns the manifold of natural images conditioned on the user's input instead of generating images from scratch. Users are allowed to amend the input pictures; it is not about learning to paint colors onto a given sketch. (4) About the 'like-dislike' experiment: we have actually finished the test on the Japanese anime dataset but failed to include the results in the manuscript due to the page limit. We will add the results in the revised version, as follows:

Like           1   2   3   4   5   6   7  Total
Pix2pix       16  13  25  12  29  17  26    138
Auto-painter  40  43  31  44  27  39  30    254

(5) We will reorganize our contributions in the revised version; the ambiguous wording in Section 5 caused the reviewer's misunderstanding. The main contribution of this work is that we propose a GAN-based solution for high-resolution sketch painting generation. It has the potential to be a successful application in digital entertainment, like Google's AutoDraw.

R2: Thanks for the positive feedback. As for the contribution, BMVC calls for papers covering theory and/or application. Our main contribution is using the U-Net structure in this interesting application, cartoon image generation from sketches. Taking different loss terms into consideration and adjusting the network to adapt to the sparse sketch input are meaningful steps toward solving this problem. What's more, our resolution is 512*512, which to our knowledge is much higher than in other works. We have also built an interactive demo and will release pre-trained models for different cartoon types for researchers who are interested in the auto-painter.

R3: 1) Choice of results: we randomly sampled from the test set. We chose 40 images to make sure every test can be finished within 10 minutes; otherwise the test becomes boring and the volunteers cannot concentrate on it. We can provide more results, a demo link, or a video in the revised version. 2) Comparison to other methods: [9] focuses on transferring grayscale pictures into real photos, which is different from sketches, because grayscale pictures carry much more information. We have already compared against [10]. [21] did not release a pre-trained model or code, so we have not compared against it. We will release our dataset, pre-trained model, and code for others to compare with our method. 3) On the arbitrary choices: our experiments show that L2 in L_f and L_tv performs better than L1 with the same parameter settings, because L_f and L_tv operate on higher-level features. As [10] shows that L2 in L_p causes more blur, we follow their work and choose L1. We also trained the model with j=3, but the results were not as good as with j=4. Parameter choice is always a difficult problem for deep learning tasks. In this paper we run contrast tests with different loss terms, but we do not go through all combinations of the loss weights. We do not focus on choosing the best w_f or w_tv; we simply show that with a group of reasonable parameters, the loss terms improve the results. 4) Minor comments: thanks for the suggestions, and we apologize for our carelessness. We will correct them.

irfanICMLL commented 7 years ago

To Reviewer 3: (1) On the difficulty of evaluation due to user preference and subjective perception: we carefully designed the experiment; 40 images were randomly chosen to make sure that every test can be finished within 10 minutes. Otherwise the test becomes boring and the volunteers cannot concentrate on it. We will provide more results and the link to the online demo in the revised version. (2) Reference [9] discusses how to transfer grayscale photos to color photos, which differs from sketch painting in that photos actually carry much more information (shapes, texture) than sketches. Most end-to-end image transformation studies are similar in spirit, but they cannot be directly compared to our work because they deal with different problems. We have already carried out a comparison study with reference [10]. We did hope to compare our model to [21]; however, we were not able to obtain their pre-trained model or released code. For our part, we will release our dataset, pre-trained model, and code for others to compare with our method if the paper is accepted.

irfanICMLL commented 7 years ago

To Reviewer 3: (1) On the difficulty of evaluation due to user preference and subjective perception: we carefully designed the experiment. The model is trained on 1000 images, with 120 images for testing; the generated test images will be made available online upon acceptance of our paper. The reason we choose 40 random images from the 120 test images is to make sure that every test can be finished within 10 minutes; otherwise the test becomes boring and the volunteers cannot concentrate on it. We will provide more results and the link to the online demo in the revised version. (2) Reference [9] discusses how to transfer grayscale photos to color photos, which differs from sketch painting in that photos actually carry much more information (shapes, texture) than sketches. Most end-to-end image transformation studies are similar in spirit, but they cannot be directly compared to our work because they deal with different problems. We have already carried out a comparison study with reference [10]. We did hope to compare our model to [21]; however, we were not able to obtain their pre-trained model or released code. For our part, we will release our dataset, pre-trained model, and code for others to compare with our method if the paper is accepted. (3) Our experiments show that L2 in L_f and L_tv performs better than L1 with the same parameter settings, because L_f and L_tv operate on higher-level features. Reference [10] also shows that L2 in L_p causes more blur, so we choose the L1 cost following their work. We have tried different values of j, and j = 3 and j = 4 give the best results. Parameter tuning is still a universal problem in deep learning research. Reasonable parameter settings are based on educated guesses or trial studies; we agree with the reviewer that the parameter settings are not well evaluated, but we can hardly find any paper that comes up with a good theory to explain how its parameters are set in similar research.

irfanICMLL commented 7 years ago

To R1: (1) Thanks a lot for the reviewer's recommendation of references. We will include these references in our revised version. (2) About the 'like-dislike' experiment: we have actually finished the test on the Japanese anime dataset but failed to include the results in the manuscript due to the page limit. We will add the results in the revised version. (3) We will reorganize our contributions in the revised version. The main contribution of this work is that we propose a GAN-based solution for high-resolution sketch painting generation. It has the potential to be a successful application in digital entertainment, like Google's AutoDraw.

To R2: 1) Thanks for the positive feedback. Although pix2pix is an existing technique, to our knowledge this is the first attempt to use a U-Net as a solution for high-resolution (512*512) sketch painting generation. Taking different loss terms into consideration and adjusting the network to adapt to the sparse sketch input are meaningful steps toward solving this problem. We have also built an interactive demo and will release the pre-trained model for researchers who are interested in the auto-painter. 2) As for other applications, the distribution of the data may change; we can discuss this in future articles.

To R3: (1) Random results: we carefully designed the experiment. The model is trained on 1000 images, with 120 images for testing. The reason we choose 40 random images from the 120 test images is to make sure that every test can be finished within 10 minutes; otherwise the volunteers cannot concentrate on the test. (2) [9] discusses how to transfer grayscale photos to color photos, which differs from sketch painting in that photos actually carry much more information (shapes, texture) than sketches. Most end-to-end image transformation studies are similar in spirit, but they cannot be directly compared to our work because they deal with different problems. We have already carried out a comparison study with [10]. We did hope to compare our model to [21]; however, we were not able to obtain their pre-trained model or released code. For our part, we will release our dataset, pre-trained model, online demo, and code for others to compare with our method if the paper is accepted. (3) Our experiments show that L2 in L_f and L_tv performs better than L1 with the same parameter settings, because L_f and L_tv operate on higher-level features. Meanwhile, [10] shows that L2 in L_p causes more blur, so we choose the L1 cost following their work. We have tried different values of j, and j = 3 and j = 4 give the best results. Parameter tuning is still a universal problem in deep learning research. Reasonable parameter settings are based on educated guesses or trial studies; we agree with the reviewer that the parameter settings are not well evaluated, but we can hardly find any paper that comes up with a good theory to explain how its parameters are set in similar research. 4) Minor comments: thanks for the suggestions and apologies for our carelessness. We will correct them.

The ambiguous wording in Section 5 caused the reviewer's misunderstanding. The connections and differences to our work will also be included, as follows. (1) Manga colorization is based on texture segmentation to divide an image into a few regions, and it fills the regions with colors indicated by the user. This is similar to Auto-painter in that it allows user interaction; however, as a generative model, Auto-painter can learn the color scheme from training images automatically. (2) Sketch2photo uses a simple freehand sketch annotated with text labels to generate a realistic image. The text labels (usually object names) are used to retrieve similar objects from the Internet, and the generated image is seamlessly stitched from the retrieved photos in agreement with the given sketch. Auto-painter learns from sketches without any predefined text labels, and we can also generate cartoon characters based on the user's arbitrary creations. (3) GAN-based sketch2photo learns the manifold of natural images conditioned on the user's input instead of generating images from scratch. Users are allowed to amend the input pictures; it is not about learning to paint colors onto a given sketch. The generated test images and the link to the online demo will be made available upon acceptance of our paper. The 'like-dislike' results on the Japanese anime dataset are as follows:

Like           1   2   3   4   5   6   7  Total
Pix2pix       16  13  25  12  29  17  26    138
Auto-painter  40  43  31  44  27  39  30    254

irfanICMLL commented 7 years ago

The rebuttal trimmed to fit the character limit exactly:

To R1: (1) Thanks a lot for the reviewer's recommendation of references. We will include these references in our revised version. (2) About the 'like-dislike' experiment: we have actually finished the test on the Japanese anime dataset but failed to include the results in the manuscript due to the page limit. We will add the results in the revised version. (3) We will reorganize our contributions in the revised version. The main contribution of this work is that we propose a GAN-based solution for high-resolution sketch painting generation. It has the potential to be a successful application in digital entertainment, like Google's AutoDraw.

To R2: 1) Thanks for the positive feedback. Although pix2pix is an existing technique, to our knowledge this is the first attempt to use a U-Net as a solution for high-resolution (512*512) sketch painting generation. Taking different loss terms into consideration and adjusting the network to adapt to the sparse sketch input are meaningful steps toward solving this problem. We have also built an interactive demo and will release the pre-trained model for researchers who are interested in the auto-painter. 2) As for other applications, the distribution of the data may change; we can discuss this in future articles.

To R3: (1) Random results: we carefully designed the experiment. The model is trained on 1000 images, with 120 images for testing. The reason we choose 40 random images from the 120 test images is to make sure that every test can be finished within 10 minutes; otherwise the volunteers cannot concentrate on the test. (2) [9] discusses how to transfer grayscale photos to color photos, which differs from sketch painting in that photos actually carry much more information (shapes, texture) than sketches. Most end-to-end image transformation studies are similar in spirit, but they cannot be directly compared to our work because they deal with different problems. We have already carried out a comparison study with [10]. We did hope to compare our model to [21]; however, we were not able to obtain their pre-trained model or released code. For our part, we will release our dataset, pre-trained model, online demo, and code for others to compare with our method if the paper is accepted. (3) Our experiments show that L2 in L_f and L_tv performs better than L1 with the same parameter settings, because L_f and L_tv operate on higher-level features. Meanwhile, [10] shows that L2 in L_p causes more blur, so we choose the L1 cost following their work. We have tried different values of j, and j = 3 and j = 4 give the best results. Parameter tuning is still a universal problem in deep learning research. Reasonable parameter settings are based on educated guesses or trial studies; we agree with the reviewer that the parameter settings are not well evaluated, but we can hardly find any paper that comes up with a good theory to explain how its parameters are set in similar research. 4) Minor comments: thanks for the suggestions and apologies for our carelessness. We will correct them.

The letter to the ACs (at most 1024 characters; it is at 960 now, so there is probably room for one more sentence):

Dear ACs: BMVC calls for papers covering theory and/or application. We propose a GAN-based solution for high-resolution sketch painting generation. It has the potential to be a successful application in digital entertainment, like Google's AutoDraw. It is an interesting piece of work and we want to share it with others, including the code, pre-trained model, training set, and online demo. We think this may help the community build a more intelligent auto-painter. We discuss the effect of the different loss terms and conduct user studies, which makes our results more valid. Although we have not gone through all the parameter settings, the results are clearly impressive. We are sorry that we missed the deadline for submitting the supplementary material. Limited by the page length, we are unable to show more details, but we can provide more results if supplementary materials can be submitted again after acceptance. Yifan Liu, Zengchang Qin, Zhenbo Luo and Hua Wang

Content that did not fit: It is different from previous sketch2photo or manga colorization work, as Reviewer 1# mentioned. The connections and differences to our work will also be included, as follows. (1) Manga colorization is based on texture segmentation to divide an image into a few regions, and it fills the regions with colors indicated by the user. This is similar to Auto-painter in that it allows user interaction; however, as a generative model, Auto-painter can learn the color scheme from training images automatically. (2) Sketch2photo uses a simple freehand sketch annotated with text labels to generate a realistic image. The text labels (usually object names) are used to retrieve similar objects from the Internet, and the generated image is seamlessly stitched from the retrieved photos in agreement with the given sketch. Auto-painter learns from sketches without any predefined text labels, and we can also generate cartoon characters based on the user's arbitrary creations. (3) GAN-based sketch2photo learns the manifold of natural images conditioned on the user's input instead of generating images from scratch. Users are allowed to amend the input pictures; it is not about learning to paint colors onto a given sketch. The generated test images and the link to the online demo will be made available upon acceptance of our paper. The 'like-dislike' results on the Japanese anime dataset are as follows:

Like           1   2   3   4   5   6   7  Total
Pix2pix       16  13  25  12  29  17  26    138
Auto-painter  40  43  31  44  27  39  30    254