preddy5 / Im2Vec

[CVPR 2021 Oral] Im2Vec Synthesizing Vector Graphics without Vector Supervision
http://geometry.cs.ucl.ac.uk/projects/2021/im2vec/
Apache License 2.0

Post-decoding process #15

Closed athena913 closed 2 years ago

athena913 commented 2 years ago

1) According to the paper, the vectorized output elements from the decoder are first rasterized and then composited. Is that because the differentiable compositor only accepts images as input?

2) Is it possible to compose the vectorized elements first and then rasterize the composited output, assuming there is a differentiable compositor that accepts vectorized elements as input? Would this make it easier to extract the SVG directly from the compositor's output? What impact, if any, would this have on the final rasterized result?

thanks

preddy5 commented 2 years ago

Hey @athena913

1 - Yes, we rasterize the individual paths first so that we can use the differentiable compositor. This is important: we found that traditional compositing didn't always let the network converge well. The supplementary material includes an ablation comparing network convergence with traditional composition versus differentiable compositing.

2 - I'm not sure I understand exactly what you mean by composing vectorized elements; I think traditional vector-graphics data structures that represent individual elements do not support that, but correct me if I am wrong. Our objective was not to extract SVGs from the output. Instead, the decoder directly predicts SVG parameters, and we train it without supervision from ground-truth SVG data, using the differentiable rasterization and differentiable compositing functions described in the paper. Once trained, you can use the decoder's output directly to create an SVG file.
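To illustrate why the compositor consumes rasters, here is a minimal sketch of differentiable "over" compositing in PyTorch. This is not the paper's exact compositing function (the paper's differentiable compositor is more involved); the function name and the toy shapes are my own, but it shows the key property: every operation is differentiable, so gradients flow back through the composite to each individually rasterized path.

```python
import torch

def composite_over(rasters, alphas):
    """Back-to-front "over" compositing of pre-rasterized layers.

    rasters: (L, 3, H, W) RGB rasters of the individual paths
    alphas:  (L, 1, H, W) per-layer coverage in [0, 1]
    Every op here is differentiable, so the loss on the final
    composite can propagate gradients into every layer.
    """
    canvas = torch.ones_like(rasters[0])  # white background
    for rgb, a in zip(rasters, alphas):   # bottom layer first
        canvas = rgb * a + canvas * (1.0 - a)
    return canvas

# toy example: two 8x8 pre-rasterized layers
rasters = torch.rand(2, 3, 8, 8, requires_grad=True)
alphas = torch.rand(2, 1, 8, 8)
out = composite_over(rasters, alphas)
out.sum().backward()  # gradients reach every layer's raster
```

Because the inputs are images, any differentiable rasterizer can feed this stage, which is exactly the pipeline shape the paper uses.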

athena913 commented 2 years ago

Hi, thanks for your response. I also have a question about generating variations of an image. I selected a single emoji and tried to generate variations using the code below (x is the input image). I logged the points (all_points) generated in decode_and_composite(). During each call, a set of 4 points is generated, because decode_and_composite uses the number of colors to generate the points and there are 4 different colors. However, the same set of 4 points is generated in all 5 iterations. The 5 generated emojis are identical, except that sometimes the color is filled in and sometimes it is not, as shown in the attached images below.

My understanding from the paper is that the decoder adaptively samples points from the RNN output at each step and decodes them into a different curve. So during each of the 5 iterations, I assumed there would be some randomness in the sampled points, producing a slightly different set of 4 points and hence slight variations in the decoded curves. But the only variation in the generated emojis seems to be the fill color. However, in the paper you show geometric variations for the generated fonts in Figure 10. Is it possible to generate similar structural variations for the emoji and icon datasets as well? Does this require any changes to decode_and_composite for different datasets? Also, could you please provide a link to the supplementary paper you reference? It does not seem to be on the project or GitHub page.

thanks

    def genMultiple(self, x: Tensor, **kwargs) -> Tensor:
        """Encode x once, then decode 5 samples drawn from the latent posterior."""
        mu, log_var = self.encode(x)
        #print(mu.shape, log_var.shape, mu)

        outputs = []
        for i in range(5):
            z = self.reparameterize(mu, log_var)
            output = self.decode_and_composite(z, verbose=random.choice([True, False]))
            outputs.append(output)
            vutils.save_image(output.cpu().data,
                              f"{save_dir}{name}/version_{version}/"
                              f"{i}_recons.png",
                              normalize=False,
                              nrow=10)
        return torch.cat(outputs, dim=0)

[attached images: VectorVAEnLayers_1_recons, VectorVAEnLayers_0_recons]

preddy5 commented 2 years ago

Hey @athena913, here's a link to the supplementary material: https://openaccess.thecvf.com/content/CVPR2021/supplemental/Reddy_Im2Vec_Synthesizing_Vector_CVPR_2021_supplemental.zip. This is the zip file we submitted for the review process.

We use the word "sampling" in two different contexts in the paper. One refers to sampling a disk to decide the number of Bezier curves used to represent the shape; Fig. 4a shows outputs with a varying number of samples. The sentence you mention uses "samples" in that context.
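For intuition, sampling in that first sense is deterministic resolution control, not randomness. A minimal sketch (my own code, not from the repo) of what varying the number of samples on a circle looks like:

```python
import math

def sample_circle(n_samples, radius=1.0):
    """Sample n points uniformly around a circle of the given radius.

    Illustrative only: in Im2Vec the number of such sample positions
    controls how many curve segments represent the shape, so more
    samples means a finer outline, but each count always yields the
    same deterministic positions.
    """
    return [(radius * math.cos(2 * math.pi * k / n_samples),
             radius * math.sin(2 * math.pi * k / n_samples))
            for k in range(n_samples)]

coarse = sample_circle(4)   # 4 sample positions -> coarse outline
fine = sample_circle(32)    # 32 sample positions -> finer outline
```

Calling `sample_circle(4)` twice returns identical points, which is why this kind of sampling cannot produce shape variation on its own.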

To generate the variations you intend, you will have to train the network variationally: set this parameter https://github.com/preddy5/Im2Vec/blob/master/models/vector_vae.py#L39 to True and retrain. The network weights I share in the repo were not trained variationally, so you can only interpolate between different latent vectors.
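This also explains the identical points you logged. A standard VAE reparameterization (a generic sketch of the trick, not the repo's exact code) shows the behaviour: when the model is not trained variationally, the learned log-variance collapses, the standard deviation goes to ~0, and every sampled z is effectively mu, so every decode produces the same shape.

```python
import torch

def reparameterize(mu, log_var):
    # Standard VAE trick: z = mu + sigma * eps, with eps ~ N(0, I)
    std = torch.exp(0.5 * log_var)
    return mu + std * torch.randn_like(std)

mu = torch.zeros(1, 8)

# Non-variational behaviour: log_var is very negative, std ~ 0,
# so every sample collapses onto mu -> identical decoded shapes
z_det = reparameterize(mu, torch.full((1, 8), -20.0))

# Variational behaviour: finite log_var gives genuinely different
# samples on each call -> structural variation after decoding
z_var1 = reparameterize(mu, torch.zeros(1, 8))
z_var2 = reparameterize(mu, torch.zeros(1, 8))
```

With variational training keeping log_var finite, each pass through `reparameterize` yields a different z, which is what produces the geometric variations shown for the fonts in Figure 10.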

Regards, Pradyumna.