sign-language-processing / transcription

Text to pose model for sign language pose generation from a text sequence

text-to-pose: Predict length as distribution #1

Closed AmitMY closed 2 years ago

AmitMY commented 2 years ago

Length prediction is very hard and inconsistent. Instead of predicting a number, it makes sense to predict a distribution from which we could sample.

        # encode x to get the mu and log-variance parameters
        x_encoded = self.encoder(x)
        mu, log_var = self.fc_mu(x_encoded), self.fc_var(x_encoded)

        # sample z from q; rsample() uses the reparameterization trick,
        # so gradients flow back into mu and log_var
        std = torch.exp(log_var / 2)
        q = torch.distributions.Normal(mu, std)
        z = q.rsample()

Then calculate the L1/L2 loss between z and the ground_truth, propagate the gradients, and update the weights.
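A minimal sketch of one such training step, assuming PyTorch. The `LengthPredictor` class, its layer sizes, and the dummy batch are hypothetical and only illustrate the idea; this is not the code from the repository.

```python
import torch
from torch import nn


class LengthPredictor(nn.Module):
    """Hypothetical head that predicts a Normal distribution over length."""

    def __init__(self, input_dim: int, hidden_dim: int = 32):
        super().__init__()
        # Assumed toy encoder; the real model would encode the text sequence
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, 1)
        self.fc_var = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        x_encoded = self.encoder(x)
        mu, log_var = self.fc_mu(x_encoded), self.fc_var(x_encoded)
        std = torch.exp(log_var / 2)
        q = torch.distributions.Normal(mu, std)
        z = q.rsample()  # differentiable sample: z = mu + std * eps
        return z, q


torch.manual_seed(0)
model = LengthPredictor(input_dim=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy data (assumption): 4 encoded inputs and their true lengths in frames
x = torch.randn(4, 8)
ground_truth = torch.tensor([[10.0], [25.0], [40.0], [15.0]])

z, q = model(x)
loss = nn.functional.l1_loss(z, ground_truth)  # use mse_loss for L2 instead

optimizer.zero_grad()
loss.backward()   # gradients reach both fc_mu and fc_var through rsample()
optimizer.step()
```

At inference the returned distribution gives both values needed later: `q.mean` for the predicted length and `q.stddev` for its uncertainty.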


At inference, just use mu. If there are multiple signs and a time limit is required (e.g. 5 signs in 2 seconds), we can use the std of each sign to find the optimal combination of which signs should be fast and which should be slow.
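One possible reading of "optimal" here, as a sketch only: treat each sign's length as an independent Normal(mu, std) and pick the lengths that maximize the joint likelihood while summing to the time limit. Under that assumption the difference between the budget and the sum of the means is split in proportion to each sign's variance, so uncertain signs absorb most of the change. The function name `fit_lengths_to_budget` and the example numbers are hypothetical.

```python
def fit_lengths_to_budget(mus, stds, total):
    """Adjust predicted lengths so they sum to `total`.

    Maximizes the joint Gaussian likelihood subject to sum(lengths) == total,
    which gives: length_i = mu_i + var_i * (total - sum(mus)) / sum(vars).
    Lengths are not clamped, so an extreme budget could yield negative values.
    """
    variances = [s ** 2 for s in stds]
    total_var = sum(variances)
    slack = total - sum(mus)  # negative when the signs must be sped up
    if total_var == 0:
        # No uncertainty information: spread the change evenly
        return [m + slack / len(mus) for m in mus]
    return [m + v * slack / total_var for m, v in zip(mus, variances)]


# 5 signs that must fit in 2 seconds (made-up predictions, in seconds)
mus = [0.5, 0.6, 0.4, 0.7, 0.3]
stds = [0.05, 0.2, 0.1, 0.2, 0.05]
lengths = fit_lengths_to_budget(mus, stds, total=2.0)
```

In this example the predicted means add up to 2.5 seconds, so every sign is shortened, and the signs with std 0.2 give up far more time than those with std 0.05.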

AmitMY commented 2 years ago

Done in https://github.com/sign-language-processing/transcription/commit/2dc6d404e40af310baecd2002dac6d62029ff057