text-to-pose: Predict length as distribution

Predicting sequence length is hard, and the predictions are inconsistent. Instead of predicting a single number, it makes sense to predict a distribution that we can sample from.

    # encode x to get the mu and log-variance parameters
    x_encoded = self.encoder(x)
    mu, log_var = self.fc_mu(x_encoded), self.fc_var(x_encoded)

    # sample z from q (rsample keeps the sample differentiable)
    std = torch.exp(log_var / 2)
    q = torch.distributions.Normal(mu, std)
    z = q.rsample()

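Then compute an L1/L2 loss between `z` and the ground-truth length, backpropagate the gradients, and update the weights. Because `rsample()` uses the reparameterization trick, gradients flow through the sample to both `mu` and `log_var`.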
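A minimal sketch of one training step, to make the idea concrete. The `LengthPredictor` class, the layer sizes, and the random placeholder tensors are all assumptions for illustration; only `encoder`, `fc_mu`, `fc_var`, and the sampling logic come from the snippet above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LengthPredictor(nn.Module):
    """Hypothetical module: maps a text encoding to a Normal over length."""

    def __init__(self, input_dim: int = 16, hidden_dim: int = 32):
        super().__init__()
        # Assumed toy encoder; the real one would be the text encoder.
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, 1)
        self.fc_var = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.distributions.Normal:
        x_encoded = self.encoder(x)
        mu, log_var = self.fc_mu(x_encoded), self.fc_var(x_encoded)
        std = torch.exp(log_var / 2)
        return torch.distributions.Normal(mu, std)


torch.manual_seed(0)
model = LengthPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 16)                  # placeholder batch of text encodings
ground_truth = torch.rand(8, 1) * 100   # placeholder lengths (e.g. frames)

q = model(x)
z = q.rsample()                         # reparameterized, differentiable sample
loss = F.l1_loss(z, ground_truth)       # swap in F.mse_loss for L2

optimizer.zero_grad()
loss.backward()                         # gradients reach fc_mu and fc_var
optimizer.step()
```

One thing to watch: a plain L1/L2 loss on the sample is lowest in expectation when `std` shrinks toward zero, so the loss may need an extra term (for example the negative log-likelihood `-q.log_prob(ground_truth).mean()`) to keep the predicted spread meaningful.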

At inference, just use `mu`. If there are multiple signs and a time limit applies (e.g. 5 signs in 2 seconds), the `std` of each sign can be used to find the optimal combination of which signs should be fast and which slow.
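One possible reading of "optimal" here, sketched under assumptions: treat the sign lengths as independent Gaussians and pick the most likely lengths that sum to the time limit. Minimizing `sum(((l_i - mu_i) / std_i) ** 2)` subject to `sum(l_i) == total` gives `l_i = mu_i + std_i**2 * (total - sum(mu)) / sum(std**2)`, so signs with a larger `std` absorb more of the adjustment. The function name `allocate_lengths` and the example numbers are hypothetical.

```python
def allocate_lengths(mus, stds, total):
    """Split `total` across signs; high-std signs move furthest from their mean."""
    variances = [s ** 2 for s in stds]
    var_sum = sum(variances)
    if var_sum == 0:
        raise ValueError("at least one sign needs a non-zero std to be adjustable")
    slack = total - sum(mus)  # time to add (or remove, if negative)
    return [m + v * slack / var_sum for m, v in zip(mus, variances)]


# 5 signs whose mean lengths sum to 2.5 s, squeezed into 2 s.
mus = [0.5, 0.6, 0.4, 0.7, 0.3]
stds = [0.05, 0.2, 0.1, 0.1, 0.05]
lengths = allocate_lengths(mus, stds, total=2.0)
```

This closed form does not enforce positive or minimum lengths; a real implementation would probably need to clamp the results and redistribute the remainder.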
