An LSTM autoencoder could be used to predict the next step in the sequence.
This would make clustering redundant: given the predicted features, the single closest matching frame could be found with a k-d tree, as in audio mosaicking.
https://machinelearningmastery.com/lstm-autoencoders/
Would this fix the general messiness of the current model output? Worth trying.
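A minimal sketch of the lookup side of this idea, assuming the LSTM already produces a predicted feature vector per step. The corpus features here are random stand-ins (shapes and the 12-dim feature size are arbitrary assumptions, not anything decided yet); the point is just that `scipy.spatial.cKDTree` turns "find the closest matching frame" into a single query, no clustering needed.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical corpus: 1000 frames, each described by a 12-dim feature
# vector (e.g. MFCCs). In practice these come from the analysis stage.
rng = np.random.default_rng(0)
corpus_features = rng.standard_normal((1000, 12))

# Build the k-d tree once over the whole corpus.
tree = cKDTree(corpus_features)

def closest_frame(predicted_features):
    """Index of the corpus frame nearest to the model's predicted features."""
    _, idx = tree.query(predicted_features)
    return int(idx)

# Stand-in for the LSTM's predicted next-step features: a known frame
# plus a tiny perturbation, so the nearest neighbour should be that frame.
predicted = corpus_features[42] + 0.01 * rng.standard_normal(12)
print(closest_frame(predicted))  # → 42
```

The tree is built once and queried per prediction, so the per-step cost stays low even for a large corpus.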