Let's say we have a mixture model of the form P(x) = w1 P1(x) + w2 P2(x) + ..., where the weights wi sum to 1.
Right now, we can construct such a mixture (e.g. a `tfp.Mixture` over a `tfp.Categorical`) and fit the model to data. Making the wi trainable, however, is a bit complicated.
The way I got it to work was to create a custom model `TrainableProbVector` with one layer of trainable parameters. The model ignores its input and simply outputs the softmax of the parameters. But since Keras models must have inputs, it took some hacky coding to create a distribution that could both be trained and then be used like a regular distribution after training.

Being able to create layers and/or models that don't have inputs would make this easier. A solution specific to `tfp.Categorical` and/or `tfp.Mixture` would also be great.

Thanks!