leesharkey closed this 2 years ago
Demoting the urgency of this because I think it'll likely take too long to fix.
Basically, the optimization is extremely unstable. It more or less works in most cases, but not for IC directions.
My main hypothesis as to why:
There are multiple discrete variables in the latents (the discrete categorical vars in the RSSM and also the action space). This means that slight changes in the bottleneck vector can lead to very different samples.
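To make that hypothesis concrete, here is a minimal toy sketch (not the actual RSSM code). It assumes a hypothetical set of categorical logits computed from the bottleneck vector, and it uses a deterministic argmax in place of real sampling so the effect is reproducible. All names and numbers are made up for illustration.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def one_hot_argmax(logits):
    """One-hot vector for the most likely category (stand-in for a discrete sample)."""
    out = np.zeros_like(logits)
    out[np.argmax(logits)] = 1.0
    return out

# Hypothetical logits derived from a bottleneck vector; two categories are nearly tied.
logits = np.array([1.000, 1.001, -2.0])
# A "slight change" upstream, assumed to shift one logit by a tiny amount.
perturbation = np.array([0.002, 0.0, 0.0])

before = one_hot_argmax(logits)
after = one_hot_argmax(logits + perturbation)

# The continuous probabilities barely move...
prob_shift = np.abs(softmax(logits + perturbation) - softmax(logits)).max()
# ...but the discrete output jumps to a different category entirely.
sample_shift = np.abs(after - before).max()

print("max change in probabilities:", prob_shift)
print("max change in discrete output:", sample_shift)
```

In this toy case the probabilities change by well under 1%, while the discrete output flips from one category to another. If something like this happens at each discrete variable (categorical latents and actions), it would be consistent with the instability described above, though this sketch does not show that it is the actual cause.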
Potential future solution:
We're not going to do target functions. Dataset examples will suffice.