ogmacorp / OgmaNeo2

Sparse Predictive Hierarchies (SPH)

New encoder performs significantly worse in RL environments #30

Closed liquidnode closed 4 years ago

liquidnode commented 4 years ago

I have tested the new encoder on CartPole-v1 and MiniWorld-Hallway-v0 (from gym_miniworld). The performance is significantly worse. In CartPole-v1, for example, the agent achieves only a slight improvement over 1000 episodes, whereas with the old encoder it reaches the full 500-timestep episodes almost every time after 1000 episodes. The same degradation shows up in MiniWorld-Hallway-v0. Can you replicate these findings? (For MiniWorld-Hallway-v0 I used `layerSizes=6*[Int3(6,6,32)]`.)
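For reference, this is roughly the evaluation loop I used (`SPHAgent` is just a stand-in for my own wrapper around the hierarchy, not part of the library; the gym calls are the old 4-tuple `step` API):

```python
import gym

env = gym.make("CartPole-v1")

# SPHAgent is a hypothetical stand-in for my wrapper around the
# OgmaNeo2 hierarchy: it encodes the observation, steps the
# hierarchy with the last reward, and reads out the predicted action.
agent = SPHAgent(env.observation_space, env.action_space)

for episode in range(1000):
    obs = env.reset()
    reward = 0.0
    done = False
    steps = 0

    while not done:
        action = agent.act(obs, reward)
        obs, reward, done, _ = env.step(action)
        steps += 1

    # with the old encoder this climbs to the 500-step cap;
    # with the new encoder it barely improves
    print(episode, steps)
```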

Unfortunately I cannot state the exact reason for this regression, since I don't fully understand the new encoder. On that note, I also have a question not directly related to this issue: would it be beneficial to introduce a decorrelation step into the encoders? This should ensure that different columns don't encode the same information, which should lead to a higher-quality encoding.

222464 commented 4 years ago

Hi,

It seems the new encoder is indeed a miss, in particular in RL environments. It works better than the old one on non-RL tasks; I am not exactly sure why myself. For now I will likely just revert the encoder.

The encoders already perform decorrelation, in all variants. The pre-encoders do not necessarily do this, though. For instance, the ImageEncoder's columns receive no information about neighboring columns at all; it relies on the spatial structure of the input (the image) for decorrelation. The actual encoders (the ones in the hierarchy) decorrelate via reconstruction (or, in some branches on the development fork, through lateral inhibitory connections).

liquidnode commented 4 years ago

I'm unsure whether reconstruction alone is enough to decorrelate the output of the encoder. Say every hidden column's receptive field covers all input columns, and the input contains a very dominant signal that is very easy to learn. Then every hidden column learns to reconstruct just this dominant signal and ignores the rest. If the output were explicitly decorrelated, only a few hidden columns would learn to reconstruct this signal, and the rest would reconstruct the less dominant parts of the input.

222464 commented 4 years ago

It should work; it is somewhat like a k-sparse autoencoder in this regard. Not all columns will learn the same pattern: if one column already reconstructs the pattern successfully, the others receive no error for it and therefore will not learn it as well.
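This is not the actual encoder code, but here is a toy numpy sketch of the principle. Because all columns reconstruct jointly and learn only from the shared residual, their weight rows end up (nearly) orthogonal instead of all copying the dominant pattern:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_cols = 16, 3

# one weight row per hidden column
W = rng.normal(scale=0.1, size=(n_cols, n_in))

# one dominant input pattern plus two weaker ones
patterns = rng.normal(size=(3, n_in))
patterns /= np.linalg.norm(patterns, axis=1, keepdims=True)
gains = np.array([3.0, 1.0, 0.5])

for _ in range(10000):
    x = (gains * rng.normal(size=3)) @ patterns

    h = W @ x                      # column activations
    recon = W.T @ h                # joint reconstruction by ALL columns
    err = x - recon                # shared residual: what no column explains yet
    W += 0.005 * np.outer(h, err)  # each column learns only from that residual

# the Gram matrix ends up close to the identity: the columns decorrelated.
# If instead each column reconstructed x on its own, every row would
# converge to the dominant pattern.
print(np.round(W @ W.T, 2))
```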

For a more explicit decorrelation mechanism, check out the OpenMP_TrueSC branch on 222464/OgmaNeo2. It doesn't reconstruct; instead it has lateral connections that learn whether columns are correlated, and then iteratively solves for a sparse code.
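Sketched in the same toy numpy style (not the actual kernels; here the lateral weights are just derived from the feedforward weights, whereas in that branch they are learned):

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_cols = 16, 8

# feedforward dictionary, one unit-norm row per column
W = rng.normal(size=(n_cols, n_in))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# lateral weights: how correlated two columns' features are
L = W @ W.T - np.eye(n_cols)

x = rng.normal(size=n_in)
drive = W @ x              # feedforward drive per column
h = np.zeros(n_cols)

# iteratively settle on a sparse code: active columns inhibit the
# columns they are correlated with
for _ in range(50):
    target = np.maximum(0.0, drive - L @ h - 0.2)  # 0.2 = sparsity threshold
    h += 0.1 * (target - h)                        # relaxed update for stability

print(np.round(h, 2))  # only a few columns remain active
```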

I just tested a different (older) encoder type combined with the newer "early stopping" technique, and I think it may actually be a good compromise between the predictive performance of the current master and the RL performance of the commit before it. I ran it on a variety of tasks and will push it - give it a try, it should be up in a few minutes.

222464 commented 4 years ago

#31 should address this issue. Give it a try and let me know if it works better for you! If it still doesn't, I will revert to the older commit.

liquidnode commented 4 years ago

I have tested the new encoder on both environments and can confirm that the performance improved significantly. This issue can be closed if there are no objections from your side.

And thanks for clarifying how the encoder works. I had thought the columns learn the input independently; the mechanism you described should indeed lead to decorrelation.

222464 commented 4 years ago

Alright, thanks for confirming that it works again!