hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[question] Why are RL CNNs so shallow? #367

Open AlanKuurstra opened 5 years ago

AlanKuurstra commented 5 years ago

It seems that RL CNNs are much shallower than the ones used on ImageNet. Am I right about this? And why would that be the case?

araffin commented 5 years ago

Hello,

> are much shallower than the ones used on ImageNet. Am I right about this? And why would that be the case?

That's a good question, and you are right in most cases. I think a simple answer would be that they are already complex enough to solve the tasks.
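For context, the default `CnnPolicy` in stable-baselines uses the three-convolution network from the Nature DQN paper, which is tiny compared to ImageNet models. The snippet below is essentially `nature_cnn` from `stable_baselines/common/policies.py`:

```python
import numpy as np
import tensorflow as tf
from stable_baselines.a2c.utils import conv, conv_to_fc, linear

def nature_cnn(scaled_images, **kwargs):
    """The 3-convolution CNN from the Nature DQN paper."""
    activ = tf.nn.relu
    layer_1 = activ(conv(scaled_images, 'c1', n_filters=32, filter_size=8, stride=4, init_scale=np.sqrt(2), **kwargs))
    layer_2 = activ(conv(layer_1, 'c2', n_filters=64, filter_size=4, stride=2, init_scale=np.sqrt(2), **kwargs))
    layer_3 = activ(conv(layer_2, 'c3', n_filters=64, filter_size=3, stride=1, init_scale=np.sqrt(2), **kwargs))
    layer_3 = conv_to_fc(layer_3)
    # A single 512-unit fully connected layer on top
    return activ(linear(layer_3, 'fc1', n_hidden=512, init_scale=np.sqrt(2)))
```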

To my knowledge, the most complex (and successful) CNN policy architecture is the one from IMPALA, which uses residual connections. The way RL works also makes it tricky to use batch-norm, which is what usually allows training deeper nets.
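For a rough idea of what that looks like, here is a minimal TF1-style sketch of the IMPALA "deep" network (a conv + max-pool downsampling stage followed by two residual blocks per section, with depths 16/32/32 as in the paper); the function names are just for illustration:

```python
import tensorflow as tf

def residual_block(x, n_filters):
    # Two 3x3 convs with a skip connection
    out = tf.nn.relu(x)
    out = tf.layers.conv2d(out, n_filters, 3, padding='same')
    out = tf.nn.relu(out)
    out = tf.layers.conv2d(out, n_filters, 3, padding='same')
    return x + out

def impala_cnn(scaled_images, depths=(16, 32, 32)):
    out = scaled_images
    for depth in depths:
        # Downsample, then apply two residual blocks per section
        out = tf.layers.conv2d(out, depth, 3, padding='same')
        out = tf.layers.max_pooling2d(out, pool_size=3, strides=2, padding='same')
        out = residual_block(out, depth)
        out = residual_block(out, depth)
    out = tf.nn.relu(tf.layers.flatten(out))
    return tf.nn.relu(tf.layers.dense(out, 256))
```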

Then, a lot of RL problems do not use images as input (e.g. Mujoco/PyBullet envs, where the input is the joint angles); in that case, there is no need for a more complex architecture (see the example below).
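For those state-based tasks, the standard `MlpPolicy` is the usual choice, e.g.:

```python
from stable_baselines import SAC
from stable_baselines.sac.policies import MlpPolicy

# Pendulum observations are a 3-dim state vector, so a small MLP is enough
model = SAC(MlpPolicy, 'Pendulum-v0', verbose=1)
model.learn(total_timesteps=10000)
```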

Finally, you can always try a deeper net, but in my experience, this does not often result in better performance.
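If you want to experiment, you can plug a custom feature extractor into `FeedForwardPolicy` via its `cnn_extractor` argument, following the custom-policy pattern from the docs. The fourth conv layer below is a hypothetical example, not a recommended architecture:

```python
import numpy as np
import tensorflow as tf
from stable_baselines import PPO2
from stable_baselines.common.policies import FeedForwardPolicy
from stable_baselines.a2c.utils import conv, conv_to_fc, linear

def deeper_cnn(scaled_images, **kwargs):
    """Nature CNN plus one extra conv layer (hypothetical, for illustration)."""
    activ = tf.nn.relu
    out = activ(conv(scaled_images, 'c1', n_filters=32, filter_size=8, stride=4, init_scale=np.sqrt(2), **kwargs))
    out = activ(conv(out, 'c2', n_filters=64, filter_size=4, stride=2, init_scale=np.sqrt(2), **kwargs))
    out = activ(conv(out, 'c3', n_filters=64, filter_size=3, stride=1, init_scale=np.sqrt(2), **kwargs))
    out = activ(conv(out, 'c4', n_filters=64, filter_size=3, stride=1, init_scale=np.sqrt(2), **kwargs))
    out = conv_to_fc(out)
    return activ(linear(out, 'fc1', n_hidden=512, init_scale=np.sqrt(2)))

class DeeperCnnPolicy(FeedForwardPolicy):
    def __init__(self, *args, **kwargs):
        super(DeeperCnnPolicy, self).__init__(*args, **kwargs,
                                              cnn_extractor=deeper_cnn,
                                              feature_extraction="cnn")

model = PPO2(DeeperCnnPolicy, 'BreakoutNoFrameskip-v4', verbose=1)
```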

SmileLab-technion commented 4 years ago

Hello,

As I see it: in image recognition, the algorithm needs to recognize the image label. This is done by projecting the image into some latent space where the pictures are separable. In RL, the image just represents the state, which is why only a few features of the picture are needed. Your confusion comes from your view of how humans make choices, which is not the same as RL. (see this video)