pathak22 / noreward-rl

[ICML 2017] TensorFlow code for Curiosity-driven Exploration for Deep Reinforcement Learning

a GAN idea #31

alreadydone opened this issue 5 years ago

alreadydone commented 5 years ago

Thank you for the work. I recently started applying reinforcement learning to mathematical research (with the formal language and deduction system of a proof assistant as the environment). It's not straightforward to design a proper reward there, but novelty is certainly a good measure of progress, and your work is inspiring.

One idea I have, which I also intend to apply in my project, concerns how the prediction error is measured; it seems to me that a GAN-style setup is applicable here. The predictor can be seen as a generator, so how about training a discriminator (conditioned on the current state) with the predicted outcomes as negative samples and the actual outcomes as positive samples? Maybe then you could just predict raw pixels, and the discriminator would extract features automatically and ignore any essentially unpredictable details, like the exact positions of tree leaves in a breeze. It would also become unnecessary to distinguish between things that affect (or can be controlled by) the agent and things that do not.
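To make the proposal concrete, here is a rough sketch of what I have in mind. It is a PyTorch mock-up rather than this repo's TensorFlow code; the module and function names, the assumed 64x64 frames, and the curiosity bonus at the end are all placeholders, not a definitive implementation:

```python
# Rough sketch (assumed names, 64x64 RGB frames so conv/deconv shapes round-trip).
# The forward model ("generator") predicts the next frame; a discriminator
# conditioned on the current frame is trained with actual next frames as
# positives and predicted next frames as negatives.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Predictor(nn.Module):
    """Forward model acting as the generator: predicts the next frame."""
    def __init__(self, n_actions, ch=3):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(ch, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 32, 4, 2, 1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32 + n_actions, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, ch, 4, 2, 1), nn.Sigmoid())

    def forward(self, state, action_onehot):
        h = self.enc(state)                                    # (B, 32, H/4, W/4)
        a = action_onehot[:, :, None, None].expand(-1, -1, h.size(2), h.size(3))
        return self.dec(torch.cat([h, a], dim=1))              # predicted next frame

class Discriminator(nn.Module):
    """Conditioned on the current frame, scores a candidate next frame."""
    def __init__(self, ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * ch, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))

    def forward(self, state, next_frame):
        return self.net(torch.cat([state, next_frame], dim=1))  # raw logit

def gan_curiosity_step(pred, disc, opt_p, opt_d, state, action_onehot, next_state):
    # Discriminator update: actual outcomes are positives, predictions negatives.
    fake = pred(state, action_onehot).detach()
    real_logit = disc(state, next_state)
    fake_logit = disc(state, fake)
    d_loss = (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit))
              + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Predictor update: fool the discriminator rather than match raw pixels,
    # so essentially unpredictable details need not be reproduced exactly.
    fake = pred(state, action_onehot)
    g_logit = disc(state, fake)
    g_loss = F.binary_cross_entropy_with_logits(g_logit, torch.ones_like(g_logit))
    opt_p.zero_grad(); g_loss.backward(); opt_p.step()

    # One possible curiosity bonus: how confidently the discriminator separates
    # the real outcome from the predicted one (large gap = surprising transition).
    with torch.no_grad():
        bonus = torch.sigmoid(disc(state, next_state)) - torch.sigmoid(disc(state, fake))
    return bonus.squeeze(1)
```

The point is that the predictor is trained against the discriminator instead of a pixel-wise loss, so stochastic details it could never get right (the tree leaves) should stop contributing to the surprise signal.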

I am a beginner in reinforcement learning apart from my participation in the Leela Zero project. I haven't looked much into the details of the various algorithms and NN architectures, and just want to get some feedback about whether the general idea is promising. Thank you in advance!

AdarshMJ commented 5 years ago

My initial thoughts were the same. I read a few papers that outline the similarities between RL algorithms and GANs, for example https://arxiv.org/pdf/1610.01945.pdf. I'm not sure whether we can augment GANs with RL algorithms or whether it would just complicate everything.