pathak22 / noreward-rl

[ICML 2017] TensorFlow code for Curiosity-driven Exploration for Deep Reinforcement Learning

a GAN idea #31

alreadydone opened this issue 5 years ago

alreadydone commented 5 years ago

Thank you for the work. I recently started applying reinforcement learning to mathematical research (with the formal language and deduction system of a proof assistant as the environment). It's not straightforward to design a proper reward there, but novelty is certainly a good measure of progress, and your work is inspiring.

One idea I have, which I also intend to apply in my project, concerns how the prediction error is measured; it seems to me that a GAN-style setup is applicable here. The predictor can be seen as a generator, so how about training a discriminator (conditioned on the current state) with the predicted outcomes as negative samples and the actual outcomes as positive samples? Maybe then you could just predict raw pixels, and the discriminator would extract features automatically and ignore any essentially unpredictable details, like the exact positions of tree leaves in a breeze. It would also become unnecessary to distinguish between things that affect (or can be controlled by) the agent and things that do not.
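To make the proposal concrete, here is a rough sketch of what I have in mind. It is a PyTorch mock-up rather than this repo's TensorFlow code; the module and function names, the assumed 64x64 frames, and the curiosity bonus at the end are all placeholders, not a definitive implementation:

```python
# Rough sketch (assumed names, 64x64 RGB frames so conv/deconv shapes round-trip).
# The forward model ("generator") predicts the next frame; a discriminator
# conditioned on the current frame is trained with actual next frames as
# positives and predicted next frames as negatives.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Predictor(nn.Module):
    """Forward model acting as the generator: predicts the next frame."""
    def __init__(self, n_actions, ch=3):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(ch, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 32, 4, 2, 1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32 + n_actions, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, ch, 4, 2, 1), nn.Sigmoid())

    def forward(self, state, action_onehot):
        h = self.enc(state)                                    # (B, 32, H/4, W/4)
        a = action_onehot[:, :, None, None].expand(-1, -1, h.size(2), h.size(3))
        return self.dec(torch.cat([h, a], dim=1))              # predicted next frame

class Discriminator(nn.Module):
    """Conditioned on the current frame, scores a candidate next frame."""
    def __init__(self, ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * ch, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1))

    def forward(self, state, next_frame):
        return self.net(torch.cat([state, next_frame], dim=1))  # raw logit

def gan_curiosity_step(pred, disc, opt_p, opt_d, state, action_onehot, next_state):
    # Discriminator update: actual outcomes are positives, predictions negatives.
    fake = pred(state, action_onehot).detach()
    real_logit = disc(state, next_state)
    fake_logit = disc(state, fake)
    d_loss = (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit))
              + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Predictor update: fool the discriminator rather than match raw pixels,
    # so essentially unpredictable details need not be reproduced exactly.
    fake = pred(state, action_onehot)
    g_logit = disc(state, fake)
    g_loss = F.binary_cross_entropy_with_logits(g_logit, torch.ones_like(g_logit))
    opt_p.zero_grad(); g_loss.backward(); opt_p.step()

    # One possible curiosity bonus: how confidently the discriminator separates
    # the real outcome from the predicted one (large gap = surprising transition).
    with torch.no_grad():
        bonus = torch.sigmoid(disc(state, next_state)) - torch.sigmoid(disc(state, fake))
    return bonus.squeeze(1)
```

The point is that the predictor is trained against the discriminator instead of a pixel-wise loss, so stochastic details it could never get right (the tree leaves) should stop contributing to the surprise signal.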

I am a beginner in reinforcement learning apart from my participation in the Leela Zero project. I haven't looked much into the details of the various algorithms and NN architectures, and just want to get some feedback about whether the general idea is promising. Thank you in advance!

AdarshMJ commented 5 years ago

My initial thoughts were the same. I read a few papers that outline the similarities between RL algorithms and GANs, for example https://arxiv.org/pdf/1610.01945.pdf. I'm not sure whether we can augment GANs with RL algorithms or whether it would just complicate everything.