openai / requests-for-research

A living collection of deep learning problems
https://openai.com/requests-for-research

Add Infinite Symbolic Generalization #4

Closed. Kaixhin closed this issue 8 years ago.

Kaixhin commented 8 years ago

Context for this problem. Nando's reply is what led me to write the note on data efficiency.

ilyasu123 commented 8 years ago

The problem of learning algorithms that exhibit massive generalization from data is very important. Are you able to provide more clarity when defining the success criterion? Specifically, it would help to define a set of tasks, possibly ones that start easy and become harder, on which such generalization can occur. It would be extra nice if the problem were natural rather than synthetic.

Kaixhin commented 8 years ago

Added a criterion and linked to the algorithmic environments in the gym. Couldn't think of a "natural" task that is as well defined as common algorithms, but did add a note about links to hierarchical reinforcement learning.
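
For concreteness, a minimal sketch of interacting with one of the Gym algorithmic environments (e.g. Copy-v0), using the classic Gym step API of that era; treat the details as illustrative rather than exact:

```python
# Minimal sketch: a random agent on Gym's Copy-v0 algorithmic task.
import gym

env = gym.make('Copy-v0')   # copy the input tape to the output
obs = env.reset()           # obs: symbol currently under the read head
done = False
total_reward = 0.0
while not done:
    # Action is a tuple: (move direction, write flag, symbol to write).
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    total_reward += reward
print('episode reward:', total_reward)
env.close()
```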

ilyasu123 commented 8 years ago

Can you identify a set of specific problems that you'd like to be solved? I.e., problems, inputs, outputs, etc. Specify the training lengths and the test lengths. Also, it is important to note that the NPI is provided with the execution traces at training time. You need to decide whether the model is allowed to be given this information during training. The advantage is that you could train models to solve much harder problems, but the disadvantage is that there exist many problems for which we do not know what makes a good execution trace.
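
For illustration, a hypothetical sketch of what an NPI-style execution trace might look like for one column of grade-school addition; the program names loosely follow the NPI paper, but this encoding is invented for exposition:

```python
# Hypothetical NPI-style execution trace for one step of grade-school
# addition. Each entry is (program, arguments); the supervision tells
# the model which subprogram to call at every step.
trace = [
    ("ADD",    []),            # top-level program
    ("ADD1",   []),            # add the current column
    ("ACT",    ["WRITE", 7]),  # write the column's digit to the output row
    ("CARRY",  []),            # handle the carry, if any
    ("ACT",    ["WRITE", 1]),  # write the carry digit one column left
    ("LSHIFT", []),            # move all pointers one column left
]
```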

Why not formulate the goal as improving upon some of the existing models that have successfully generalized to much longer sequences already? Try to invent a few new tasks where you can reasonably expect such generalization to happen.

Kaixhin commented 8 years ago

Gone with variations on training and evaluating the multi-task NPI as a set of specific baselines. Downgraded the difficulty to 2, as execution traces will be made available. Did you have any other models in mind for this kind of algorithmic learning task? Do you think other "standard" CS algorithms (e.g. Dijkstra's) are feasible? It might be better to narrow the scope from "Symbolic" down to "Algorithmic" for now.

ilyasu123 commented 8 years ago

This is a much better question formulation.

The NPI is an excellent model and has been able to achieve great results on interesting tasks.

However, there is a way to make the research problem a lot more interesting: to train the NPI, the user must provide it with execution traces. This is the reason for the NPI's strong performance, but it is also an unrealistic requirement for almost all problems.

I think a better question would be along the following lines: develop a model similar to the NPI that can learn from less detailed execution traces. Also show that using reduced execution traces is much better than using no execution traces at all, where "better" means higher performance on tasks.

It's a very promising, but pretty difficult, problem to attack. It is promising because it is much easier to provide high-level execution traces than low-level ones. So I'd recommend formulating a problem of the sort:

Try to design interesting reduced execution traces for some of the tasks in NPI (or maybe the simpler tasks). Give it a try. I'll help by providing guidance on how to approach this new problem.
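
As one illustration of the idea, a hypothetical sketch contrasting a full trace with a reduced one for bubble sort; the subprogram names are loosely modeled on the NPI paper's sorting task, and the particular reduction shown is just one possible choice:

```python
# Hypothetical full vs. reduced execution traces for bubble sort.
full_trace = [
    ("BUBBLESORT", []),
    ("BUBBLE",     []),   # one pass over the array
    ("COMPSWAP",   []),   # compare/swap adjacent elements
    ("RSHIFT",     []),   # move pointers right
    ("COMPSWAP",   []),
    ("RSHIFT",     []),
    ("RESET",      []),   # return pointers to the start
    # ... and so on for every low-level step
]

reduced_trace = [
    ("BUBBLESORT", []),
    ("BUBBLE",     []),   # per-pass calls only: the model must discover
    ("BUBBLE",     []),   # COMPSWAP/RSHIFT on its own
]
```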

Kaixhin commented 8 years ago

"Reduced execution traces" to me sounds like moving from a supervised to a reinforcement learning problem, and specifically the general area of hierarchical reinforcement learning - or is there an alternative interpretation? If it is then the problem can be formulated as being provided (having as input for the "target") each action and/or state to match (the execution trace), but having a reward instead of a supervised target. Using curriculum learning, the information from the "reward signal" can be reduced by only providing certain actions and/or states for the agent.

Outside of hierarchical RL, something closer to what I originally proposed would be interpretability of the "algorithm" learned by the agent. Currently the execution traces can be manually examined on small sequences to try and extract the algorithm, but ideally there should be some way for the model to "explain" what general formula it has learned. Something akin to neurosymbolic research, perhaps?
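
As a hypothetical sketch of that manual examination, one could automatically probe where a learned model's behavior diverges from a reference algorithm as inputs grow; `model_sort` is a stand-in for whatever function the trained agent computes, not a real API:

```python
# Hypothetical probe for the generalization horizon of a learned sorter:
# check ever-longer inputs against a reference algorithm and report the
# last length at which the model never failed.
import random

def generalization_horizon(model_sort, max_len=100, trials=20):
    for n in range(2, max_len + 1):
        for _ in range(trials):
            xs = [random.randint(0, 9) for _ in range(n)]
            if model_sort(xs) != sorted(xs):
                return n - 1  # last length with no observed failures
    return max_len
```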

ilyasu123 commented 8 years ago

It could be reinforcement learning, but it could also be something else: it's up to the reader. But the point is, if you know the execution trace of a problem, then you also know how to solve it, which means that it's not a very realistic scenario, and it makes sense to move away from it a little by providing partial, or high-level, execution traces.

ilyasu123 commented 8 years ago

(FYI: Planning to close in a few days)

Kaixhin commented 8 years ago

Really sorry for the delay. I've now re-posed the problem as using reduced execution traces, and added links to program induction.