[WIP] Learning to solve the Cartpole-v1 environment

ntoxeg commented 3 years ago

This is my attempt at making ROCCA able to solve (i.e. achieve an average total reward of at least 195 over 100 trials) the Cartpole-v1 Gym environment.
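
To make the success criterion above concrete, here is a minimal sketch of how it could be checked. The `SolvedTracker` class and its method names are hypothetical and not part of ROCCA; the threshold (195) and window (100 trials) are the values quoted above.

```python
from collections import deque


class SolvedTracker:
    """Hypothetical helper: keeps the most recent episode returns and
    reports whether their average meets the success threshold."""

    def __init__(self, threshold=195.0, window=100):
        self.threshold = threshold
        # deque with maxlen drops the oldest return once the window is full
        self.returns = deque(maxlen=window)

    def add(self, episode_return):
        self.returns.append(float(episode_return))

    def average(self):
        return sum(self.returns) / len(self.returns) if self.returns else 0.0

    def solved(self):
        # Only meaningful once a full window of trials has been recorded
        full = len(self.returns) == self.returns.maxlen
        return full and self.average() >= self.threshold
```

Calling `add` once per finished episode and checking `solved()` afterwards would be enough to decide when to stop training.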

Current Challenges

ROCCA is unable to come up with predictions that use observations and their numerical properties, such as “if $angle < -0.01 then Go Left”.
- From inspecting Miner logs it seems it only comes up with “blanket statement” predictions like
```
PredictiveImplicationScope
 …
 AltSequentialAnd
     …
     And
         Predicate "$A"
             Variable "$X"
         Predicate "$B"
             Variable "$Y"
         Execution
             Schema "Go Left"
     Execution
         Schema "Go Left"
 Evaluation
     Predicate "Reward"
     Number "1"
```
- Having inspected PLN’s newly written rule for PredictiveImplicationScope introduction, I don’t think it will be able to come up with such a solution either, simply because the rule expects cognitive schema contexts to be Predicates, whereas what is needed is for the system to come up with an expression that uses something like GreaterThan.
- That can be easily bypassed though with DefinedPredicates.
Even if I seed the agent with the already written schemas, the system then favours the newly created ones, which are clearly worse.
- One clear issue is that the only rewards considered are hardcoded “1” or “0”. The system doesn’t have the capacity to actually compare what rewards it got and properly credit schemas that earned more of those.
- The current way of representing rewards is too simplistic: for most tasks you need the ability to gauge how well the task was done, not just whether it was done at all.
- Perhaps this could be worked around by defining a goal of “Reward Improved” which would be a binary goal of whether the environment reward has been improved (reached higher value than the current best) or not.
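
The “Reward Improved” workaround can be sketched in a few lines. This is only an illustration in plain Python, not ROCCA code: the `RewardImprovedGoal` name is hypothetical, and treating the very first episode as an improvement is an assumption.

```python
class RewardImprovedGoal:
    """Hypothetical binary goal: 1 when an episode's total reward beats
    the best total seen so far, 0 otherwise."""

    def __init__(self):
        self.best = None  # no episode observed yet

    def update(self, episode_reward):
        # Assumption: the first episode always counts as an improvement
        improved = self.best is None or episode_reward > self.best
        if improved:
            self.best = episode_reward
        return 1 if improved else 0
```

The returned 0 or 1 would then stand in for the hardcoded reward, so a schema is only credited when it pushes the environment reward past the current best. One design consequence is that the signal gets sparser as the agent improves, since ties with the current best return 0.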

A more general challenge is to figure out how the system should discover things like “I should check if the angle of the pole is smaller than some number”, so generally come up with predicates and calculations to be applied on observations.
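
For illustration, one naive way to discover such a threshold is a decision-stump style search over logged observations. This is a swapped-in technique, not what the Miner or PLN do, and the function names and sample format below are hypothetical.

```python
def candidate_thresholds(values):
    """Midpoints between consecutive distinct observed values."""
    distinct = sorted(set(values))
    return [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]


def best_less_than_predicate(samples):
    """samples: list of (value, label) pairs, where label is True when
    'Go Left' was the rewarded action for that observation.

    Returns (threshold, accuracy) for the predicate `value < threshold`
    that agrees with the labels most often, or None when there are fewer
    than two distinct values to split on."""
    values = [v for v, _ in samples]
    best = None
    for t in candidate_thresholds(values):
        correct = sum((v < t) == label for v, label in samples)
        accuracy = correct / len(samples)
        if best is None or accuracy > best[1]:
            best = (t, accuracy)
    return best
```

A threshold found this way could then be wrapped in a named predicate and offered to the agent as a schema context, though that step is not shown here.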

ngeiswei commented 3 years ago

That looks very cool, @ntoxeg. Note that I likely won't be able to look into it (or merge your PR #18) for a couple of weeks. After that I should have more time for it (we'll probably want to schedule a call or something).

ntoxeg commented 3 years ago

I understand @ngeiswei, I will keep working on it and we can certainly schedule a call when you have more time to discuss this.

ntoxeg commented 3 years ago

Rebased against master.

Some summary of current changes:

Introduced flags enabling multiprocessing for the Miner and PLN, they automatically use the number of your CPU cores from now on.
The setup.py is adapted from nbdev template - it’s more complete and loads dependencies from requirements.txt so you don’t have to write them in setup.py anymore (it doesn’t require nbdev to run).
Introduced Tensorboard support via tensorboardX, accumulated rewards for MineRL and learning cart pole agents are now reported under runs/<datetime><comment>.
For now there are docs generated via nbdev (under docs), I might just remove them because they are not very useful currently - they should be hosted (via Github Pages, for example), otherwise they just lie there.
The dev container setup is just some files under .devcontainer, it shouldn’t be a problem to have them and they can make it easier for other people to run the code.
Last but not least, the agent itself - it tries to solve the Cartpole-v1 and is currently seeded with the initial schemas from the “static” agent already present. Sadly, I still haven’t figured out a solution to the problems I wrote about in the introduction.
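
As a rough sketch of the multiprocessing flags mentioned in the summary above: the flag names and helper functions here are hypothetical and may not match what the PR actually adds; only `argparse` and `os.cpu_count()` are real standard-library calls.

```python
import argparse
import os


def build_parser():
    """Hypothetical launcher options; names are illustrative only."""
    parser = argparse.ArgumentParser(description="Hypothetical agent launcher")
    parser.add_argument("--miner-multiprocessing", action="store_true",
                        help="run the Miner with one worker per CPU core")
    parser.add_argument("--pln-multiprocessing", action="store_true",
                        help="run PLN with one worker per CPU core")
    return parser


def worker_count(enabled):
    # os.cpu_count() can return None, so fall back to a single worker
    return (os.cpu_count() or 1) if enabled else 1
```

For example, `worker_count(build_parser().parse_args(["--pln-multiprocessing"]).pln_multiprocessing)` would give the number of cores for PLN, while the Miner would stay at a single worker.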

ntoxeg commented 3 years ago

Ah, I should also mention that I’ve been working on using atomspace-rocks to save snapshots of the AtomSpace under the snapshots directory. I don’t use that for anything yet, but it could prove useful in the future for observing how the knowledge base evolves over time.

ngeiswei commented 3 years ago

Note that multithreading support is very prototypical at this point. It shouldn't crash reasoning but it's not gonna bring much performance gain.

ntoxeg commented 2 years ago

Closing because of #32, no point in holding this open indefinitely.

opencog / rocca

[WIP] Learning to solve the Cartpole-v1 environment #19
