krezelj / academia

Repository for a Bachelor's Thesis at Warsaw University of Technology that touches on the topic of Curriculum Learning.
MIT License

Future of the `curriculum` module #21

Open krezelj opened 1 year ago

krezelj commented 1 year ago

I want to outline the ideas and dreams I have for the curriculum module. Right now its features are very limited. We can only train a non-changing agent using a linear sequence of tasks that, because the agent is unchanging, have to be similar. Specifically, the tasks must have the same goal and reward system for the agent to train properly. By an unchanging agent I do not mean an agent that is not updated, but one that does not change the way it sees the world and takes actions (e.g. it uses the same neural network architecture the whole time).

Below I list ideas I would like to see incorporated into the module. They are ordered from the ones I think are easiest to implement and test to the ones I think are the hardest.

Imitation Learning

Article As far as I know this is not strictly connected to the idea of curriculum learning; however, I feel like it does have a place in our library at some point in the future. From what I understand, imitation learning works by pre-training an agent on samples that were generated earlier and only then training it on its own.

For example, we can generate samples by playing environments ourselves or by developing algorithms that play environments automatically and generate samples along the way (you might ask why we would train an agent if we already have an algorithm; the hope is that the agent can eventually perform better and/or faster than our algorithm does).

I think for neural networks pre-training is usually done not by updating the agent with transitions (i.e. rewards) but by training the network to predict a one-hot encoded vector that says which action was played. In other words, our samples for pre-training are pairs of state and played action; we treat the played action as a class label and turn the process into a classification problem. We can then freeze the first layers of the network (and possibly randomise the remaining layers) and train the agent as usual.
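Roughly, and assuming a PyTorch-style policy network, the pre-training step could look like the sketch below. All names here (`pretrain_policy`, `demo_states`, `demo_actions`) are illustrative and not part of the current module.

```python
# Minimal behavioural-cloning sketch (PyTorch assumed; names are illustrative,
# not part of the academia API). Demonstration data is a batch of (state, action)
# pairs and the played action is treated as a class label.
import torch
import torch.nn as nn

def pretrain_policy(policy_net, demo_states, demo_actions, epochs=10, lr=1e-3):
    """Pre-train a policy network to predict the demonstrator's actions."""
    optimizer = torch.optim.Adam(policy_net.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        logits = policy_net(demo_states)        # shape: (batch, n_actions)
        loss = criterion(logits, demo_actions)  # actions as class indices
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Optionally freeze all but the last layer before regular RL training.
    for layer in list(policy_net.children())[:-1]:
        for param in layer.parameters():
            param.requires_grad = False
    return policy_net
```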

We hope that the agent learns to imitate the player (hence imitation learning) and can then explore the environment on its own and learn better strategies, eventually reaching even better performance.

In the spirit of our library being named academia and having curriculum and task classes, this could be compared to the idea of a Lecture and be implemented similarly to the Task class. We could then design curricula that include both Lectures and Tasks in that order.
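As a sketch of that idea (nothing like this exists in the module yet, and the interface below is purely hypothetical), a Lecture could look roughly like this:

```python
# A rough sketch of how a Lecture could sit next to Task in a curriculum.
# The Lecture interface and agent.pretrain are hypothetical, not the current API.
class Lecture:
    """A pre-training stage built from recorded (state, action) demonstrations."""

    def __init__(self, demonstrations, epochs=10):
        self.demonstrations = demonstrations  # list of (state, action) pairs
        self.epochs = epochs

    def run(self, agent):
        # Supervised pre-training on demonstrations instead of RL updates.
        agent.pretrain(self.demonstrations, epochs=self.epochs)

# A curriculum could then mix Lectures and Tasks, lectures first, e.g.:
# curriculum = Curriculum([Lecture(demos), Task(easy_env), Task(hard_env)])
```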

Transfer Learning

This idea is already used by the previous one, but we can do more with it. As I've said at the beginning, right now all environments that the agent interacts with must have the same goal. However, with transfer learning we could first train the agent on one kind of task (e.g. survive as long as you can against enemies), then keep the first layers of the network and train the agent on a different, more difficult kind of task that builds on the knowledge obtained previously (e.g. find a key and open the door WHILE avoiding enemies).
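Assuming the agent's policy is a plain PyTorch `nn.Sequential` ending in a linear output layer, the network surgery could look roughly like this; the function name and the split point are illustrative only.

```python
# Hedged sketch of network surgery for transfer learning (PyTorch assumed).
# Early layers learnt on the first task are kept, the head is re-initialised.
import torch.nn as nn

def transfer(old_net: nn.Sequential, n_new_actions: int, n_frozen: int = 2) -> nn.Sequential:
    layers = list(old_net.children())
    # Freeze the first n_frozen layers so previously learnt features are kept.
    for layer in layers[:n_frozen]:
        for param in layer.parameters():
            param.requires_grad = False
    # Replace the output layer so it matches the new task's action space
    # (assumes the last layer is nn.Linear).
    layers[-1] = nn.Linear(layers[-1].in_features, n_new_actions)
    return nn.Sequential(*layers)
```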

Agent vs Agent

Self-play is considered a form of curriculum learning. This is because in self-play the agent plays against a copy of itself. At first, when the agent's performance is poor, so is its opponent's. As the opponent improves, it poses a bigger and bigger challenge for the agent. This implicitly orders the experienced samples from "easy" ones to "hard" ones.

In general, the clone that serves as the opponent is frozen and is not updated. We only replace it with a new clone once our agent is significantly superior to its current opponent.
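A minimal sketch of that loop, assuming hypothetical `train` and `evaluate_fn` helpers (none of this is existing academia code):

```python
# Illustrative self-play loop. The opponent is a frozen copy of the agent and
# is refreshed only once the agent clearly outperforms it.
import copy

def self_play(agent, env, evaluate_fn, n_iterations, win_threshold=0.6):
    opponent = copy.deepcopy(agent)              # frozen clone, never updated
    for _ in range(n_iterations):
        agent.train(env, opponent=opponent)      # hypothetical training call
        win_rate = evaluate_fn(agent, opponent, env)
        if win_rate >= win_threshold:            # agent is significantly better,
            opponent = copy.deepcopy(agent)      # so replace the clone
    return agent
```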

There are several other ideas we can explore when it comes to Agent vs Agent:

Asymmetric/Adversarial Play

In this setting the agents fight against each other, but in a way that is asymmetric. In other words, they do not compete for the same goal. For example, one agent controls the hero in a game while the other agent controls the enemies or the environment. While the goal of the first agent is to achieve some pre-defined objective inside the game, the goal of the second agent is to stop/delay the first agent as much as possible. Here we can let both agents learn simultaneously.
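For illustration only, and assuming a two-player environment interface that does not exist in the library, the simultaneous training loop could look like this:

```python
def adversarial_train(hero, adversary, env, n_episodes):
    """Train both agents at once in an asymmetric game. The two-action
    `env.step` interface and the `update` calls are hypothetical."""
    for _ in range(n_episodes):
        state = env.reset()
        done = False
        while not done:
            hero_action = hero.get_action(state)
            adversary_action = adversary.get_action(state)
            # The environment rewards the hero for progress and the adversary
            # for delaying or stopping the hero.
            state, hero_reward, adversary_reward, done = env.step(
                hero_action, adversary_action)
            hero.update(state, hero_reward)
            adversary.update(state, adversary_reward)
```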

Cooperative Play

We can also let agents cooperate to achieve some common goal or to achieve their own goals that help other agents achieve theirs. Here I'm not sure if this can be useful for curriculum learning but I think it's interesting and worth investigating.

Teacher/Student

In the survey I remember the authors mentioning a teacher/student setting in which there is a notion of a teacher that dynamically creates the curriculum for the agent. The teacher itself can be an agent that learns during the process, or it could possibly be some kind of pre-determined algorithm.

Right now I have no idea how we can implement this but I think it's super interesting.
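Just to make the idea a bit more tangible (this is not a proposal for the actual implementation, and every name below is hypothetical), a simple heuristic teacher could be sketched like this:

```python
class Teacher:
    """Dynamically builds the curriculum by picking the next task based on
    the student's current performance. A learnt teacher agent could replace
    the heuristic used here."""

    def __init__(self, task_pool, evaluate_fn):
        self.task_pool = task_pool      # candidate tasks of varying difficulty
        self.evaluate_fn = evaluate_fn  # returns the student's score on a task

    def next_task(self, student):
        # Heuristic: give the student the task it currently performs worst at.
        return min(self.task_pool, key=lambda task: self.evaluate_fn(student, task))

def teach(teacher, student, n_rounds):
    for _ in range(n_rounds):
        student.train_on(teacher.next_task(student))  # hypothetical method
```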

Skills

Article (not the article mentioned earlier, but one that I found a long while ago). I'm not entirely sure how this works, but here's how I think it does: the agent is no longer responsible for every action. Instead we rely on skills, or sub-agents, that specialise in performing certain tasks. In this setting our goal is to learn the skills and to learn when to use which skill based on the current state of the environment.
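One way to picture it, under my reading above and with entirely hypothetical names, is a manager policy delegating to specialised sub-policies:

```python
# Hierarchical-policy sketch of the skills idea: a manager picks a skill
# (sub-agent) for the current state and the chosen skill picks the actual
# action. All names here are hypothetical.
class HierarchicalAgent:
    def __init__(self, manager, skills):
        self.manager = manager  # policy over skill indices
        self.skills = skills    # list of specialised sub-agents

    def get_action(self, state):
        skill_idx = self.manager.get_action(state)       # which skill to use now
        return self.skills[skill_idx].get_action(state)  # the primitive action
```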

This is definitely something beyond the scope of the thesis, but it might be worth considering if we ever want to develop this library further.

Conclusion

While a lot of these ideas could be hard to implement, I think we might be able to add imitation learning, and possibly transfer learning/self-play if time allows.

In all cases the curriculum module would have to be expanded quite a bit, and one of the first things we would probably add to help introduce these features is a task_callback that allows interacting with the curriculum in between tasks.
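As a rough proposal (not the current API; the signature and the Curriculum methods below are only a suggestion), the hook could look something like this:

```python
# Sketch of the proposed task_callback hook, called after each task finishes.
def my_callback(curriculum, agent, finished_task, stats):
    print(f"Finished {finished_task.name}, mean reward: {stats['mean_reward']}")
    # The callback could also modify the curriculum before moving on,
    # e.g. insert an extra Task or a Lecture (hypothetical method):
    # curriculum.insert_next(Task(harder_env))

# curriculum = Curriculum(tasks, task_callback=my_callback)
```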

maciejors commented 1 year ago

"Possibly add ability to form the curriculum as a dynamic directed graph (many parallel dependencies, adding new tasks automatically etc. probably very advanced)"

This was initially a comment above the Curriculum class declaration. I decided to move it here.
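To make that idea a bit more concrete, here is a loose sketch of what a graph-shaped curriculum could look like; none of this exists in the module and all names are placeholders:

```python
# Sketch of a curriculum as a directed graph: tasks are nodes with
# dependencies, and a task becomes available once all its prerequisites
# are completed. Tasks could also be added to the graph while it runs.
class CurriculumGraph:
    def __init__(self):
        self.dependencies = {}  # task -> set of prerequisite tasks
        self.completed = set()

    def add_task(self, task, prerequisites=()):
        self.dependencies[task] = set(prerequisites)

    def available_tasks(self):
        return [t for t, deps in self.dependencies.items()
                if t not in self.completed and deps <= self.completed]

    def mark_completed(self, task):
        self.completed.add(task)
```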