thomashopkins32 / Minecraft-Virtual-Intelligence

MIT License

PPO + environment setup #1

Closed: thomashopkins32 closed this 4 months ago

thomashopkins32 commented 6 months ago

This PR adds PPO as a new learning algorithm.

It also refactors the inputs and outputs of many modules to be more general and applicable to PPO.

Full details will be documented once this PR is finished.
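For readers who want a reference point for the PPO objective mentioned above, here is a minimal, dependency-free sketch of the standard clipped surrogate loss. It is a generic illustration, not the code in this PR: the function name `ppo_clip_loss`, its argument names, and the default `clip_eps=0.2` are all assumptions made for the example.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Return the mean negative clipped surrogate objective for a batch.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not taken from this repository.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing the loss maximizes the surrogate objective.
    return -total / len(advantages)
```

With a positive advantage, the clip caps how much a larger ratio can improve the objective. With a negative advantage, the unclipped term is kept whenever it is the more pessimistic one, so a bad update is still penalized in full.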

thomashopkins32 commented 4 months ago

Since this PR is getting relatively large, I think I should split the work into smaller chunks. Unit test coverage is good and we have solid ground to build upon.

This is a good foundation for the first set of experiments that I would like to try.

We will add curiosity-based learning in future work, which should be able to use the PPO formulation built here.