Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Read Paper | Blog Post | MineRL Environment (note: version 1.0+ required) | MineRL BASALT Competition
Install the prerequisites for MineRL, then install the requirements with:
pip install git+https://github.com/minerllabs/minerl
pip install -r requirements.txt
⚠️ Note: For reproducibility reasons, the PyTorch version is pinned as torch==1.9.0, which is incompatible with Python 3.10 or higher. If you are using Python 3.10 or higher, install a newer version of PyTorch (usually, pip install torch). However, note that this might subtly change model behaviour (e.g., the agent may still act mostly as expected but not reach the reported performance).
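A quick way to catch the version pitfalls above before running anything is to compare the installed versions against the pins. The sketch below is illustrative and not part of this repository: the helper names are hypothetical, and it assumes the pip distributions are named torch and minerl and that version strings look like 1.9.0 or 1.9.0+cu111.

```python
import re
import sys
from importlib import metadata

PINNED_TORCH = "1.9.0"  # pin stated in this README (requirements.txt)


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def major_version(version):
    """Leading integer of a version string, e.g. '1.0.2' -> 1."""
    match = re.match(r"\d+", version)
    return int(match.group()) if match else 0


def check_environment(python_version, torch_version, minerl_version):
    """Return human-readable warnings; an empty list means nothing obvious is wrong."""
    warnings = []
    if torch_version is None:
        warnings.append("torch is not installed")
    elif torch_version.split("+")[0] != PINNED_TORCH:
        # Drop a local build tag such as "+cu111" before comparing to the pin.
        warnings.append(f"torch {torch_version} differs from the pinned {PINNED_TORCH}; "
                        "model behaviour may change subtly")
    if tuple(python_version) >= (3, 10):
        warnings.append("Python 3.10+ cannot use torch==1.9.0; a newer torch is needed")
    if minerl_version is None or major_version(minerl_version) < 1:
        warnings.append("MineRL 1.0+ is required")
    return warnings


if __name__ == "__main__":
    for warning in check_environment(sys.version_info[:2],
                                     installed_version("torch"),
                                     installed_version("minerl")):
        print("WARNING:", warning)
```

On Python 3.10+ you will always see at least one warning, which is expected: it is a reminder that results may not match the reported performance.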
To run the code, call
python run_agent.py --model [path to .model file] --weights [path to .weight file]
After loading up, you should see a window of the agent playing Minecraft.
Below are the model files and weights files for various pre-trained Minecraft models. The 1x, 2x and 3x model files correspond to the width of their respective model weights.
These models are trained on video demonstrations of humans playing Minecraft using behavioral cloning (BC) and are more general than later models which use reinforcement learning (RL) to further optimize the policy. Foundational models are trained across all videos in a single training run, while house and early-game models further refine the foundational model of the same size using either the house-building contractor data or the early-game video subset. See the paper linked above for more details.
These models further refine the above demonstration-based models with a reward function targeted at obtaining diamond pickaxes. While less general than the behavioral cloning models, these models have the benefit of interacting with the environment using a reward function and excel at progressing through the tech tree quickly. See the paper for more information on how they were trained and the exact reward schedule.
The inverse dynamics model (IDM) aims to predict what actions a player is taking in a video recording.
Setup:
pip install -r requirements.txt
To run the model with the above files placed in the root directory of this code:
python run_inverse_dynamics_model.py --weights 4x_idm.weights --model 4x_idm.model --video-path cheeky-cornflower-setter-02e496ce4abb-20220421-092639.mp4 --jsonl-path cheeky-cornflower-setter-02e496ce4abb-20220421-092639.jsonl
A window should pop up showing the video frame by frame, with the predicted and true (recorded) actions side by side on the left.
Note that run_inverse_dynamics_model.py is designed to be a demo of the IDM, not code for putting it into practice.
Disclaimer: This code is a rough demonstration only and not an exact recreation of what the original VPT paper did (but it contains some preprocessing steps you want to be aware of)! As such, do not expect to replicate the original experiments with this code. This code has been designed to be runnable on consumer hardware (e.g., 8GB of VRAM).
Setup:
pip install -r requirements.txt
1. Download the .weights and .model files for the model you want to fine-tune.
2. Place the .mp4 and .jsonl files of the recordings you want to train on in the same directory (e.g., data). With default settings, you need at least 12 recordings.
3. If you downloaded the "1x Width" models and placed some data under the data directory, you can perform fine-tuning with
python behavioural_cloning.py --data-dir data --in-model foundation-model-1x.model --in-weights foundation-model-1x.weights --out-weights finetuned-1x.weights
You can then use finetuned-1x.weights when running the agent. You can change the training settings at the top of behavioural_cloning.py.
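Before launching fine-tuning, it can help to confirm that the data directory actually contains enough complete recordings. The following is a hypothetical pre-flight check, not part of behavioural_cloning.py; it assumes a recording counts as complete when an .mp4 and a .jsonl file share the same base name, and it uses the 12-recording minimum quoted above for the default settings.

```python
from pathlib import Path

MIN_RECORDINGS = 12  # minimum for default settings, per this README


def complete_recordings(data_dir):
    """Return sorted base names that have both an .mp4 and a .jsonl file."""
    data_dir = Path(data_dir)
    videos = {p.stem for p in data_dir.glob("*.mp4")}
    actions = {p.stem for p in data_dir.glob("*.jsonl")}
    # Assumption: a video without its action file (or vice versa) is unusable.
    return sorted(videos & actions)


def check_data_dir(data_dir, minimum=MIN_RECORDINGS):
    """Raise ValueError if fewer than `minimum` complete recordings are present."""
    stems = complete_recordings(data_dir)
    if len(stems) < minimum:
        raise ValueError(
            f"found {len(stems)} complete recordings in {data_dir}, need at least {minimum}"
        )
    return stems
```

For example, check_data_dir("data") returns the matched recording names when there are at least 12, and raises with a count otherwise.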
Major limitations:
Over the course of the project we requested various demonstrations from contractors, which we release as index files below. In general, the major recorder version changed for a new prompt or recording feature, while bug fixes were represented as minor version changes. However, for some recorder versions we asked contractors to change their username when recording particular modalities. Also, because contractors ask questions internally, a clarification given to one contractor may result in a behavioral change in other contractors. It is intractable to share every contractor's view for each version, but we've shared the prompts and major clarifications for each recorder version where the task changed significantly.
The following is a list of the available versions:
6.x Core recorder features subject to change (index file)
6.9 First feature complete recorder version
6.10 Fixes mouse scaling on Mac when gui is open
6.11 Tracks the hotbar slot
6.13 Sprinting, swap-hands, ... (see commits below)
improve replays that are cut in the middle of gui; working on riding boats / replays cut in the middle of a run
improve replays by adding dwheel action etc, also, loosen up replay tolerances
opencv version bump
add swap hands, and recording of the step timestamp
implement replaying from running and sprinting and tests
do not record sprinting (can use stats for that)
check for mouse button number, ignore >2
handle the errors when mouse / keyboard are recorded as null
7.x Prompt changes (index file)
Right now, early game data is especially valuable to us. As such, we request that at least half of the data you upload is from the first 30 minutes of the game. This means that, for every hour of gameplay you spend in an older world, we ask you to play two sessions in which you create a new world and play for 30 minutes. You can play for longer in these worlds, but only the first 30 minutes counts as early game data.
8.x House Building from Scratch Task (index file)
9.x House Building from Random Starting Materials Task (index file)
10.0 Obtain Diamond Pickaxe Task (index file)
Sometimes we asked contractors to signify other tasks by means other than changing the version (for example, by changing their username, as noted above). This primarily occurred in versions 6 and 7, as versions 8, 9 and 10 are all task-specific.
We restricted contractors to playing Minecraft in windowed mode at 720p, which we downsample to 360p at 20 Hz to minimize storage space. We also disabled the options screen to prevent contractors from changing settings such as brightness or rendering options. We asked contractors not to press keys such as F3, which shows a debug overlay; however, some contractors may still have done so.
Demonstrations are broken up into segments of up to 5 minutes, each consisting of a series of compressed screen observations, actions, environment statistics, and a checkpoint save file from the start of the segment. Each relative path in the index has all the files for that segment; however, if a file was dropped while uploading, the corresponding relative path is not included in the index, so there may be missing chunks from otherwise continuous demonstrations.
Index files are provided for each version as a json file:
{
"basedir": "https://openaipublic.blob.core.windows.net/data/",
"relpaths": [
"8.0/cheeky-cornflower-setter-74ae6c2eae2e-20220315-122354",
...
]
}
Relative paths follow the following format:
<recorder-version>/<contractor-alias>-<session-id>-<date>-<time>
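Because contractor aliases themselves contain hyphens (e.g., cheeky-cornflower-setter), a relative path is easiest to split from the right. A minimal sketch, assuming the date and time fields are formatted as YYYYMMDD and HHMMSS as in the example above; SegmentId and parse_relpath are illustrative names, not part of this repository.

```python
from datetime import datetime
from typing import NamedTuple


class SegmentId(NamedTuple):
    recorder_version: str  # e.g. "8.0"
    contractor: str        # alias; may itself contain hyphens
    session_id: str
    started: datetime      # parsed from <date>-<time> (assumed YYYYMMDD-HHMMSS)

    @property
    def major_version(self):
        # Major recorder version, e.g. 8 for "8.0"; useful for grouping by task.
        return int(self.recorder_version.split(".")[0])


def parse_relpath(relpath):
    """Split '<recorder-version>/<contractor-alias>-<session-id>-<date>-<time>'."""
    recorder_version, name = relpath.split("/", 1)
    # Split from the right: only the last three fields are guaranteed hyphen-free.
    contractor, session_id, date, time = name.rsplit("-", 3)
    started = datetime.strptime(date + time, "%Y%m%d%H%M%S")
    return SegmentId(recorder_version, contractor, session_id, started)
```

For instance, keeping only segments with major_version == 10 would select the Obtain Diamond Pickaxe recordings listed in the version list above.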
Note that due to network errors, some segments may be missing from otherwise continuous demonstrations.
Your data loader can then find the following files:
<basedir>/<relpath>.mp4
<basedir>/<relpath>.jsonl
<basedir>/<relpath>-options.json
<basedir>/<relpath>.zip
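Putting the index format and the file list together, a loader only needs string concatenation to locate every file of a segment. A minimal sketch, assuming basedir is a URL prefix (normalized here to end with a slash) and that the index has been saved locally as JSON; the function names are hypothetical, and downloading is left to whichever HTTP client you prefer.

```python
import json

# Per-segment file suffixes, as listed above.
SEGMENT_SUFFIXES = (".mp4", ".jsonl", "-options.json", ".zip")


def load_index(path):
    """Read a per-version index file saved locally as JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def segment_urls(index):
    """Map each relpath in the index to the URLs of its four files."""
    basedir = index["basedir"]
    if not basedir.endswith("/"):
        basedir += "/"
    return {
        relpath: [basedir + relpath + suffix for suffix in SEGMENT_SUFFIXES]
        for relpath in index["relpaths"]
    }
```

Segments that were dropped during upload are simply absent from relpaths, so a loader built this way should not assume consecutive segments exist.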
The action file is not a single valid JSON object: each line in the action file is an individual action dictionary.
For v7.x, the actions are in the following form:
{
"mouse": {
"x": 274.0,
"y": 338.0,
"dx": 0.0,
"dy": 0.0,
"scaledX": -366.0,
"scaledY": -22.0,
"dwheel": 0.0,
"buttons": [],
"newButtons": []
},
"keyboard": {
"keys": [
"key.keyboard.a",
"key.keyboard.s"
],
"newKeys": [],
"chars": ""
},
"isGuiOpen": false,
"isGuiInventory": false,
"hotbar": 4,
"yaw": -112.35006,
"pitch": 8.099996,
"xpos": 841.364694513396,
"ypos": 63.0,
"zpos": 24.956354839537802,
"tick": 0,
"milli": 1649575088006,
"inventory": [
{
"type": "oak_door",
"quantity": 3
},
{
"type": "oak_planks",
"quantity": 59
},
{
"type": "stone_pickaxe",
"quantity": 1
},
{
"type": "oak_planks",
"quantity": 64
}
],
"serverTick": 6001,
"serverTickDurationMs": 36.3466,
"stats": {
"minecraft.custom:minecraft.jump": 4,
"minecraft.custom:minecraft.time_since_rest": 5999,
"minecraft.custom:minecraft.play_one_minute": 5999,
"minecraft.custom:minecraft.time_since_death": 5999,
"minecraft.custom:minecraft.walk_one_cm": 7554,
"minecraft.use_item:minecraft.oak_planks": 5,
"minecraft.custom:minecraft.fall_one_cm": 269,
"minecraft.use_item:minecraft.glass_pane": 3
}
}
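Since each line of the action file is a standalone JSON object, the file can be parsed line by line with the standard json module. The sketch below extracts a few of the fields shown in the v7.x example; the helper names are hypothetical. It uses .get with defaults because older recorder versions may lack some fields (for instance, the hotbar slot is only tracked from 6.11 onward).

```python
import json

KEY_PREFIX = "key.keyboard."


def read_actions(path):
    """Parse a line-delimited action file into a list of dicts, one per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def summarize_action(action):
    """Return a compact view of one action dictionary."""
    keys = action.get("keyboard", {}).get("keys", [])
    mouse = action.get("mouse", {})
    return {
        # Strip the "key.keyboard." prefix, e.g. "key.keyboard.a" -> "a".
        "keys": [k[len(KEY_PREFIX):] if k.startswith(KEY_PREFIX) else k for k in keys],
        "mouse_delta": (mouse.get("dx", 0.0), mouse.get("dy", 0.0)),
        "mouse_buttons": list(mouse.get("buttons", [])),
        "gui_open": action.get("isGuiOpen", False),
        "hotbar": action.get("hotbar"),  # None if the recorder did not track it
    }
```

Applied to the example above, summarize_action would report the keys a and s, a zero mouse delta, and hotbar slot 4.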
We also collected a dataset of demonstrations for the MineRL BASALT 2022 competition, with around 150GB of data per task.
Note: To avoid confusion with the competition rules, the action files (.jsonl) have been stripped of information that is not allowed in the competition. We will upload the unmodified dataset after the competition ends.
FindCave (index file)
Look around for a cave. When you are inside one, quit the game by opening the main menu and pressing "Save and Quit To Title".
You are not allowed to dig down from the surface to find a cave.
Timelimit: 3 minutes.
Example recordings: https://www.youtube.com/watch?v=TclP_ozH-eg
MakeWaterfall (index file)
After spawning in a mountainous area with a water bucket and various tools, build a beautiful waterfall and then reposition yourself to “take a scenic picture” of the same waterfall, and then quit the game by opening the menu and selecting "Save and Quit to Title".
Timelimit: 5 minutes.
Example recordings: https://youtu.be/NONcbS85NLA
MakeVillageAnimalPen (index file)
After spawning in a village, build an animal pen next to one of the houses in a village. Use your fence posts to build one animal pen that contains at least two of the same animal. (You are only allowed to pen chickens, cows, pigs, sheep or rabbits.) There should be at least one gate that allows players to enter and exit easily. The animal pen should not contain more than one type of animal. (You may kill any extra types of animals that accidentally got into the pen.) Don’t harm the village.
After you are done, quit the game by opening the menu and pressing "Save and Quit to Title".
You may need to terraform the area around a house to build a pen. When we say not to harm the village, examples include taking animals from existing pens, damaging existing houses or farms, and attacking villagers. Animal pens must have a single type of animal: pigs, cows, sheep, chicken or rabbits.
The food items can be used to lure in the animals: if you hold seeds in your hand, this attracts nearby chickens to you, for example.
Timelimit: 5 minutes.
Example recordings: https://youtu.be/SLO7sep7BO8
BuildVillageHouse (index file)
Taking advantage of the items in your inventory, build a new house in the style of the village (random biome), in an appropriate location (e.g. next to the path through the village), without harming the village in the process.
Then give a brief tour of the house (i.e. spin around slowly such that all of the walls and the roof are visible).
* You start with a stone pickaxe and a stone axe, and various building blocks. It’s okay to break items that you misplaced (e.g. use the stone pickaxe to break cobblestone blocks).
* You are allowed to craft new blocks.
Please spend less than ten minutes constructing your house.
You don’t need to copy another house in the village exactly (in fact, we’re more interested in having slight deviations, while keeping the same "style"). You may need to terraform the area to make space for a new house.
When we say not to harm the village, examples include taking animals from existing pens, damaging existing houses or farms, and attacking villagers.
After you are done, quit the game by opening the menu and pressing "Save and Quit to Title".
Timelimit: 12 minutes.
Example recordings: https://youtu.be/WeVqQN96V_g
This was a large effort by a dedicated team at OpenAI: Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune. The code here represents a minimal version of our model code, which was prepared by Anssi Kanervisto and others so that these models could be used as part of the MineRL BASALT competition.