Pre-Train General FPS Model

Description:

Train the implemented video masked autoencoder model on a dataset of general gameplay clips from first-person shooter (FPS) games.

Requirements:

Split the dataset into training, validation, and testing sets.
Train the video masked autoencoder model on the training set and validate its performance on the validation set.
Evaluate the final model's performance on the test set using appropriate metrics (e.g., reconstruction error, SSIM).
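
As a starting point for the split requirement, here is a minimal sketch of a reproducible split. It assumes each clip has a unique string ID and that splitting happens at the clip level, so frames from one clip never land in two sets. The function name `split_clips` and the 80/10/10 default ratios are illustrative choices, not something this issue specifies.

```python
import random
from typing import Dict, List, Sequence


def split_clips(
    clip_ids: Sequence[str],
    val_fraction: float = 0.1,   # assumed ratio, not specified in the issue
    test_fraction: float = 0.1,  # assumed ratio, not specified in the issue
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Shuffle clip IDs once with a fixed seed and cut them into three disjoint sets."""
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("val_fraction + test_fraction must be in [0, 1)")
    # Deduplicate and sort first so the result does not depend on input order.
    ids = sorted(set(clip_ids))
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * val_fraction)
    n_test = int(len(ids) * test_fraction)
    return {
        "val": ids[:n_val],
        "test": ids[n_val:n_val + n_test],
        "train": ids[n_val + n_test:],
    }
```

Writing the three ID lists to disk next to the seed makes the split easy to reproduce later. If several clips come from the same recording session, grouping by session before splitting may be worth considering to limit leakage between sets.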

Acceptance Criteria:

Prepare a dataset of general FPS gameplay clips from the Waldo Vision database.
Preprocess the dataset appropriately, including resizing, normalization, and data augmentation if necessary.
Split the dataset into training, validation, and testing sets.
Train the video masked autoencoder model on the training set, achieving satisfactory validation performance as measured by the chosen metrics (e.g., reconstruction error, SSIM).
Monitor training progress and adjust hyperparameters as needed to optimize model performance.
Evaluate the final model's performance on the test set using appropriate metrics (e.g., reconstruction error, SSIM).
Provide clear instructions on how to reproduce the training process and any customization options.
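
To make the evaluation criterion concrete, the sketch below computes reconstruction error (MSE, optionally restricted to masked positions), PSNR, and a simplified SSIM on flattened pixel sequences in pure Python. Everything here is an assumption for illustration: the function names are hypothetical, pixel values are assumed to be scaled to [0, 1], and `global_ssim` uses whole-frame statistics rather than the sliding Gaussian window of standard SSIM. A real evaluation would more likely use a windowed implementation such as `skimage.metrics.structural_similarity`.

```python
import math
from typing import Optional, Sequence


def mse(pred: Sequence[float], target: Sequence[float],
        mask: Optional[Sequence[int]] = None) -> float:
    """Mean squared error; if mask is given, only positions with a truthy mask count."""
    if len(pred) != len(target) or (mask is not None and len(mask) != len(pred)):
        raise ValueError("pred, target, and mask must have the same length")
    if mask is None:
        pairs = list(zip(pred, target))
    else:
        pairs = [(p, t) for p, t, m in zip(pred, target, mask) if m]
    if not pairs:
        raise ValueError("no positions selected")
    return sum((p - t) ** 2 for p, t in pairs) / len(pairs)


def psnr(pred: Sequence[float], target: Sequence[float],
         data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for a perfect reconstruction."""
    err = mse(pred, target)
    return math.inf if err == 0 else 10.0 * math.log10(data_range ** 2 / err)


def global_ssim(pred: Sequence[float], target: Sequence[float],
                data_range: float = 1.0) -> float:
    """SSIM from whole-frame statistics (single window, no Gaussian weighting)."""
    n = len(pred)
    if n == 0 or n != len(target):
        raise ValueError("inputs must be non-empty and the same length")
    c1 = (0.01 * data_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mu_p = sum(pred) / n
    mu_t = sum(target) / n
    var_p = sum((p - mu_p) ** 2 for p in pred) / n
    var_t = sum((t - mu_t) ** 2 for t in target) / n
    cov = sum((p - mu_p) * (t - mu_t) for p, t in zip(pred, target)) / n
    return ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    )
```

Masked autoencoders are commonly scored on the masked patches only, which is what the `mask` argument is for. Reporting both masked and full-frame error on the test set gives a clearer picture of reconstruction quality.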

Notes:

Monitor training progress and be prepared to adjust hyperparameters, such as learning rate, batch size, or other factors, to optimize model performance.
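
One way to automate part of this monitoring is to lower the learning rate when validation loss stops improving and to stop once further reductions no longer help. The class below is a framework-independent sketch of that idea. `PlateauController` and all of its default values are hypothetical and would need tuning. If the training code uses PyTorch, `torch.optim.lr_scheduler.ReduceLROnPlateau` covers the learning-rate part.

```python
from dataclasses import dataclass


@dataclass
class PlateauController:
    """Halve the learning rate when validation loss stalls; signal a stop after repeated stalls."""
    lr: float = 1e-3          # assumed starting learning rate
    factor: float = 0.5       # multiplier applied on each reduction
    patience: int = 3         # stalled epochs tolerated before acting
    max_reductions: int = 2   # reductions allowed before stopping
    min_delta: float = 1e-4   # minimum decrease that counts as improvement
    best: float = float("inf")
    stale_epochs: int = 0
    reductions: int = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss. Returns True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.stale_epochs = 0
            return False
        self.stale_epochs += 1
        if self.stale_epochs < self.patience:
            return False
        if self.reductions >= self.max_reductions:
            return True  # already reduced as often as allowed: stop
        self.lr *= self.factor
        self.reductions += 1
        self.stale_epochs = 0
        return False
```

Calling `step` once per epoch with the validation loss, and logging `lr` and `best` alongside it, also leaves a record of every adjustment for anyone reproducing the run.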

Blocked by #5

waldo-vision / models

Pre-Train General FPS Model #6