train_model.py doesn't specify any weight initialization, so it relies on whatever SB3 does by default (presumably some form of random initialization; this still needs to be confirmed).
Hypothesis
If we initialize some of the weights from the layers of a pretrained computer vision model, training might speed up and/or avoid converging early to a poor local optimum. The convolutional layers seem the most useful to reuse, since they do the feature extraction.
Task
Write a new version of train_model.py that initializes new models with weights from the convolutional layers of a pretrained computer vision model such as ResNet-50.
Challenges
This probably requires implementing a custom policy, which in turn requires understanding some of the internals of the training algorithm. From previous experiments, we know that DQN works on the current environment, so the ideal first experiment for this task is to write a custom policy for DQN. It might, however, turn out to be easier to write a custom policy for another training algorithm.