sparisi / cbet

Change-Based Exploration Transfer

Pretrained Model #1

Open rothn opened 2 years ago

rothn commented 2 years ago

One of my favorite components of the C-BET paper was the proposed paradigm shift from tabula-rasa exploration for each task to a system where new environments are explored with the context carried over from a pretrained model. I've found that a practical starting point for similar procedures on other large models (e.g., BERTs, ResNets) is to obtain a copy of the pre-trained model. I'd love to start working with C-BET as well!

I'm very curious as to where I might be able to find the C-BET parameters from your paper. Looking forward to experimenting with this!

sparisi commented 2 years ago

Hi Nicholas! Happy that you liked CBET :) The parameters for training the policy/value network are all here and here. The ones defined in the slurm script override the default ones defined in the argument file.

Note that we train a single network for both the policy and the value function, and we train both the feature layers (convolution layers) and the control layers (LSTM + linear layers). If you want to use models like BERT or ResNet, it would make sense to replace the convolution layers and train only the control layers.
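A minimal PyTorch sketch of this idea: a single network with a frozen feature trunk and trainable control layers (LSTM + linear heads). The small CNN here is only a stand-in for a real pretrained backbone such as a ResNet, and all class and layer names are illustrative, not taken from the C-BET code:

```python
import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    """One network for both policy and value, with frozen feature layers."""
    def __init__(self, num_actions, feat_dim=512, hidden_size=256):
        super().__init__()
        # Stand-in for a pretrained backbone; in practice you would
        # load pretrained weights (e.g., a ResNet trunk) here.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        for p in self.features.parameters():
            p.requires_grad = False  # freeze the feature layers

        # Control layers (LSTM + linear heads) remain trainable.
        self.lstm = nn.LSTM(feat_dim, hidden_size, batch_first=True)
        self.policy_head = nn.Linear(hidden_size, num_actions)
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, obs, hx=None):
        feats = self.features(obs).unsqueeze(1)  # (B, 1, feat_dim)
        out, hx = self.lstm(feats, hx)
        out = out.squeeze(1)
        return self.policy_head(out), self.value_head(out), hx

net = PolicyValueNet(num_actions=7)
# Only control-layer parameters should require gradients.
trainable = [n for n, p in net.named_parameters() if p.requires_grad]
assert all(not n.startswith("features") for n in trainable)
logits, value, _ = net(torch.zeros(2, 3, 64, 64))
print(logits.shape, value.shape)  # torch.Size([2, 7]) torch.Size([2, 1])
```

With this split, an optimizer built from `filter(lambda p: p.requires_grad, net.parameters())` updates only the control layers during transfer.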

rothn commented 2 years ago

@sparisi , thanks for getting back to me so quickly! I was referring to model parameters (e.g., weights and biases) in my earlier remark, so as to avoid re-training the agent myself from scratch. Do you provide these?

To your comment, I will be sure to reinitialize the appropriate parts of the network upon transfer, per your advice and suggestions in the paper, and take only the parameters I need!
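Taking only the parameters you need can be done by filtering a checkpoint's state dict by name before loading it; a small sketch, where the two-layer model and the `"0."` prefix are purely illustrative:

```python
import torch
import torch.nn as nn

def load_partial(model, state_dict, prefix):
    """Copy only parameters whose names start with `prefix`;
    everything else keeps its fresh random initialization."""
    kept = {k: v for k, v in state_dict.items() if k.startswith(prefix)}
    model.load_state_dict(kept, strict=False)  # strict=False tolerates gaps
    return sorted(kept)

def make_net():
    return nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))

pretrained, fresh = make_net(), make_net()
loaded = load_partial(fresh, pretrained.state_dict(), prefix="0.")
print(loaded)  # ['0.bias', '0.weight']
assert torch.equal(fresh[0].weight, pretrained[0].weight)  # transferred
```

Here layer `0` is carried over from the checkpoint while layer `1` stays reinitialized, mirroring the "take only what you need" transfer step.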

sparisi commented 2 years ago

Oh I see. We do not set those parameters; we just let PyTorch initialize them randomly with its default initializer. All seeds are fixed before we initialize the models in order to reproduce the same results. We used seeds 1, 2, 3, ..., 7. We did not notice much deviation in performance across runs.
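A minimal sketch of this kind of seed fixing in PyTorch (the seed value and the tiny layer are illustrative): fixing the seed before model construction makes the default random initialization reproducible.

```python
import random
import torch

def set_seed(seed):
    """Fix RNG seeds before model creation so initialization is reproducible."""
    random.seed(seed)
    torch.manual_seed(seed)

set_seed(1)
a = torch.nn.Linear(4, 2).weight.clone()
set_seed(1)
b = torch.nn.Linear(4, 2).weight.clone()
assert torch.equal(a, b)  # identical initialization under the same seed
```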

rothn commented 2 years ago

I do not have access to the same resources as you for pretraining, but would still love to try transferring a pre-trained control system to explore another environment -- I think that may be feasible on my system or a Colab notebook. Is there any chance that you uploaded the model parameters somewhere (e.g., path/to/model.tar)?

sparisi commented 2 years ago

The models are available as a code release. Let me know if you can run them!

rothn commented 2 years ago

Wow, thanks so much! I just requested access. Did you intend to make them public (this must be set in the sharing settings)?

sparisi commented 2 years ago

Yes, I wanted to make them public, but it seems that with my account I cannot allow access just by sharing a link. For now I have given you access; I will find a better way to share the models later.

rothn commented 2 years ago

Great! I was able to reproduce some numbers from seed 1:

- Habitat Apartment 0 -> Hotel 0: visited_states = 2171.23
- MiniGrid multi -> KeyCorridorS3R3: episodic win = 81.00

I'm curious as to why it's possible to visit a fraction of a state in Habitat, unless I'm missing something, which I probably am :-).

Thank you for making these models available to me -- I really appreciate it. Hopefully others will benefit as well -- I can see some fun, exciting transfer applications for the Habitat model in particular!

sparisi commented 2 years ago

Great! Those numbers match the plots in the paper, so it works :) If you got those numbers by running the test script, the state count is averaged over 100 episodes. So if your visited-state counts are [100, 105, 101, 100] (assuming 4 episodes), your final count will be 101.5.
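That is why a fractional count can appear even though each episode's count is an integer; reproducing the arithmetic with the hypothetical counts above:

```python
# Averaging integer per-episode counts yields a fractional mean.
counts = [100, 105, 101, 100]
mean = sum(counts) / len(counts)
print(mean)  # 101.5
```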

Yes, I think Habitat could transfer well to other environments, whereas MiniGrid is more limited.