tkipf / c-swm

Contrastive Learning of Structured World Models

What do you use for baseline implementation? #6


balloch commented 3 years ago

Do you use your own code for the baseline implementations of World Models (with AE and VAE), or a publicly available library? Can you point me toward the library/code you used? It would be useful for reproducing the results. (@abaheti95 you'll be interested too)

tkipf commented 3 years ago

Thank you for your question! We use our own implementation for this baseline, following the architecture described in Appendix D.5 of our paper.

Essentially, the CNN encoder architecture is the same as in the C-SWM model, with two differences: the last CNN layer produces 32 feature maps (instead of num_objects), and we flatten this representation over width and height into a single [width*height*32] feature vector (as opposed to [num_objects, width*height] feature vectors for the C-SWM model). We then apply the same encoder MLP as in C-SWM to this flattened representation to arrive at the final 32-dim embedding of the image (a 32-dim mean plus a 32-dim variance vector for a VAE).

As for the decoders, please see: https://github.com/tkipf/c-swm/blob/master/modules.py
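For concreteness, here is a minimal PyTorch sketch of such a baseline encoder. The `BaselineEncoder` class name, the layer sizes, and the input shape are illustrative assumptions, not code from the repo; only the points above (last CNN layer with 32 feature maps, flattening into a single [width*height*32] vector, the shared encoder MLP, and the 32-dim output doubled into mean/variance for the VAE) come from this comment.

```python
import torch
import torch.nn as nn


class BaselineEncoder(nn.Module):
    """Hedged sketch of the World Model baseline encoder.

    Layer sizes and names are assumptions; see Appendix D.5 of the
    paper and modules.py in the repo for the actual architecture.
    """

    def __init__(self, width=5, height=5, input_channels=3,
                 hidden_dim=512, embedding_dim=32, vae=False):
        super().__init__()
        self.vae = vae
        # CNN backbone (illustrative): the final layer outputs 32
        # feature maps instead of num_objects, as described above.
        self.cnn = nn.Sequential(
            nn.Conv2d(input_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        flat_dim = width * height * 32
        # Same encoder-MLP structure as in C-SWM, but applied to one
        # flattened [width*height*32] vector rather than per-object slots.
        self.mlp = nn.Sequential(
            nn.Linear(flat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim,
                      2 * embedding_dim if vae else embedding_dim),
        )

    def forward(self, obs):
        h = self.cnn(obs)
        h = h.flatten(start_dim=1)  # -> [batch, width*height*32]
        out = self.mlp(h)
        if self.vae:
            # 32-dim mean and 32-dim (log-)variance vector for the VAE.
            mu, logvar = out.chunk(2, dim=-1)
            return mu, logvar
        return out


# Example usage (input shape is illustrative):
# enc = BaselineEncoder(vae=True)
# mu, logvar = enc(torch.randn(8, 3, 5, 5))
```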