ogroth / tf-gqn

Tensorflow implementation of Neural Scene Representation and Rendering
Apache License 2.0

GQN trained on CLEVR dataset #27

Open loganbruns opened 5 years ago

loganbruns commented 5 years ago

Thanks for the GQN implementation. I thought you might enjoy seeing some pictures of how it does when trained on a different dataset (albeit with a limited amount of training time; I plan to train for longer).

[screenshot: sample GQN renderings after training on CLEVR]

Even on the test set it works pretty well with a relatively small amount of training. It seems to generalize better than on the flat-shaded DeepMind dataset.

[image: test-set renderings]

I'm curious what kinds of changes you might be interested in via pull request. I have some changes to the training parameters, and I've also found self-attention to improve generalization and overall training speed, though that wasn't in the original paper.
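For context, the self-attention mentioned here is presumably the standard scaled dot-product mechanism over spatial positions. A minimal NumPy sketch (not loganbruns' actual patch, which isn't shown in this thread):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (n, d) flattened spatial features; w_*: (d, d_k) projection matrices."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (n, n) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over positions
    return weights @ v                                 # (n, d_k) attended features

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8))                       # e.g. a 4x4 feature map, 8 channels
w = [rng.standard_normal((8, 8)) for _ in range(3)]
out = self_attention(x, *w)
print(out.shape)  # (16, 8)
```

In a GQN-style tower this would typically sit between convolutional blocks, letting each spatial position aggregate features from every other position.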

Thanks, logan

waiyc commented 5 years ago

Hi logan, your results look great. May I ask what the dataset size is and how long you trained the model?

Chan

loganbruns commented 5 years ago

@waiyc, approximately 100k iterations on ~15k training examples. Not as long as I'd have liked, nor with as much data. I'm thinking of generating more data and retraining, maybe at the size of the original CLEVR dataset, which was significantly larger. (Waiting on some more disks.)

ogroth commented 5 years ago

@loganbruns That looks great, thank you for sharing these results! :) I'd be very happy to include a data loader for the CLEVR dataset (either from raw files or from pre-processed tfrecords). I'm currently in the middle of updating the data loader to a more stable and tf 1.12.1 compatible version; the update should be online within the week. So feel free to send a pull request for a CLEVR data loader. It should live under data_provider/clevr_provider.py and be modelled after the updated gqn_provider.py.

I'm also very interested in (self-)attention mechanisms for the model, since they were used in follow-up papers like the localization and mapping one. I'm happy to discuss this on a separate issue thread.

loganbruns commented 5 years ago

@ogroth, thanks for the reference, I'll read it. I also created a separate issue to discuss merging some of the changes. Regarding CLEVR: since I had to modify the dataset generation anyway, I also added code to the generator to convert its output into the DeepMind dataset format. I was thinking of asking the CLEVR authors whether they'd take some of the changes, so others could use their generator to produce data for GQNs.

phongnhhn92 commented 5 years ago

@loganbruns would you mind sharing the conversion code you used to convert the CLEVR dataset to the GQN tfrecords format? I am also creating my own dataset and am still struggling to understand the GQN dataset format well enough to make it work with this implementation.

ogroth commented 5 years ago

Hi @loganbruns , the new input pipeline is now in master. Would you mind modelling your input_fn for CLEVR after this one? Also, you can include data generation and conversion code for CLEVR under data_provider. I'm happy to review your pull request. :)

loganbruns commented 5 years ago

@phongnhhn92, here is the source:

https://github.com/loganbruns/clevr-dataset-gen/blob/clevr_gqn/image_generation/convert_gqn.py

Just let me know if you have any questions.
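As a rough guide to the format (the linked convert_gqn.py is authoritative — the key names and pose layout below follow the DeepMind GQN datasets' convention and are assumptions here): each scene becomes one tf.train.Example holding the N rendered views as JPEG bytes and the N camera poses as a flat float list.

```python
import tensorflow as tf

def scene_to_example(jpeg_frames, raw_poses):
    """jpeg_frames: list of N JPEG-encoded byte strings.
    raw_poses: list of N (x, y, z, yaw, pitch) tuples."""
    flat_poses = [v for pose in raw_poses for v in pose]
    return tf.train.Example(features=tf.train.Features(feature={
        "frames": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=jpeg_frames)),
        "cameras": tf.train.Feature(
            float_list=tf.train.FloatList(value=flat_poses)),
    }))

# One record per scene, as described in this thread:
# with tf.io.TFRecordWriter("train.tfrecord") as writer:
#     for frames, poses in scenes:
#         writer.write(scene_to_example(frames, poses).SerializeToString())
```

Note the GQN model itself later expands each raw 5-d pose into a 7-d vector (x, y, z, cos/sin of yaw and pitch); the tfrecords store the raw 5 values.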

loganbruns commented 5 years ago

@ogroth , thanks. I'll take a look.

waiyc commented 5 years ago

@loganbruns From your convert_gqn.py I can see that you save each scene, with its N frames, as one record in the TFRecord file. Since you mentioned training the model with 15k examples, you generated 15k scenes in the training TFRecord file.

Is my understanding correct?

loganbruns commented 5 years ago

@waiyc , yes, 15k scenes, each with N frames. I generated one file each for train, val, and test; the train tfrecord file had 15k scenes. For the DeepMind dataset, each tfrecord file has 5k scenes.
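The 5k-scenes-per-file layout used by the DeepMind datasets can be sketched as a simple sharding plan (the file naming below is illustrative, not DeepMind's actual naming; the thread's CLEVR set instead put all 15k training scenes in a single file):

```python
def shard_plan(num_scenes, scenes_per_file=5000):
    """Return (filename, start, end) tuples covering all scenes."""
    plan = []
    num_files = (num_scenes + scenes_per_file - 1) // scenes_per_file  # ceil division
    for i in range(num_files):
        start = i * scenes_per_file
        end = min(start + scenes_per_file, num_scenes)
        plan.append((f"train-{i:03d}-of-{num_files:03d}.tfrecord", start, end))
    return plan

print(shard_plan(15000))
# [('train-000-of-003.tfrecord', 0, 5000),
#  ('train-001-of-003.tfrecord', 5000, 10000),
#  ('train-002-of-003.tfrecord', 10000, 15000)]
```

Sharding mainly helps with parallel reads and with shuffling at the file level; for a 15k-scene dataset a single file also works, as this thread shows.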