microsoft / NUWA

A unified 3D Transformer Pipeline for visual synthesis
2.81k stars, 163 forks

Missing performance numbers in the paper #13

Open Mut1nyJD opened 2 years ago

Mut1nyJD commented 2 years ago

First off, congratulations on this amazing work. I think you have managed to close the gap that kept generative deep learning from being relevant for real-world applications, rather than just a nice toy as in previous work in this area.

However, to truly judge the performance of your approach, I need timing numbers, and I was a bit disappointed that the paper contains not a single note on execution time, either for training or, more crucially, for sampling a single final image.

Would you be able to provide some numbers on how long generating one sample takes for a 4kx1k image with a 256^2 patch size, and on which setup?

Also, if possible, could you shed some light on the training times and the setup that was used?

Thank you!