microsoft / StyleSwin

[CVPR 2022] StyleSwin: Transformer-based GAN for High-resolution Image Generation
https://arxiv.org/abs/2112.10762
MIT License
503 stars · 49 forks

Is it able to run on CPU? #5

Closed · lucasjinreal closed this 2 years ago

lucasjinreal commented 2 years ago

Is it able to run on CPU?

ForeverFancy commented 2 years ago

Yes. For inference, set all devices to cpu and set args.distributed to False, and the model will run on CPU. We do not recommend CPU training because the speed is unacceptably slow.
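Something like the following should work (a minimal sketch; the Generator import path, constructor arguments, checkpoint key, and return value are illustrative, so match them to models/generator.py and the args you trained with):

```python
import torch

# Illustrative import; check models/generator.py for the actual class and signature.
from models.generator import Generator

device = torch.device("cpu")

# Hypothetical constructor arguments; use the values from your training config.
g_ema = Generator(size=256, style_dim=512, n_mlp=8).to(device)

# map_location moves CUDA-saved weights onto the CPU at load time.
ckpt = torch.load("checkpoint.pt", map_location=device)
g_ema.load_state_dict(ckpt["g_ema"])
g_ema.eval()

with torch.no_grad():
    z = torch.randn(1, 512, device=device)  # random latent code
    sample = g_ema(z)  # the real forward may return a tuple; adapt as needed
```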

lucasjinreal commented 2 years ago

@ForeverFancy I just want inference, but I saw that some ops use a CUDA kernel. Does that affect anything?

ForeverFancy commented 2 years ago

When the input is detected to be on CPU, those ops skip the CUDA kernel and run the CPU code path instead; see fused_act.py#L93 and upfirdn2d.py#L147. Alternatively, you could remove the CUDA-kernel part and keep only the CPU part to ensure the whole codebase runs on CPU. The fallback dispatch is sketched below.
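Roughly, the pattern in fused_act.py looks like this (a paraphrased sketch, not the exact source):

```python
import torch
import torch.nn.functional as F


def fused_leaky_relu(input, bias, negative_slope=0.2, scale=2 ** 0.5):
    if input.device.type == "cpu":
        # Pure-PyTorch fallback: broadcast the bias over the trailing
        # dimensions, apply leaky ReLU, then rescale.
        rest_dim = [1] * (input.ndim - bias.ndim - 1)
        out = F.leaky_relu(
            input + bias.view(1, bias.shape[0], *rest_dim),
            negative_slope=negative_slope,
        )
        return out * scale
    # On GPU the repo dispatches to its compiled CUDA extension instead:
    # return FusedLeakyReLUFunction.apply(input, bias, negative_slope, scale)
    raise RuntimeError("the CUDA path requires the repo's compiled extension")
```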

lucasjinreal commented 2 years ago

@ForeverFancy Thank you, that's very helpful. Did you compare StyleSwin's speed with StyleGAN's on the same device and input resolution? Which one is faster?

ForeverFancy commented 2 years ago

We have tested StyleSwin and StyleGAN2 at 1024×1024 resolution. The FLOPs of StyleGAN2 and StyleSwin are 74.27B and 50.90B respectively, while their throughputs on a V100 GPU are 40.05 imgs/sec and 11.05 imgs/sec respectively. We think this gap between theoretical FLOPs and practical throughput is mainly because vision transformers have not been optimized as thoroughly as ConvNets, and we believe future optimization will democratize the use of transformers, since they exhibit lower theoretical FLOPs.
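For reference, throughput numbers like these come from a simple timing loop. A hedged sketch (the batch size, warmup count, and latent-only call signature are assumptions; adapt them to the generator you are measuring):

```python
import time

import torch


def throughput(model, batch_size=8, style_dim=512, iters=50, device="cuda"):
    """Rough imgs/sec measurement for a latent-to-image generator."""
    model = model.to(device).eval()
    z = torch.randn(batch_size, style_dim, device=device)
    with torch.no_grad():
        for _ in range(10):  # warmup iterations, excluded from timing
            model(z)
        if device == "cuda":
            torch.cuda.synchronize()  # flush pending kernels before timing
        start = time.time()
        for _ in range(iters):
            model(z)
        if device == "cuda":
            torch.cuda.synchronize()
        elapsed = time.time() - start
    return batch_size * iters / elapsed  # images per second
```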