xinntao / Real-ESRGAN

Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
BSD 3-Clause "New" or "Revised" License

Even faster inference with torch 2 #665

Open wacky6 opened 1 year ago

wacky6 commented 1 year ago

Not sure if this has been mentioned before.

I observe about a 1.5-2x speedup using torch 2 (no tiling, no face enhancement, fp16), and a slight VRAM reduction, on my NVIDIA A4000.

Setup | Relative speedup
--- | ---
No compile, channels_first | 1x (baseline)
Torch 2, compile only | ~1.5x
Torch 2, compile + channels_last | ~2x

A quick-and-dirty patch is to add the following to Real-ESRGAN/realesrgan/utils.py, at the end of RealESRGANer.__init__:

# Opt into channels_last memory format + torch.compile on PyTorch 2.x
if torch.__version__.split('.')[0] == '2':
    self.model = self.model.to(memory_format=torch.channels_last)
    self.model = torch.compile(self.model)
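
For a quick sanity check outside the repo, here is a minimal, self-contained timing sketch. The toy conv stack and tensor sizes are placeholders, not Real-ESRGAN's actual RRDBNet, so the absolute numbers will differ; it only demonstrates the channels_last + torch.compile pattern on a CUDA device:

import time
import torch
import torch.nn as nn

def make_model():
    # Toy stack of 3x3 convs as a stand-in for the real network
    return nn.Sequential(*[nn.Conv2d(64, 64, 3, padding=1) for _ in range(8)]).cuda().half().eval()

@torch.no_grad()
def bench(model, x, iters=20):
    for _ in range(3):            # warm-up absorbs torch.compile's one-time compilation cost
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()
    return (time.time() - start) / iters

x = torch.randn(1, 64, 256, 256, device='cuda', dtype=torch.float16)

baseline = make_model()
print('eager, channels_first  :', bench(baseline, x))

fast = make_model().to(memory_format=torch.channels_last)
fast = torch.compile(fast)
print('compiled, channels_last:', bench(fast, x.to(memory_format=torch.channels_last)))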
x4080 commented 1 year ago

Hi, thanks for your tip. Do you know how to make it work for MPS? It seems to work, but torch.compile produces some error warnings; if I comment out that line it seems to run a little faster (because there are no error warnings).

wacky6 commented 1 year ago

I don't have an MPS-capable device, so I can't comment further here.

This thread suggests the performance gain from changing the memory layout varies by device and network architecture: https://github.com/pytorch/pytorch/issues/92542

My hope is that as torch.compile improves, the gain will become more "visible" on other platforms.
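
In the meantime, one option (just a sketch, untested on MPS) is to gate the patch on the backend so only CUDA opts into compile. This assumes self.device is the torch.device that RealESRGANer.__init__ sets earlier (it is in the current utils.py, but worth double-checking):

# Only opt in on CUDA; leave MPS / CPU in plain eager mode
if torch.__version__.split('.')[0] == '2' and self.device.type == 'cuda':
    self.model = self.model.to(memory_format=torch.channels_last)
    self.model = torch.compile(self.model)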

x4080 commented 1 year ago

@wacky6 thanks

Kiran-valetcloset commented 3 months ago

Hi @wacky6, after I added this code at the end of __init__, I don't get any response; it just keeps loading. My PyTorch version is 2.0.1+cu118.
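
A likely (but unconfirmed) explanation: torch.compile is lazy, so the first inference call triggers compilation, which can take several minutes for a large model on torch 2.0.x and can look like a hang. A quick way to check is to time the first and second calls separately; upsampler and img below are placeholders for your RealESRGANer instance and test image:

import time

t0 = time.time()
output, _ = upsampler.enhance(img, outscale=4)  # first call: includes the one-time torch.compile cost
print('first call :', time.time() - t0, 's')

t0 = time.time()
output, _ = upsampler.enhance(img, outscale=4)  # steady state: should be much faster
print('second call:', time.time() - t0, 's')

If the second call is fast, the delay is just compilation; if neither call ever returns, try keeping only the channels_last line and dropping torch.compile.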