Could you please provide the workflow of generating the coreML Model for RealESRGAN?

Vaida12345 commented 1 year ago

Hi, thank you for the great work! Could you please provide the code for converting the RealESRGAN model to a CoreML model?

I tried to build it myself, but the output is apparently wrong.

import coremltools as ct
import torch
from basicsr.archs.rrdbnet_arch import RRDBNet
import coremltools.proto.FeatureTypes_pb2 as ft

example_input = torch.rand(1, 3, 256, 256)
example_output = torch.rand(1, 3, 1024, 1024)
model_path = "/Users/vaida/Downloads/Safari download/RealESRGAN_x4plus.pth"

model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
loadnet = torch.load(model_path, map_location=torch.device('cpu'))
# prefer to use params_ema
if 'params_ema' in loadnet:
    keyname = 'params_ema'
else:
    keyname = 'params'
model.load_state_dict(loadnet[keyname], strict=True)

traced_model = torch.jit.trace(model, example_input)

image_input = ct.ImageType(name="input", shape=example_input.shape)
image_output = ct.ImageType(name="output", shape=example_output.shape)

mlmodel = ct.convert(
    traced_model,
    source = "pytorch",
    inputs = [image_input]
)

spec = mlmodel.get_spec()

ct.utils.rename_feature(spec, "var_4053", "image")
output = spec.description.output[0]
output.name = "image"

output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('RGB')
output.type.imageType.width = 1024
output.type.imageType.height = 1024

mlmodel = ct.models.MLModel(spec)
mlmodel.save("newmodel.mlpackage")

hholtmann commented 1 year ago

I've run into the same issue. Any idea how to get this fixed?

hholtmann commented 1 year ago

Made some progress on this. For the output, you need an activation layer for normalisation, as mentioned here: https://github.com/john-rocky/CoreML-Models/issues/6. However, I am still struggling with the input (it may also need normalisation).

Vaida12345 commented 1 year ago

To anyone in the future: I have found the answer.

In one file:

import coremltools as ct
import torch
from basicsr.archs.rrdbnet_arch import RRDBNet
import coremltools.proto.FeatureTypes_pb2 as ft

example_input = torch.rand(1, 3, 256, 256)
example_output = torch.rand(1, 3, 1024, 1024)
model_path = "/Users/vaida/Downloads/Safari download/RealESRGAN_x4plus.pth"

model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
loadnet = torch.load(model_path, map_location=torch.device('cpu'))
# prefer to use params_ema
if 'params_ema' in loadnet:
    keyname = 'params_ema'
else:
    keyname = 'params'
model.load_state_dict(loadnet[keyname], strict=True)

# Switch to inference mode before tracing
model.eval()
traced_model = torch.jit.trace(model, example_input)

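# scale is approximately 1/255: it maps 8-bit pixel values (0-255) to the 0-1 range the model expects
image_input = ct.ImageType(name="input", shape=example_input.shape, scale=0.003921568859368563)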
image_output = ct.ImageType(name="output", shape=example_output.shape)

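mlmodel = ct.convert(
    traced_model,
    source="pytorch",
    inputs=[image_input],
    # The second script uses NeuralNetworkBuilder, which needs the neural network
    # format; recent coremltools versions default to ML Program instead.
    convert_to="neuralnetwork",
)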

mlmodel.save("original.mlpackage")

This file converts the PyTorch model to CoreML.
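For anyone wondering where the `scale` value comes from, here is a minimal sketch of the arithmetic, assuming CoreML applies image preprocessing as `scale * pixel + bias` and that a LINEAR activation computes `alpha * x + beta`. The function names `preprocess` and `linear_activation` are made up for illustration and are not part of coremltools.

```python
# Sketch of the normalisation round trip, in plain Python.
# Assumptions: image preprocessing is y = scale * x + bias, and a
# LINEAR activation is y = alpha * x + beta. Names are hypothetical.

SCALE = 1.0 / 255.0  # the script's 0.003921568859368563 is this value, approximately


def preprocess(pixel: int, scale: float = SCALE, bias: float = 0.0) -> float:
    """Map an 8-bit pixel value (0-255) into the 0-1 range."""
    return scale * pixel + bias


def linear_activation(x: float, alpha: float = 255.0, beta: float = 0.0) -> float:
    """Map a 0-1 value back to the 0-255 range."""
    return alpha * x + beta


if __name__ == "__main__":
    for pixel in (0, 128, 255):
        normalised = preprocess(pixel)
        restored = linear_activation(normalised)
        print(pixel, normalised, restored)
```

The second half of the round trip is what `params=[255, 0]` does in the LINEAR activation of the second script.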

Then, in another file, make the output an image:

import coremltools as ct
import coremltools.proto.FeatureTypes_pb2 as ft

mlmodel = ct.models.MLModel("original.mlpackage")

spec = mlmodel.get_spec()
builder = ct.models.neural_network.NeuralNetworkBuilder(spec=spec)

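# "var_4053" is the auto-generated name of the traced model's output; it may differ for your conversion
# Drop the batch dimension so the result can be exposed as an image
builder.add_squeeze(name="squeeze", input_name="var_4053", output_name="squeeze_out", axes=None, squeeze_all=True)
# Multiply by 255 to map the 0-1 output back to 8-bit pixel values
builder.add_activation(name="activation", non_linearity="LINEAR", input_name="squeeze_out", output_name="image", params=[255, 0])
# Replace the original output description with a fresh one for the image output
builder.spec.description.output.pop()
builder.spec.description.output.add()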

output = builder.spec.description.output[0]
output.name = "image"

output.type.imageType.colorSpace = ft.ImageFeatureType.ColorSpace.Value('RGB')
output.type.imageType.width = 1024
output.type.imageType.height = 1024

mlmodel = ct.models.MLModel(spec)
mlmodel.save("newmodel.mlpackage")
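One caveat: the script hard-codes `var_4053`, which is a name generated during tracing and may well differ on another machine or library version. Below is a rough sketch of reading the name from the spec instead. The helper `single_output_name` is hypothetical (not a coremltools function), and it assumes the converted model has exactly one output.

```python
def single_output_name(spec) -> str:
    """Return the name of the model's only output feature.

    `spec` is expected to look like the object returned by
    MLModel.get_spec(): it has description.output, a sequence of
    feature descriptions that each carry a .name attribute.
    """
    outputs = spec.description.output
    if len(outputs) != 1:
        raise ValueError(f"expected exactly one output, found {len(outputs)}")
    return outputs[0].name


# Possible usage (assumes original.mlpackage from the first script exists):
#
#   import coremltools as ct
#   spec = ct.models.MLModel("original.mlpackage").get_spec()
#   output_name = single_output_name(spec)
#   # pass output_name as input_name to builder.add_squeeze(...)
```

Read the name before modifying the output description, since the script later pops and re-adds that entry.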

roimulia2 commented 1 year ago

Hey Vaida! Have you managed to convert other SR models besides this one? It runs pretty slowly on physical devices.

Vaida12345 commented 1 year ago

No, I haven’t. I suppose these models are slow because they are complicated. If you open them in Netron, for example, they are much more complex than Waifu2x. So I think I will continue using the C++ implementation.

roimulia2 commented 1 year ago

@Vaida12345 Does the C++ implementation of this run faster than the MLModel with CoreML?

Vaida12345 commented 1 year ago

CoreML should in principle be the fastest, since Apple has optimized it heavily and it can use the CPU, GPU, and Apple Neural Engine, while the C++ implementation uses only the GPU. However, when I tested the models on a physical machine, I found them nearly equally fast. Considering that with the CoreML model you need to handle the alpha channel and post-processing yourself, I would recommend the C++ implementation.

john-rocky / CoreML-Models

Could you please provide the workflow of generating the coreML Model for RealESRGAN? #20