Convert to TF Lite: GAN model to cartoonize photos

khanhlvg commented 4 years ago

This model is implemented in TF 1 and can convert photos to cartoon. It looks quite visually impressive! https://github.com/SystemErrorWang/White-box-Cartoonization

Image2Cartoon

@sayakpaul Would you be able to try converting this model to TF Lite?

sayakpaul commented 4 years ago

Thank you for bringing this to my notice, @khanhlvg. I have self-assigned this and will start working on it. I will keep you posted about the progress here in this thread.

sayakpaul commented 4 years ago

Hey @khanhlvg I have started working on it. I have been able to extract a SavedModel out of the pre-trained checkpoints but I am running into issues during the TF Lite conversion step:

--------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-99e790548508> in <module>()
      1 converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_dir')
      2 converter.optimizations = [tf.lite.Optimize.DEFAULT]
----> 3 tflite_model = converter.convert()
      4 
      5 tflite_model_size = open(int_tflite_path, 'wb').write('whitebox_cartoon_gan.tflite')

/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/lite.py in convert(self)
    481               "None is only supported in the 1st dimension. Tensor '{0}' has "
    482               "invalid shape '{1}'.".format(
--> 483                   _get_tensor_name(tensor), shape_list))
    484         elif shape_list and shape_list[0] is None:
    485           # Set the batch size to 1 if undefined.

ValueError: None is only supported in the 1st dimension. Tensor 'generator/Conv_9/BiasAdd' has invalid shape '[1, None, None, 3]'.

Here's the Colab Notebook.

khanhlvg commented 4 years ago

The stylized image looks really cool!

Could you try converting with tf-nightly? The model has a dynamic shape input and it's only supported since TF 2.3.

sayakpaul commented 4 years ago

Ah bummer. How could I miss out on that part? 😖

I just tried it out and guess what? The TF Lite model (6.4 KB) is converted 😄

Now, what's needed is the pre-trained weights for the generator network. I have filed an issue already in their repository. As soon as I have an update, I will keep this thread posted. Here's the Colab Notebook made so far.

khanhlvg commented 4 years ago

Wow 6.4KB doesn't sound very right. I suspect that some weights might have been left out when converting to TF Lite. Could you try running inference with the TF Lite model?

sayakpaul commented 4 years ago

@khanhlvg yes, I suspected that one too. I tried replicating the original inference pipeline with the converted TFLite model and the output is coming to be spurious.

Here's the Colab Notebook. Let me know if I am missing out on something.

khanhlvg commented 4 years ago

The issue came from your saved model export code. You use an intermediate output (network_out) of the GAN model as one of the model input, so that TFLiteConverter get confused and removed the whole GAN network from the TF Lite model. You can retry with this code.

tf.saved_model.simple_save(
      sess,
      '/content/saved_model_dir',
      inputs={input_photo.name: input_photo},
      outputs={final_out.name: final_out}
)

However I'm still facing an issue that the color of the output is off. I think there's an issue with pre/post-processing code but I haven't been able to pin point the issue. Can you take a look?

khanhlvg commented 4 years ago

I was able to make it working! Here is the notebook

The weird color issue is because the model use cv2 for preprocessing and post-processing, which use BGR. However in your notebook, visualization is done with PIL which use RGB. Therefore, a small swap was needed.

I run benchmark of the model on my Pixel 3 and it is incredibly fast! I think we can do live video cartoonization with this :)

Inference timings in microsecond: Init: 7584, First inference: 4910, Warmup (avg): 364.436, Inference (avg): 364.822

sayakpaul commented 4 years ago

Not sure if this is a coincidence again. I was just about to inform you about this @khanhlvg

sayakpaul commented 4 years ago

What next steps would you suggest @khanhlvg for the conversion?

khanhlvg commented 4 years ago

@sayakpaul Would you be able to:

Create several variations of the model:
- float16 with fixed input shape (e.g. 224 x 224) to run on GPU
- full integer quantized model
Add metadata to make it easy to integrate to Android
Prepare for uploading to TF Hub

Besides, I made a mistake with benchmarking. I forgot to set the shape for the input image. The model is actually quite slow. For example, an 224 x 224 input on int8 model on Pixel 3 takes 729 seconds. So unfortunately it might not be suitable for live video.

Inference timings in us: Init: 8059, First inference: 805819, Warmup (avg): 805819, Inference (avg): 726875

sayakpaul commented 4 years ago

Yes, even on a Colab CPU the model takes quite a while to spit out the results.

Add metadata to make it easy to integrate to Android

Sure. Could you mention which Metadata example I would refer to for this?

Prepare for uploading to TF Hub

The model might not be very fast. Would we still want to do this? Now, of course, it's still a very cool example to create a mobile app with this kind of capability.

@khanhlvg

khanhlvg commented 4 years ago

Yes, even on a Colab CPU the model takes quite a while to spit out the results.

It's because TF Lite runtime is not optimized to run quantized model on Linux. If you use float model, then it'll be using XNNPACK and it's blazingly fast. The quantized model runs faster on mobile devices vs. on Colab.

Sure. Could you mention which Metadata example I would refer to for this?

Please use the selfie2anime metadata format. It's the same concept: image in -> image out.

The model might not be very fast. Would we still want to do this? Now, of course, it's still a very cool example to create a mobile app with this kind of capability.

Inference time below 1s on CPU is still very good for still images. And the quantized model is just a little larger than 1MB so definitely it's very useful on mobile :)

sayakpaul commented 4 years ago

It's because TF Lite runtime is not optimized to run quantized model on Linux. If you use float model, then it'll be using XNNPACK and it's blazingly fast. The quantized model runs faster on mobile devices vs. on Colab.

Yes, I am aware of this. Just wanted to bring the topic up so that you know.

I will proceed with the next steps and post the updates here.

Thanks!

sayakpaul commented 4 years ago

@khanhlvg a couple of updates.

I converted four different variants: float16, dynamic-range, int8 and full int8. All of the models (metadata is yet to be added) reside here: gs://cartoon_gan (public). I did not export the float16 model to support dynamic shape as we had discussed.

Here is the Colab Notebook that demonstrates the conversion and the inference processes.

Observations:

When I used cv2.cvtColor(image, cv2.COLOR_BGR2RGB) just after reading the image from disk using cv2.imread() and then ran inference I got different results as you can see below:

After exporting the int8 and full int8 models the "index" of the model inputs is getting set to 156 as can be seen below:

[{'dtype': numpy.float32,
'index': 156,
'name': 'input_photo',
'quantization': (0.0, 0),
'quantization_parameters': {'quantized_dimension': 0,
'scales': array([], dtype=float32),
'zero_points': array([], dtype=int32)},
'shape': array([1, 1, 1, 3], dtype=int32),
'shape_signature': array([ 1, -1, -1,  3], dtype=int32),
'sparsity_parameters': {}}]

Is this expected?

The int8 and the full int8 models introduce artifacts in the final image (cv2.cvtColor() was applied before running inference):

That's all for now. I will get back to you after adding metadata.

sayakpaul commented 4 years ago

@khanhlvg metadata has been added to the models and they live here: gs://cartoon_gan/model_with_metadata.

Here's the Colab Notebook.

khanhlvg commented 4 years ago

Thanks Sayak!

When I used cv2.cvtColor(image, cv2.COLOR_BGR2RGB) just after reading the image from disk using cv2.imread() and then ran inference I got different results as you can see below:

The model is trained with BGR images so if you feed it an image in RGB, the output cartoonized image won't look as good as if you feed it an BGR images.

After exporting the int8 and full int8 models the "index" of the model inputs is getting set to 156 as can be seen below:

Tensor index doesn't have any special meaning, just an id so feel free to ignore the actual number.

The int8 and the full int8 models introduce artifacts in the final image (cv2.cvtColor() was applied before running inference):

It's because the model cv2 loads images as BGR. Therefore if you load your input image using cv2, then you should do the image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) after inference.

I also tried running the fp16 model on GPU but faced a couple of issues. I'll ask the GPU delegate team to investigate.

ERROR: Following operations are not supported by GPU delegate:
DEPTHWISE_CONV_2D: Expected 1 runtime input tensor(s), but node has 0 runtime input(s).
DEQUANTIZE:
102 operations will run on the GPU, and the remaining 4 operations will run on the CPU.
ERROR: TfLiteGpuDelegate Init: MUL: Node 43 is already a consumer of the value 0
ERROR: TfLiteGpuDelegate Init: MUL: Node 1 is already a consumer of the value 7
INFO: Created 0 GPU delegate kernels.
ERROR: TfLiteGpuDelegate Prepare: delegate is not initialized
ERROR: Node number 106 (TfLiteGpuDelegateV2) failed to prepare.

ERROR: tensorflow/lite/kernels/depthwise_conv.cc:128 filter->type == data_type || data_type == kTfLiteInt16 was not true.
ERROR: Node number 2 (DEPTHWISE_CONV_2D) failed to prepare.

Btw I'm taking days off until the end of this week so please expect some delays in my reply. Thanks :)

sayakpaul commented 4 years ago

Fair points, @khanhlvg. I will update the notebook accordingly.

Could you verify if the metadata has been populated correctly when you get a moment?

Upon your go-ahead I will proceed to TF Hub publication (which I know will be done after we are done with the GPU delegate issue).

Also, enjoy your days off!

margaretmz commented 4 years ago

@sayakpaul I should verify the tflite model (with metadata) in the Android app.

Please place the tflite models with/without metadata under a ml folder here: https://github.com/margaretmz/CartoonGAN-e2e-tflite-tutorial/tree/master. Or update the README with links pointing to the models if you prefer to have them located elsewhere. It's difficult for me to hunt them down from the discussion thread here.

sayakpaul commented 4 years ago

I did not realize you created a repository @margaretmz as I was planning to have the notebook hosted from here. And once @khanhlvg and I were done publishing the models on TF Hub we the plan was to directly download them from TF Hub.

You can retrieve the models (with metadata from here) :

Dynamic-range quantized: https://storage.googleapis.com/cartoon_gan/model_with_metadata/whitebox_cartoon_gan_dr.tflite
FP16: https://storage.googleapis.com/cartoon_gan/model_with_metadata/whitebox_cartoon_gan_fp16.tflite
Full int8: https://storage.googleapis.com/cartoon_gan/model_with_metadata/whitebox_cartoon_gan_full_int8.tflite

sayakpaul commented 4 years ago

@margaretmz FYI I went ahead and created a card for CartoonGAN Android App and I also created an issue from the card: https://github.com/ml-gde/e2e-tflite-tutorials/issues/11.

sayakpaul commented 4 years ago

@khanhlvg the outputs with the int8 models still remain the same after I ran a BGR image through them and applied cv2.cvtColor(output, cv2.COLOR_BGR2RGB) after running the inference.

Here's the updated Colab Notebook.

margaretmz commented 4 years ago

I did not realize you created a repository @margaretmz as I was planning to have the notebook hosted from here. And once @khanhlvg and I were done publishing the models on TF Hub we the plan was to directly download them from TF Hub.

You can retrieve the models (with metadata from here) :

Dynamic-range quantized: https://storage.googleapis.com/cartoon_gan/model_with_metadata/whitebox_cartoon_gan_dr.tflite

FP16: https://storage.googleapis.com/cartoon_gan/model_with_metadata/whitebox_cartoon_gan_fp16.tflite

Full int8: https://storage.googleapis.com/cartoon_gan/model_with_metadata/whitebox_cartoon_gan_full_int8.tflite

@sayakpaul - Project repo - each tutorial will be a separate project repo. I only created the project repos for the ones that I'm working on: selfie2anime, esrgan, CartoonGAN, segmentation + style transfer. I expect other project teams to create their own repos and add the links to the E2E TFLite Tutorials repo. Where to host tflite models - either on your GitHub or TFHub would be fine; however, the best way to validate a tflite model with metadata before publishing on TFHub, is to actually import the model to an Android App via ML Model Binding, and use the generated class in Android code.

Thanks for the links, I will try the tflite models with metadata in an Android app.

sayakpaul commented 4 years ago

Thanks, makes sense. I think we could work on a short document to enlist these different steps to let the interested contributors know about how we generally go about doing things here.

Let me know your thoughts.

margaretmz commented 4 years ago

Sounds good. How about use the wiki:https://github.com/ml-gde/e2e-tflite-tutorials/wiki/How-to-contribute-to-the-E2E-TFLite-Tutorials%3F? Although I felt some of the info is already covered in README. Perhaps we just need to update README? Which would you prefer? Either way, we should try out the documented workflow to make sure it works, before communicating widely to the broader community. Thanks.

sayakpaul commented 4 years ago

@margaretmz I think having it on README would work as well.

Either way, we should try out the documented workflow to make sure it works, before communicating widely to the broader community. Thanks.

I am in 100% agreement.

sayakpaul commented 4 years ago

@khanhlvg I was able to fix the issue with int8 models. The problem was I was passing randomly created data of (1, 224, 224, 3) shape and this is probably why the calibrations weren't being done properly. Also, there was no way the system could understand the order of the channels from a randomly created images.

So, here's what I did:

IMG_SIZE = 224
images_list = os.listdir('/content/source-frames')

# int8 quantization requires a representative dataset generator
def representative_dataset_gen():
    for image_path in images_list:
        image = cv2.imread(os.path.join('/content/source-frames', image_path))
        image = cv2.resize(image, (IMG_SIZE, IMG_SIZE))
        image = image.astype(np.float32)/127.5 - 1
        image = np.expand_dims(image, axis=0)
        yield [image]

model = tf.saved_model.load('saved_model_dir')
concrete_func = model.signatures[
    tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]

converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
tflite_model = converter.convert()

open('whitebox_cartoon_gan_int8.tflite', 'wb').write(tflite_model)

The output is now coming as expected:

The Colab Notebook is also updated with these changes.

@margaretmz I have updated these models with Metadata as well but the links I earlier mentioned in this comment will still remain the same. Looking forward to this exciting project :)

margaretmz commented 4 years ago

Thanks @sayakpaul. I will add Android code to [the project repo](https://github.com/margaretmz/CartoonGAN-e2e-tflite-tutorial soon. So far I'm encountering errors running inference with the tflite models:

DR quantized & full int8: For I got error "Cannot copy from a TensorFlowLite tensor (final_output) with 192 bytes to a Java Buffer with 12 bytes"
FP16: output bitmap is black.

@khanhlvg - do we also need to add OpenCV library to Android in order to have the image as BGR?

sayakpaul commented 4 years ago

@margaretmz are you using the tf-nightly interpreter? I converted the models using tf-nightly. Sorry, I should have mentioned it.

FYI, the output image also needs a bit of processing as you'd see in the Colab Notebook. If I missed out on that in the while populating metadata (Colab Notebook), please let me know.

FYI, I had created a separate issue thread for the Android application demonstrating this model.

Also, were you able to import the models via Model Binding?

khanhlvg commented 4 years ago

DR quantized & full int8: For I got error "Cannot copy from a TensorFlowLite tensor (final_output) with 192 bytes to a Java Buffer with 12 bytes"

I think this is because the ML Model Binding doesn't support resizing the input shape yet. Let me file a bug to the Android Studio team. Thanks for surfacing the issue! Meanwhile, @margaretmz @sayakpaul do you think we should create fixed shape model to proceed, or wait for Android Studio team to support dynamic shape model?

@khanhlvg - do we also need to add OpenCV library to Android in order to have the image as BGR?

As model interface generated by ML Model Binding takes Bitmap as the input, it needs to support models with BGR input first. Let me file another feature request to the Android Studio team.

As Android Studio has long release cycle, I think it may take months before the before went into beta. Shall we start building the demo directly with TF Lite interface (e.g. using TF Lite Interpreter and Support Library for pre-processing)?

sayakpaul commented 4 years ago

Meanwhile, @margaretmz @sayakpaul do you think we should create fixed shape model to proceed, or wait for Android Studio team to support dynamic shape model?

I think this would be a win-win. We already have the dynamic-shape supporting models with us. If we decide to go down this path what shape would y'all suggest we keep?

Shall we start building the demo directly with TF Lite interface (e.g. using TF Lite Interpreter and Support Library for pre-processing)?

Might be a better choice.

khanhlvg commented 4 years ago

If we decide to go down this path what shape would y'all suggest we keep?

If we create a fixed shape model, then (512, 512) is probably good enough.

sayakpaul commented 4 years ago

Fair enough. Once we all decide the next steps from here I can go ahead and create separate models with fixed-shaped inputs and populate them with metadata.

margaretmz commented 4 years ago

@margaretmz are you using the tf-nightly interpreter? I converted the models using tf-nightly. Sorry, I should have mentioned it.

Yes I'm using tf-nightly interpreter

FYI, the output image also needs a bit of processing as you'd see in the Colab Notebook. If I missed out on that in the while populating metadata (Colab Notebook), please let me know.

If the metadata is set properly the post processing, it should be taken care of in the generated class.

FYI, I had created a separate issue thread for the Android application demonstrating this model.

I will try to track the issues under the project repo to that's it's easier to find. Let's try to use the issues in this repo for assigning project resources and then track details under each project repos.

Also, were you able to import the models via Model Binding?

Yes, I'm able to import all 3 models.

Meanwhile, @margaretmz @sayakpaul do you think we should create fixed shape model to proceed, or wait for Android Studio team to support dynamic shape model?

Fixed shape model to proceed sounds good to me, too

Shall we start building the demo directly with TF Lite interface (e.g. using TF Lite Interpreter and Support Library for pre-processing)?

It will take more time but I will give it a try.

sayakpaul commented 4 years ago

Thanks, @margaretmz. I will proceed with converting the models with fixed shaped (520, 520) inputs.

I will try to track the issues under the project repo to that's it's easier to find. Let's try to use the issues in this repo for assigning project resources and then track details under each project repos.

I am in agreement.

I am going to close this issue, and let's shift all of our communication here.

Cc: @khanhlvg

sayakpaul commented 4 years ago

@khanhlvg and @margaretmz models published on TF Hub: https://tfhub.dev/sayakpaul/lite-model/cartoongan/dr/1

ml-gde / e2e-tflite-tutorials

Convert to TF Lite: GAN model to cartoonize photos #9