[RNN] GRU conversion/performance issues on CPU on Windows machines #57977

Open DLumi opened 1 year ago

DLumi commented 1 year ago

1. System information

2. Code

Please note that the issue is only noticeable on Windows machines (tested on 3 different PCs). In Colab and on a Linux machine I saw little to no decline in performance. https://colab.research.google.com/drive/1d6E3VjbN57ojDd1X0sfG2KA3x7wTMf5N?usp=sharing

3. Failure after conversion

The model fails to convert with the default operation set. Conversion succeeds with the extended operation set; however, inference on CPU is then roughly 3x slower.
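
A minimal sketch of the two conversion paths (the real model lives in the linked Colab; a plain bidirectional GRU layer stands in for it here, and "model.tflite" is just a placeholder file name):

import tensorflow as tf

# Stand-in model: a single bidirectional GRU layer (the actual model is in the Colab).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(50, 256)),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(256, return_sequences=True)),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range quantization

# Default operation set (builtins only) -- the conversion that fails:
# converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]

# Extended operation set -- conversion succeeds, but CPU inference on Windows
# comes out roughly 3x slower than the original model:
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)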

4. (optional) RNN conversion support

If converting TF RNN to TFLite fused RNN ops, please prefix [RNN] in the title.

5. (optional) Any other info / logs

The conversion error traceback can be seen in the Colab notebook above. The issue is also present in TF 2.9.1, and it happens for both Intel and AMD CPUs.

tiruk007 commented 1 year ago

@sachinprasadhs I was able to reproduce the issue on Colab using TF v2.10. Please find the gist here for reference. Thank you!

DLumi commented 1 year ago

Mind you, the Colab results are not as bad (I'd say they're at least tolerable) as what you get when you run the same notebook on a Windows machine.

DLumi commented 1 year ago

Hi there,

Just wanted to check in after six months: are there any updates or progress on this one so far?

Thank you!

pjpratik commented 1 year ago

Hi,

Thank you for opening this issue. Since this issue has been open for a long time, the code/debug information for this issue may not be relevant to the current state of the code base.

The TFLite team is constantly improving the framework by fixing bugs and adding new features. We suggest you try the latest TensorFlow version with the latest compatible hardware configuration, which could potentially resolve the issue. If you are still facing the issue, please create a new GitHub issue with your latest findings and all the debugging information that could help us investigate.

Please follow the release notes to stay up to date with the latest developments in the TFLite space.

Thanks.

DLumi commented 1 year ago

Guys, you can't just ignore the problem and close the issue, right? The code for this issue is very much relevant to the current state of the code base, and it takes only a couple of minutes to run. But I'll make your job easier, since I ran it myself: I tested the code from the first message with the latest TF 2.13.0, albeit on Python 3.8.13 this time, still on a Windows machine. I still see about a 2.5x decline in inference performance on CPU. It may have become a bit better overall, but the difference is pretty negligible.
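
A rough sketch of the kind of CPU timing comparison involved (the model paths, the input shape of (1, 50, 256), and the run count are placeholders, not the exact setup from the Colab):

import time
import numpy as np
import tensorflow as tf

# Placeholder paths; substitute the real Keras/SavedModel and TFLite artifacts.
keras_model = tf.keras.models.load_model("saved_model_dir")
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.random.randn(1, 50, 256).astype(np.float32)

def bench(fn, n=100):
    fn()  # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n

def run_tflite():
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])

print("Keras model avg s/run :", bench(lambda: keras_model(x, training=False)))
print("TFLite model avg s/run:", bench(run_tflite))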

In Colab it did improve a little, but the difference between SavedModel and TFLite was small there in the first place.

P.S. I don't really know why there's a cpu-intel label on this, as I've stated that the issue is reproducible on both Intel and AMD CPUs on Windows

pkgoogle commented 11 months ago

Hi @DLumi, to be sure: I see a Colab to reproduce... in the cases where you see the most significant performance issues, are you running essentially the same code (from the Colab) on Windows? If it differs in any way, please share the exact code that is actually run on your system, including the steps to run it. Also, are you doing this on native Windows or WSL?

DLumi commented 11 months ago

Yes, it was the same notebook, which I then uploaded to Colab. I did my testing on native Windows, CPU only. The issue may also occur on GPU (for the older packages that still have native Windows GPU support), but I'm unable to test that.

DLumi commented 9 months ago

Just stumbled upon this one again. I've been messing around with various quantization schemes, and since tf.lite.Optimize.DEFAULT actually means the weights are quantized to int8 (but the activations are not), I'm pretty sure this is the root cause of it all. If you remove converter.optimizations completely (to get an FP32 model), or keep DEFAULT but specify converter.target_spec.supported_types = [tf.float16] (to get an FP16 model), the model runs just fine. But with a dynamic int8 model or a static int8 model, the inference time just skyrockets.

So, TL;DR: there's something wrong with inference of int8-quantized GRU layers, so you might want to take a look at those specifically.
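
For illustration, the three converter setups described above look roughly like this (a sketch; "saved_model_dir" is a placeholder path, not from the Colab):

import tensorflow as tf

# FP32: no optimizations at all -- runs fine.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
fp32_model = converter.convert()

# FP16: Optimize.DEFAULT plus a float16 target type -- also runs fine.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
fp16_model = converter.convert()

# Dynamic-range int8: Optimize.DEFAULT alone (int8 weights, float activations)
# -- this is the variant where GRU inference time skyrockets on Windows CPUs.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
int8_model = converter.convert()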

pkgoogle commented 3 months ago

Hi @DLumi, can you try our new library, AI-Edge-Torch? I was able to convert it this way; you can find more information here: googleblog.

import ai_edge_torch
import torch
from torch import nn

# Bidirectional GRU matching the layer discussed in this issue.
bidir_gru = nn.GRU(input_size=256, hidden_size=256, batch_first=True, bidirectional=True)
sample_inputs = (torch.randn(1, 50, 256),)  # (batch, time, features)

# Convert the eval-mode module and export it as a TFLite flatbuffer.
edge_model = ai_edge_torch.convert(bidir_gru.eval(), sample_inputs)
edge_model.export("bidir_gru.tflite")
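
For a quick sanity check, the converted model can also be invoked directly on the sample inputs (the exact output structure for a multi-output module like nn.GRU is an assumption here):

# Call the converted TFLite model with the same inputs used for conversion.
edge_output = edge_model(*sample_inputs)
print(edge_output)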

You'll have to use WSL if you go this route. Alternatively, since most use cases of bidirectional GRUs are superseded by Transformers these days, you may wish to try that instead; more info here: Introducing AI Edge Torch Generative API

DLumi commented 3 months ago

I guess that solves… something? Not my issue, though. I never mentioned Torch anywhere; I'm using the TF-to-TFLite converter. And converting a model from TF to Torch just to convert it to TFLite seems kinda mad to me.

Additionally, thanks for suggesting Transformers, but I'm doing really well without them, and I don't really see a need to switch just because TFLite can't properly support these layers.