tensorflow / text

Making text a first-class citizen in TensorFlow.
https://www.tensorflow.org/beta/tutorials/tensorflow_text/intro
Apache License 2.0

Using tensorflow_text with tflite. #287


r-wheeler commented 4 years ago

tensorflow_text is great -- previously we were developing custom C++ ops and compiling them into TensorFlow Serving; now we don't need to.

There are a number of text models I would be interested in using with TF Lite, such as the ALBERT/BERT variants on TensorFlow Hub. Are there any plans to support TF Lite-compatible ops (e.g. BERT tokenization) that can be run with TF Lite?

I believe they need specific kernel implementations, as mentioned here: https://www.tensorflow.org/lite/guide/ops_custom
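On the Python side, that guide's conversion step mostly comes down to flagging the unknown ops as custom; a minimal sketch (the saved-model path is a placeholder):

```python
import tensorflow as tf

# Conversion succeeds with the unknown ops marked as custom, but matching
# kernel implementations still have to be registered with the interpreter
# at runtime, per the ops_custom guide.
converter = tf.lite.TFLiteConverter.from_saved_model("/path/to/saved_model")
converter.allow_custom_ops = True
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```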

broken commented 4 years ago

Thanks for the interest. We are working on providing something for TF Lite, even if it's not as fully featured.

hanneshapke commented 1 year ago

Hi @broken, do you know the status of TF Text support for TF Lite? Converting a model that uses TF Text works, but invoking it fails (TF 2.9) with:

RuntimeError: Table not initialized.Node number 683 (TfLiteFlexDelegate) failed to invoke.

It seems the lookup table for the tokenizer can't be initialized. Do you have any advice on how to initialize the table?

Thank you for your reply!

hanneshapke commented 1 year ago

Just found this amazing resource: https://www.tensorflow.org/text/guide/text_tf_lite

It solved the RuntimeError. Only certain tokenizers can be converted to TF Lite; the link above contains the list of convertible TF Text classes.
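For anyone landing here later, the recipe from that guide boils down to roughly the following; the model path and the signature/input names are placeholders for whatever your model exports:

```python
import numpy as np
import tensorflow as tf
import tensorflow_text as tf_text
from tensorflow.lite.python import interpreter

# Convert, keeping the TF Text ops as custom ops.
converter = tf.lite.TFLiteConverter.from_saved_model("/path/to/saved_model")
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
converter.allow_custom_ops = True
tflite_model = converter.convert()

# Run with the TF Text kernels registered, so the tokenizer's table is
# initialized and the custom ops resolve at invoke time.
interp = interpreter.InterpreterWithCustomOps(
    model_content=tflite_model,
    custom_op_registerers=tf_text.tflite_registrar.SELECT_TFTEXT_OPS)
tokenize = interp.get_signature_runner("serving_default")
print(tokenize(input=np.array(["hello world"])))
```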

broken commented 1 year ago

Hi, I'm glad you found it! That page is the best resource. We are trying to make all new ops convertible to TF Lite, and we have converted some of our more popular existing ones. Many of these needed a new implementation to meet the performance requirements of mobile, so they have a slightly different API.

FastBertTokenizer uses FastWordpieceTokenizer and FastBertNormalizer and is convertible. There is no output difference between it and the original BertTokenizer (and WordpieceTokenizer and normalize_utf8), so I would recommend using it even for non-on-device models moving forward. FastBertNormalizer only performs NFD normalization, but that is already the default for BertTokenizer.
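A minimal usage sketch (toy vocab and default options assumed; a real setup would load a full BERT vocab file):

```python
import tensorflow as tf
import tensorflow_text as tf_text

# Toy vocab for illustration only.
vocab = ["[UNK]", "greet", "##ing", "hello", "world"]
tokenizer = tf_text.FastBertTokenizer(vocab, token_out_type=tf.int64)

# Tokenizes raw strings straight to wordpiece ids
# (normalization and splitting included).
ids = tokenizer.tokenize(["hello world", "greeting"])
print(ids)  # RaggedTensor of vocab ids
```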

We also now have FastSentencepieceTokenizer for mobile users of the SentencepieceTokenizer. The original SP is a TF kernel wrapper around the main sentencepiece processor on GitHub, which is too large for mobile. FastSentencepieceTokenizer is new code, so it does not have the same options and is less of a direct replacement, but it is much slimmer and faster for on-device use cases.
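A rough sketch of swapping it in (the model path is a placeholder; the tokenizer is constructed from the serialized SentencePiece model proto):

```python
import tensorflow_text as tf_text

# Load a pre-trained SentencePiece model proto (hypothetical path).
with open("/path/to/sp.model", "rb") as f:
    sp_model = f.read()

tokenizer = tf_text.FastSentencepieceTokenizer(sp_model)
ids = tokenizer.tokenize(["hello world"])
```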

The WhitespaceTokenizer was rewritten with caching techniques similar to those in FastWordpieceTokenizer and is 2-3x faster than the old one. For this one, the underlying kernel was replaced in place, so no code changes are needed from users.
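So existing code keeps working unchanged, e.g.:

```python
import tensorflow_text as tf_text

# Same API as before; the faster kernel is used transparently.
tokens = tf_text.WhitespaceTokenizer().tokenize(["hello world"])
print(tokens)  # <tf.RaggedTensor [[b'hello', b'world']]>
```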

hanneshapke commented 1 year ago

@broken Thank you for the detailed reply and explanations. I tried to convert a model with the FastBertTokenizer to uint8, but the conversion failed because the FastBertTokenizer op isn't supported with uint8. Is that because the token IDs exceed the uint8 range? Can a model using the FastBertTokenizer be converted to uint8 at all, given the size of the ID integers?

broken commented 1 year ago

Right; since the output of FastBertTokenizer is vocab IDs, a uint8 is too small to hold them: any useful vocab has far more than 256 entries.

As a side note, TF Lite does not support kernels with polymorphic types the way TF does, so most of our kernels don't support additional types. However, we recently expanded our TF Lite shim with a wrapper to handle polymorphic types, so if there are other kernels you run across while converting that you believe should be convertible but aren't, please open an issue like this one and we can look at supporting them.

r-wheeler commented 1 year ago

@broken

Thanks! We have now used these in TF Lite with success. A bit tangential, but are you aware of any success quantizing models that make use of these tokenizers? Given that the input to the graph is strings, is it suggested to block these layers from the quantization process?

Does TF Lite's quantization even allow models with UTF-8 string input layers?
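For reference, this is roughly the conversion we have been attempting (a sketch, not a recommendation; the path and op sets are our guesses):

```python
import tensorflow as tf

# Post-training dynamic-range quantization, hoping the converter quantizes
# only the dense weights and leaves the string-typed tokenizer ops alone.
converter = tf.lite.TFLiteConverter.from_saved_model("/path/to/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,  # keep unsupported ops as flex ops
]
converter.allow_custom_ops = True
tflite_model = converter.convert()
```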

broken commented 1 year ago

Those are good questions, and likely better asked of the TF Lite team. I haven't directly created any production TF Lite models, nor asked other teams about the TF Lite models they build with these tokenizers, so my experience here is limited at the moment. Our team has prioritized a number of TF Lite models recently, so after the holidays I should be better positioned to help.