Open Shiro-LK opened 4 years ago
Am I the only one to get this error?
Is it just the BertTokenizer? I'll pass this on to somebody more familiar with Keras.
@broken I have fixed the bug in https://github.com/tensorflow/text/pull/460. Could you review it?
Thanks! I missed this over the holidays. We'll take a look.
I am also running into this issue and am using a similar work-around. In particular, I found that the BertTokenizer needs to be wrapped in a Lambda layer:
```python
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.backend as K
import tensorflow_text as text


class TspBertTokenizer(keras.layers.Layer):
    def __init__(self, vocab_file, cls_token_id=None, sep_token_id=None, **kwargs):
        super(TspBertTokenizer, self).__init__(**kwargs)
        self.vocab_file = vocab_file
        bert_tokenizer = text.BertTokenizer(
            self.vocab_file, token_out_type=tf.int32, lower_case=True)
        # Wrapping tokenize() in a Lambda layer is the work-around that
        # lets the model export.
        self.tokenize = keras.layers.Lambda(
            lambda text_input: bert_tokenizer.tokenize(text_input),
            name="bert_tokenizer")
        # Resolve the special-token ids from the wordpiece vocabulary
        # unless they were passed in explicitly.
        basic_tokenizer, wordpiece_tokenizer = bert_tokenizer.submodules
        self.cls_token_id = (cls_token_id if cls_token_id is not None else
                             K.get_value(wordpiece_tokenizer.tokenize("[CLS]")[0]).item())
        self.sep_token_id = (sep_token_id if sep_token_id is not None else
                             K.get_value(wordpiece_tokenizer.tokenize("[SEP]")[0]).item())

    def call(self, nlp_input):
        word_tokens = self.tokenize(nlp_input)
        # Merge the (word, wordpiece) ragged dimensions into a single
        # token axis per example.
        flattened_tokens = word_tokens.merge_dims(1, -1)
        return flattened_tokens

    def get_config(self):
        # Expose the constructor arguments so the layer can be
        # re-instantiated on load.
        return {
            "vocab_file": self.vocab_file,
            "cls_token_id": self.cls_token_id,
            "sep_token_id": self.sep_token_id,
            **super(TspBertTokenizer, self).get_config(),
        }
```
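For reference, wiring it into a model and exporting looks roughly like this (a minimal sketch, not my actual setup; the vocab path, layer name, and save directory are placeholders):

```python
# Minimal sketch: a string-input Keras model that tokenizes in-graph and
# is exported as a SavedModel. "vocab.txt" and the paths are placeholders.
inputs = keras.Input(shape=(), dtype=tf.string, name="nlp_input")
tokens = TspBertTokenizer("vocab.txt", name="tsp_bert_tokenizer")(inputs)
model = keras.Model(inputs=inputs, outputs=tokens)

model(tf.constant(["an example sentence"]))  # eager sanity check
tf.saved_model.save(model, "/tmp/tokenizer_model")
```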
It can then be added to a Keras model, and I think this functionally works and the export succeeds. However, it is not clear whether performance is ideal; I get the following warnings:
```
[1,0]<stderr>:WARNING:tensorflow:AutoGraph could not transform <bound method TspBertTokenizer.call of <__main__.TspBertTokenizer object at 0x7fc57fa9d3a0>> and will run it as-is.
[1,0]<stderr>:Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
[1,0]<stderr>:Cause: Unable to locate the source code of <bound method TspBertTokenizer.call of <__main__.TspBertTokenizer object at 0x7fc57fa9d3a0>>. Note that functions defined in certain environments, like the interactive Python shell do not expose their source code. If that is the case, you should to define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
[1,0]<stderr>:To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
[1,0]<stderr>:2021-07-08 16:12:23.837909: W tensorflow/core/grappler/optimizers/loop_optimizer.cc:906] Skipping loop optimization for Merge node with control input: pericles/nlp_input/cross_nlp/tsp_bert_tokenizer/bert_tokenizer/RaggedFromUniformRowLength/RowPartitionFromUniformRowLength/assert_greater_equal/Assert/AssertGuard/branch_executed/_107
```
I do not know whether any of these warnings degrade performance or hurt model accuracy. Any feedback on whether these warnings are a real issue, or any better work-arounds, would be much appreciated!
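For the AutoGraph warning specifically, the message itself points at a way to silence it. A sketch (this only skips AutoGraph conversion of `call`; I have not verified any effect beyond silencing the message):

```python
import tensorflow as tf

class TspBertTokenizer(tf.keras.layers.Layer):
    ...  # __init__ and get_config unchanged from above

    # Per the warning text, this decorator makes AutoGraph run call()
    # as-is instead of trying (and failing) to transform it.
    @tf.autograph.experimental.do_not_convert
    def call(self, nlp_input):
        word_tokens = self.tokenize(nlp_input)
        return word_tokens.merge_dims(1, -1)
```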
This is on TF 2.5. I also filed https://github.com/tensorflow/models/issues/10115 as a downstream issue; see the gist there for the export failure without the Lambda.
Thanks for the report! I'll take a look at this and see if we can get a fix pushed soon.
Hi,
I am trying to create a TensorFlow model with the Keras API that includes the tokenizing process inside the model. It seems to work for inference locally, but when I save the model with `tf.saved_model.save`, I get an error. I am wondering whether there is something wrong in my current code, or whether this is currently not possible. My tokenizer uses the BertTokenizer from tensorflow_text (I took the code from a discussion in this forum and modified it).
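In simplified form it looks like this (a sketch rather than the exact code; the class name and vocab handling are placeholders):

```python
import tensorflow as tf
import tensorflow_text as text

# Sketch: a thin Keras layer around tensorflow_text's BertTokenizer.
class TokenizerLayer(tf.keras.layers.Layer):
    def __init__(self, vocab_file, **kwargs):
        super().__init__(**kwargs)
        self.bert_tokenizer = text.BertTokenizer(
            vocab_file, token_out_type=tf.int32, lower_case=True)

    def call(self, inputs):
        # Collapse the per-word wordpiece dimension into one token axis.
        return self.bert_tokenizer.tokenize(inputs).merge_dims(1, -1)
```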
My current model:
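Again in outline (a sketch; the vocab path, names, and export directory are placeholders):

```python
# Sketch: the tokenizer runs inside the model graph; saving the model
# is the step that fails.
text_input = tf.keras.Input(shape=(), dtype=tf.string, name="text")
token_ids = TokenizerLayer("vocab.txt")(text_input)
model = tf.keras.Model(inputs=text_input, outputs=token_ids)

print(model(tf.constant(["an example sentence"])))  # local inference works
tf.saved_model.save(model, "/tmp/my_model")         # this raises the error
```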
PS: I am using TF 2.3.1.