EdwardCuiPeacock opened this issue 3 years ago · Status: Open
@EdwardCuiPeacock I don't quite understand how you are getting that warning message; it appears as though `tf.init_scope` is being used here: https://github.com/tensorflow/transform/blob/b06e87b74246d968bf63d41425a75936d55bb2d2/tensorflow_transform/tf_utils.py#L1578
As Chris commented above, we do enter the `tf.init_scope` in tf_utils. Are you passing a `lookup_fn` to `tft.apply_vocabulary`? If that is the case, you would need to lift the table creation inside that `lookup_fn`, as TFT does not have access to the table creation code to do this automatically.
Could you give me an example of what your calls to `tft.vocabulary` and `tft.apply_vocabulary` look like so I can take a look and see if we missed something?
I've encountered the same warning while using TFX. In my case the `init_scope` is not used because the check https://github.com/tensorflow/transform/blob/9b348c8862bfd8bfd462db596111103e58d15006/tensorflow_transform/tf_utils.py#L1654-L1655 fails, as `asset_filepath` is `<class 'tensorflow.python.framework.ops.Tensor'>`.
My usage is:

```python
transformed = tft.compute_and_apply_vocabulary(
    input_tensor,
    frequency_threshold=100,
    num_oov_buckets=1,
    vocab_filename='tags',
)
```
It isn't a big issue for me as my vocabulary is small, so I haven't spent much time looking into it and don't have a proper MRE, but maybe that much is helpful.
I've just checked using this notebook: https://colab.research.google.com/github/tensorflow/tfx/blob/master/docs/tutorials/tfx/components_keras.ipynb#scrollTo=jHfhth_GiZI9, and the warning appears with it also. Possibly related to being wrapped with `tf.function` (which occurs in TFX Transform), so the `vocab_filename` is not an `ops.EagerTensor`? It gets wrapped here-ish https://github.com/tensorflow/transform/blob/520ebb492c2f687ff30cce22261938037384b26d/tensorflow_transform/analyzers.py#L1973 into a tensor, so it is no longer a string either.
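A standalone sketch (not TFT code) of the type difference being described: the same `tf.constant` is an `EagerTensor` when created eagerly, but a symbolic graph `Tensor` while a `tf.function` is being traced, so an `isinstance(..., ops.EagerTensor)` check fails there.

```python
import tensorflow as tf
from tensorflow.python.framework import ops

# Eagerly, a constant is an EagerTensor, so the check passes.
assert isinstance(tf.constant("tags"), ops.EagerTensor)

seen = []

@tf.function
def trace_check():
    # During tracing, the same constant is a symbolic graph Tensor,
    # so the EagerTensor check fails.
    t = tf.constant("tags")
    seen.append(isinstance(t, ops.EagerTensor))
    return t

trace_check()
print(seen)  # [False]
```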
Similar to @gfkeith, I also get this warning when using `tft.compute_and_apply_vocabulary()` inside a TFX Transform component's `preprocessing_fn`.
We recently encountered scalability issues when trying to apply the vocabularies for multiple (5, to be exact) categorical features. We saw multiple lines of the following warning message:
When using `tft.apply_vocabulary`, the job would get stuck on the transformation steps for hours, consuming thousands of CPU hours if we did not kill it early. Creating a custom lookup table initialization function like the following bypassed the problem: 80M rows of data took only 35 minutes, consuming ~20 hours of CPU time.
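(The commenter's original snippet is not reproduced here. As a rough, hypothetical sketch only: a custom `lookup_fn` passed to `tft.apply_vocabulary` might build the table under `tf.init_scope` itself, along these lines. The function name and exact table configuration are assumptions, not the commenter's code; inside a trace, the deferred filename tensor must be resolvable in the outer context for this lifting to apply.)

```python
import tensorflow as tf

def make_lookup_fn(num_oov_buckets=1):
    """Hypothetical lookup_fn for tft.apply_vocabulary(..., lookup_fn=...).

    Takes the input tensor and the deferred vocab filename tensor and
    returns (looked-up ids, table size), creating the table under
    tf.init_scope so creation is lifted out of the traced graph.
    """
    def lookup_fn(x, deferred_vocab_filename_tensor):
        with tf.init_scope():
            # One token per line; the id is the line number.
            initializer = tf.lookup.TextFileInitializer(
                deferred_vocab_filename_tensor,
                key_dtype=tf.string,
                key_index=tf.lookup.TextFileIndex.WHOLE_LINE,
                value_dtype=tf.int64,
                value_index=tf.lookup.TextFileIndex.LINE_NUMBER)
            table = tf.lookup.StaticVocabularyTable(
                initializer, num_oov_buckets=num_oov_buckets)
        # Out-of-vocabulary tokens hash into the OOV bucket range.
        return table.lookup(x), table.size()
    return lookup_fn
```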
Relevant code that needs to be addressed: https://github.com/tensorflow/transform/blob/520ebb492c2f687ff30cce22261938037384b26d/tensorflow_transform/mappers.py#L1114
This probably needs to be applied to TFT versions from 1.0 onward.