Closed junwan01 closed 3 years ago
@junwan01,
I have tried reproducing the error but I didn't get any error, even though I used `import pyspark` both before and after `import tensorflow_transform as tft`.
It may be because you are defining the class but not creating an instance of it. Can you please share the complete code so that we can reproduce the error at our end? Thanks!
@rmothukuru, thanks for looking into this.
I have updated the code in the original report so that it will reproduce the problem. My initial code was intended to be run as a unit test in PyCharm, not as a standalone script, which is why it didn't trigger the failure. I have stripped the unittest class and kept only the core part of the code that is problematic. You should be able to run this as a script and see the error. Thanks!
Could reproduce the issue in Google Colab with TensorFlow version 2.0.0, tensorflow_transform version 0.15.0, and apache_beam version 2.16.0.
Here is the GitHub Gist of the Google Colab notebook.
tensorflow==2.0.0
apache-beam==2.16.0
tensorflow-transform==0.15.0
Python 3.7
When I run a transform that invokes a combiner, e.g. `tft.bucketize()`, it crashes with the message `unhashable type: 'ConfigProto'` if there is an `import pyspark` statement at the beginning of the module, even though the imported module is never referenced inside the transform code. If I move the import below `import tensorflow_transform as tft`, it works again. When I step through with the debugger, Python seems to get confused about the object `graph_state_options` of class `_QuantilesGraphStateOptions` at a certain point: it thinks it is a `DType` and invokes the wrong `__hash__()`.
Spark is not part of the transform logic; however, it is imported by some upstream Python code that produces the TFRecord dataset consumed by TF Transform, so we have to import it.
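For readers unfamiliar with the error message itself, here is a generic, self-contained illustration (not the actual TFT/Spark code): Python raises `unhashable type: '<ClassName>'` whenever an object whose `__hash__` resolves to `None` is used as a dict key or set member, e.g. a class that defines `__eq__` without `__hash__`. The class name `ConfigLike` below is hypothetical, standing in for `ConfigProto`:

```python
# Generic sketch of the "unhashable type" failure mode: in Python 3,
# defining __eq__ without __hash__ implicitly sets __hash__ = None,
# so the object cannot be used as a dict key.
class ConfigLike:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, ConfigLike) and self.value == other.value


try:
    {ConfigLike(1): "x"}  # hashing the key raises TypeError
except TypeError as exc:
    print(exc)  # -> unhashable type: 'ConfigLike'
```

In the bug reported here the object is hashable, but the wrong class's `__hash__` appears to be looked up, which produces the same surface error.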
The same test code runs fine under TFT 0.14.0, TF 1.14 and apache-beam 2.15.0.
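For intuition only (a generic sketch, not the actual pyspark or tensorflow_transform internals): import-order bugs like this typically arise when a module mutates shared state as a side effect of being imported, so whichever import runs last "wins". The `import_spark_like` / `import_tft_like` functions below are hypothetical stand-ins for module-level import code:

```python
# Generic sketch of an import-order side effect. The two functions simulate
# module-level code that runs at import time and replaces a shared hook;
# the names are hypothetical and do not reflect real pyspark/TFT behavior.
shared_state = {"hash_impl": hash}


def import_spark_like():
    # Simulates an import that swaps the shared hook for a different one.
    shared_state["hash_impl"] = id


def import_tft_like():
    # Simulates an import that installs the hook TFT-style code expects.
    shared_state["hash_impl"] = hash


# With the spark-like import first, the later import restores the hook,
# which mirrors why moving `import pyspark` below the tft import "fixes" it.
import_spark_like()
import_tft_like()
assert shared_state["hash_impl"]("x") == hash("x")
```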
Here is the slimmed down code that can reproduce the error.
Here is the complete error message: