Open perretv opened 2 years ago
@yongtang you seem to be knowledgeable on the subject :)
Note that using tfio.experimental.serialization.decode_json can circumvent the problem:
import json
import sys
import tempfile
import tensorflow as tf
import tensorflow_io as tfio
print(f"tensorflow={tf.__version__}")
print(f"tensorflow-io={tfio.__version__}")
print(sys.version)
# Use a distinct name so the tempfile module is not shadowed.
tmp = tempfile.NamedTemporaryFile(suffix=".json")
data = {"key1": 1, "key2": 2.0, "key3": True, "key4": "text"}
# Write through a context manager so the file is flushed and closed before it is read.
with open(tmp.name, "w") as f:
    json.dump(data, f)
io_tensor = tfio.IOTensor.from_json(tmp.name)
spec_dict = {c: s for c, s in zip(io_tensor.columns, io_tensor.spec)}
json_tensor = tfio.experimental.serialization.decode_json(tf.io.read_file(tmp.name), spec_dict)
for key in data:
    # String tensors come back as bytes, so encode the expected value before comparing.
    expected = data[key].encode() if isinstance(data[key], str) else data[key]
    assert json_tensor[key].numpy() == expected
    print(f"{key} with dtype {json_tensor[key].dtype} parsed successfully")
returns:
tensorflow=2.7.0
tensorflow-io=0.23.1
3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46)
[GCC 9.4.0]
key1 with dtype <dtype: 'int64'> parsed successfully
key2 with dtype <dtype: 'float64'> parsed successfully
key3 with dtype <dtype: 'bool'> parsed successfully
key4 with dtype <dtype: 'string'> parsed successfully
It seems that tfio.IOTensor.from_json is currently unable to decode JSON files that contain string values.
The error can be reproduced with the following code:
The raised error looks like the following:
We can see that int, float and bool dtypes are successfully decoded; however, tfio fails when encountering a str in the JSON file.
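In the meantime, it may help to check up front which keys in a file hold string values, since those are the ones reported above to trigger the failure. Below is a minimal sketch using only the standard library. Note that string_valued_keys is a hypothetical helper, not part of tensorflow-io, and it assumes the file holds a single flat JSON object like the example in this issue.

```python
import json


def string_valued_keys(path):
    """Return the top-level keys of a JSON object whose values are strings.

    Hypothetical helper for illustration only; assumes the file contains
    one flat JSON object (no nesting, no arrays).
    """
    with open(path) as f:
        record = json.load(f)
    # Only str values are flagged; int, float and bool are left alone.
    return [key for key, value in record.items() if isinstance(value, str)]
```

If the returned list is non-empty, the file could be routed through the decode_json workaround instead of reading it with tfio.IOTensor.from_json.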