If a `BytesList` in TFRecords always has length 0 or 1, the feature is inferred to have `StringType` instead of `ArrayType`. Is there a reason for this behavior? Because of it, you can write a DataFrame as TFRecords, but you can't read those TFRecords back into a DataFrame. A zero-length `BytesList` is valid in TensorFlow.
Below is the implementation of `parseBytesList` from https://github.com/tensorflow/ecosystem/blob/master/spark/spark-tensorflow-connector/src/main/scala/org/tensorflow/spark/datasources/tfrecords/TensorFlowInferSchema.scala#L144: