Closed by albertoandreottiATgmail 1 year ago
Are you using Spark-TFRecord to read the protobuf? If so, you should just do

`spark.read.format("tfrecord").option("recordType", "Example").load(path)`

If you are parsing the bytes yourself, then your question is not related to Spark-TFRecord.
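For context, a minimal runnable sketch of that read call, assuming Scala and a placeholder input path:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("tfrecord-read").getOrCreate()

// Placeholder path; point this at the actual TFRecord directory.
val path = "/data/tfrecords"

// Read TFRecord files that contain tf.train.Example records.
val df = spark.read
  .format("tfrecord")
  .option("recordType", "Example")
  .load(path)

df.printSchema()
```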
Yep, I'm using Spark-TFRecord. The thing is, I believe that encoding the binary buffer to a String and then converting it back to a byte array may be corrupting the data because of the conversions between different encodings.
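A small self-contained sketch of why that round trip is lossy: arbitrary binary data is usually not valid UTF-8, and decoding it to a String replaces invalid sequences with U+FFFD, so re-encoding cannot recover the original bytes. The example bytes below are an assumption chosen to illustrate the point (they happen to form a valid protobuf varint field):

```scala
import java.nio.charset.StandardCharsets

// 0x08 0x96 0x01 encodes field 1 = 150 in protobuf; 0x96 is a stray
// continuation byte in UTF-8, so the sequence is not valid UTF-8 text.
val original: Array[Byte] = Array(0x08, 0x96, 0x01).map(_.toByte)

// Decoding replaces the invalid sequence with U+FFFD; encoding the
// String back to bytes therefore yields a different buffer.
val roundTripped = new String(original, StandardCharsets.UTF_8)
  .getBytes(StandardCharsets.UTF_8)

println(original.sameElements(roundTripped)) // false: the buffer was corrupted
```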
In case it helps others: I just forced the column's schema to `ArrayType(BinaryType)`, and then the binary data became usable (a sketch of this appears below).
Thanks!
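A minimal sketch of that fix, assuming Scala; the column name `payload` and the input path are hypothetical placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{ArrayType, BinaryType, StructField, StructType}

val spark = SparkSession.builder().appName("tfrecord-binary").getOrCreate()

// Force the BytesList feature to be read as an array of raw byte arrays
// instead of letting the reader infer a string type for it.
val schema = StructType(Seq(
  StructField("payload", ArrayType(BinaryType))
))

val df = spark.read
  .format("tfrecord")
  .option("recordType", "Example")
  .schema(schema)
  .load("/data/tfrecords") // placeholder path
```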
Hello!

I have the following situation: I'm reading a TF Example in which one of the columns is a BytesList, and I can read it as a Java String. Now I would like to decode the original binary data, which is a protobuf. So I call

`myString.getBytes()`

and pass the result to the `parseFrom()` method of my Java class (as generated by the proto compiler). This is not working; I'm getting

`CodedInputStream encountered a malformed varint.`

My question is: is this the right way to recover the binary buffer? Or is it possible I'm breaking it somewhere along the way?
Thanks!
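For reference, a sketch of the approach that avoids the String round trip entirely, building on the `ArrayType(BinaryType)` fix above. `MyMessage` stands in for the class generated by the proto compiler and `payload` for the BytesList column; both are hypothetical names:

```scala
// df is the DataFrame read with the forced ArrayType(BinaryType) schema.
df.select("payload").collect().foreach { row =>
  val buffers = row.getAs[Seq[Array[Byte]]]("payload")
  buffers.foreach { bytes =>
    // Hand the raw bytes straight to the generated parser; no String
    // conversion ever touches the buffer.
    val msg = MyMessage.parseFrom(bytes)
    println(msg)
  }
}
```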