linkedin / spark-tfrecord

Read and write Tensorflow TFRecord data from Apache Spark.
BSD 2-Clause "Simplified" License
291 stars 57 forks

Protocol message tag had invalid wire type. #25

Closed: ak2911 closed 2 years ago

ak2911 commented 3 years ago

I get an error while reading a TFRecord file (link to tfrecord file).

Kindly suggest whether it needs any manual schema creation or settings for each TFRecord file.


df = spark.read.format("tfrecord").option("recordType", "Example").load('tfrecordFile')

ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2) com.linkedin.spark.shaded.com.google.protobuf.InvalidProtocolBufferException$InvalidWireTypeException: Protocol message tag had invalid wire type.

junshi15 commented 3 years ago

The error comes from Google protobuf. Were you able to load the file with other tools, such as the native TensorFlow Dataset API?

ak2911 commented 3 years ago

Thanks JunShi for your reply. Yes, I am able to load and access the file using TensorFlow's own API. I get the error only when using the spark-tfrecord API.

Is there a probable reason for this error, or do I need to specify any parameter before loading a new TFRecord file?

junshi15 commented 3 years ago

How was your file generated?

I googled the error; most likely you have a corrupted file. For example: https://stackoverflow.com/questions/6138721/protobuf-errorprotocol-message-tag-had-invalid-wire-type

I am puzzled that the TensorFlow API can handle it.
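
The corrupted-file hypothesis can be checked without Spark or TensorFlow by walking the TFRecord framing directly. The following is a minimal sketch, assuming an uncompressed file in the standard TFRecord layout (8-byte little-endian length, 4-byte masked CRC32C of the length, payload, 4-byte masked CRC32C of the payload). The names `crc32c`, `masked_crc`, and `count_valid_records` are illustrative and are not part of spark-tfrecord or TensorFlow.

```python
import struct

# Lookup table for CRC32C (Castagnoli); 0x82F63B78 is the reflected polynomial.
_CRC_TABLE = []
for _i in range(256):
    _c = _i
    for _ in range(8):
        _c = (_c >> 1) ^ 0x82F63B78 if _c & 1 else _c >> 1
    _CRC_TABLE.append(_c)


def crc32c(data: bytes) -> int:
    """Plain table-driven CRC32C of data."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """CRC32C with the rotate-and-add masking used in TFRecord framing."""
    crc = crc32c(data)
    rotated = ((crc >> 15) | (crc << 17)) & 0xFFFFFFFF
    return (rotated + 0xA282EAD8) & 0xFFFFFFFF


def _read_exact(f, n: int) -> bytes:
    """Read exactly n bytes or raise ValueError on a truncated file."""
    data = f.read(n)
    if len(data) != n:
        raise ValueError("truncated record")
    return data


def count_valid_records(path: str) -> int:
    """Return the number of records; raise ValueError at the first bad frame."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return count  # clean end of file
            if len(header) != 8:
                raise ValueError("truncated length header")
            (length,) = struct.unpack("<Q", header)
            (length_crc,) = struct.unpack("<I", _read_exact(f, 4))
            # Check the length first so a corrupt header cannot trigger a huge read.
            if length_crc != masked_crc(header):
                raise ValueError(f"bad length CRC in record {count}")
            payload = _read_exact(f, length)
            (payload_crc,) = struct.unpack("<I", _read_exact(f, 4))
            if payload_crc != masked_crc(payload):
                raise ValueError(f"bad payload CRC in record {count}")
            count += 1
```

If `count_valid_records("tfrecordFile")` returns without raising, the framing is intact and the problem is more likely in how the payload bytes are being interpreted. The pure-Python CRC loop is slow, so on a file of about 1 GB this check may take a long time.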

ak2911 commented 3 years ago

Thanks JunShi, but it works properly with the tf.data API. You can check for yourself: link to tfrecord file.

I have also checked the Stack Overflow link above. The error seems to be a generic one. I would appreciate any suggestion from the spark-tfrecord dev team.

junshi15 commented 3 years ago

Can you provide a smaller file, say less than 1 MB? The file you provided is about 1 GB.
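
One way to produce such a sample is to copy only the first few records of the large file. The sketch below assumes an uncompressed file in the standard TFRecord framing (8-byte little-endian length, 4-byte length CRC, payload, 4-byte payload CRC). The function name `copy_first_records` and the file names in the usage note are hypothetical. The CRC fields are copied verbatim and are not verified.

```python
import struct


def copy_first_records(src_path: str, dst_path: str, max_bytes: int = 1_000_000) -> int:
    """Copy whole records from src to dst while the output stays within max_bytes.

    Returns the number of records copied.
    """
    copied = 0
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            header = src.read(12)  # 8-byte length + 4-byte CRC of the length
            if len(header) < 12:
                break  # end of file, or a truncated header
            (length,) = struct.unpack("<Q", header[:8])
            frame_size = 12 + length + 4
            # Stop before reading if this record would push the output over the limit.
            if written + frame_size > max_bytes:
                break
            body = src.read(length + 4)  # payload + 4-byte CRC of the payload
            if len(body) < length + 4:
                break  # truncated record; do not write a partial frame
            dst.write(header + body)
            written += frame_size
            copied += 1
    return copied
```

For example, `copy_first_records("tfrecordFile", "sample.tfrecord")` would write at most about 1 MB of complete records to `sample.tfrecord`. Because each frame is copied byte for byte, a record that fails in the large file should fail the same way in the sample, provided it falls within the first records copied.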