microsoft / VoTT

Visual Object Tagging Tool: An electron app for building end to end Object Detection Models from Images and Videos.
MIT License

some question about TFRecord-"image/encoded" #952

Closed xddcore closed 4 years ago

xddcore commented 4 years ago

Hi. I used VoTT to export a TFRecord file. After some searching, I learned that each record contains the key "image/encoded", so I read the TFRecord file and extracted the corresponding value with the following code:

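import numpy as np
import tensorflow as tf

# Hypothetical path; point this at the TFRecord file exported by VoTT.
raw_dataset = tf.data.TFRecordDataset("export.tfrecord")

image_feature_description = {
    "image/encoded": tf.io.FixedLenFeature([], tf.string),  # image data
}

def _parse_image_function(example_proto):
    # Parse the input tf.Example proto using the dictionary above.
    return tf.io.parse_single_example(example_proto, image_feature_description)

raw_dataset = raw_dataset.map(_parse_image_function)
for item in raw_dataset:
    # Prints the number of raw bytes stored under "image/encoded".
    print(np.frombuffer(item['image/encoded'].numpy(), dtype=np.uint8).flatten().shape)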

It has the following output: (10187,)

My input images are 320x240 RGB, so I expected a shape of (320, 240, 3). The output (10187,) is obviously not that. I looked at the code in https://github.com/microsoft/VoTT/blob/master/src/providers/export/tensorFlowRecords.ts. Judging by line 61, const imageBuffer = new Uint8Array(arrayBuffer);, the image data seems to be stored directly in "image/encoded". I can't work out what is wrong with my code. Can you explain how to read the value of the "image/encoded" key? Have a nice day!

xddcore commented 4 years ago

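I think I found the answer. The value under "image/encoded" appears to be the encoded JPEG file rather than raw pixels, which is why its length (10187 bytes) does not match the pixel count. It has to be decoded first: img = tf.image.decode_jpeg(item['image/encoded']). After this call, img has shape (240, 320, 3), that is (height, width, channels).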
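For anyone who wants to see the difference between the encoded bytes and the decoded pixels without a VoTT export at hand, here is a minimal sketch. It assumes a synthetic all-black 320x240 image and builds the tf.Example in memory; the variable names (pixels, jpeg_bytes, feature_description) are made up for illustration and are not taken from VoTT.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for an exported image: height 240, width 320, RGB.
pixels = np.zeros((240, 320, 3), dtype=np.uint8)
jpeg_bytes = tf.io.encode_jpeg(pixels).numpy()  # compressed JPEG file bytes

# Store the JPEG bytes under "image/encoded", as in the record being read above.
example = tf.train.Example(features=tf.train.Features(feature={
    "image/encoded": tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[jpeg_bytes])),
}))
serialized = example.SerializeToString()

# Parse the record back with the same feature description used in the question.
feature_description = {"image/encoded": tf.io.FixedLenFeature([], tf.string)}
parsed = tf.io.parse_single_example(serialized, feature_description)

# Viewing the string as uint8 only gives the compressed byte count.
raw = np.frombuffer(parsed["image/encoded"].numpy(), dtype=np.uint8)

# Decoding the JPEG gives the pixel array in (height, width, channels) order.
img = tf.image.decode_jpeg(parsed["image/encoded"], channels=3)

print(raw.shape)  # 1-D, length depends on the JPEG compression
print(img.shape)  # (240, 320, 3)
```

If the exported images might not all be JPEG, tf.io.decode_image detects the format from the bytes and could be used in place of tf.image.decode_jpeg.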