Closed DLPerf closed 2 years ago
@hanjr92 Hi, could you take a look at this issue?
Thanks! I will modify them.
```python
def read_and_decode(filename):
    # generate a queue with a given file name
    raw_dataset = tf.data.TFRecordDataset([filename]).shuffle(1000).batch(4)
    features = {}
    for serialized_example in raw_dataset:
        features['label'] = tf.io.FixedLenFeature([], tf.int64)
        features['img_raw'] = tf.io.FixedLenFeature([], tf.string)
        features = tf.io.parse_example(serialized_example, features)
        # You can do more image distortion here for training data
        img_batch = tf.io.decode_raw(features['img_raw'], tf.uint8)
        img_batch = tf.reshape(img_batch, [4, 224, 224, 3])
        # img = tf.cast(img, tf.float32) * (1. / 255) - 0.5
        label_batch = tf.cast(features['label'], tf.int32)
        yield img_batch, label_batch
```
@DLPerf
Hello, I found a performance issue in the definition of `read_and_decode` in examples/data_process/tutorial_tfrecord.py: `tf.io.FixedLenFeature` will be called repeatedly during program execution, resulting in reduced efficiency. So I think the dictionary should be created before the loop. The same issue exists at lines 69 & 70, and lines 258 & 259.
Looking forward to your reply. Btw, I'd be glad to open a PR to fix this if you are too busy.
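For reference, here is a sketch of the suggested fix, based on the snippet above: the feature-spec dictionary is built once before the loop, so the `tf.io.FixedLenFeature` objects are not re-created on every batch (this also avoids clobbering the spec when `features` is reassigned to the parsed result). The batch size 4 and the 224x224x3 image shape are taken from the original code, not verified against the repo.

```python
import tensorflow as tf

def read_and_decode(filename):
    # Build the feature spec ONCE, outside the loop, so the
    # FixedLenFeature objects are created a single time.
    feature_spec = {
        'label': tf.io.FixedLenFeature([], tf.int64),
        'img_raw': tf.io.FixedLenFeature([], tf.string),
    }
    raw_dataset = tf.data.TFRecordDataset([filename]).shuffle(1000).batch(4)
    for serialized_example in raw_dataset:
        # Parse into a fresh dict; feature_spec itself is never overwritten.
        parsed = tf.io.parse_example(serialized_example, feature_spec)
        # You can do more image distortion here for training data
        img_batch = tf.io.decode_raw(parsed['img_raw'], tf.uint8)
        img_batch = tf.reshape(img_batch, [4, 224, 224, 3])
        label_batch = tf.cast(parsed['label'], tf.int32)
        yield img_batch, label_batch
```

The same hoisting applies to the other two occurrences mentioned (lines 69 & 70 and 258 & 259): any feature-spec dict that does not depend on the loop variable can be moved above the loop unchanged.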