nathanin opened this issue 3 years ago
```python
interleave(
    map_func, cycle_length=None, block_length=None, num_parallel_calls=None,
    deterministic=None
)
```

`IODataset.interleave` might be useful:
```python
import tensorflow as tf

# Preprocess 4 files concurrently, and interleave blocks of 16 records
# from each file.
filenames = ["/var/data/file1.txt", "/var/data/file2.txt",
             "/var/data/file3.txt", "/var/data/file4.txt"]
dataset = tf.data.Dataset.from_tensor_slices(filenames)

def parse_fn(filename):
    return tf.data.Dataset.range(10)

dataset = dataset.interleave(
    lambda x: tf.data.TextLineDataset(x).map(parse_fn, num_parallel_calls=1),
    cycle_length=4, block_length=16)
```
https://www.tensorflow.org/io/api_docs/python/tfio/v0/IODataset#interleave
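A minimal sketch of how this could apply to the HDF5 case, with purely hypothetical shard filenames and a `/features` dataset key; note that `tfio.IODataset.from_hdf5` likely needs an explicit `spec=` when the filename is a symbolic tensor inside `interleave`, and the shape/dtype below are assumptions:

```python
import tensorflow as tf
import tensorflow_io as tfio

# Hypothetical HDF5 shards; "/features" is an assumed dataset key.
filenames = ["/var/data/shard1.h5", "/var/data/shard2.h5",
             "/var/data/shard3.h5", "/var/data/shard4.h5"]
dataset = tf.data.Dataset.from_tensor_slices(filenames)

# Read 4 shards concurrently, pulling blocks of 16 records from each.
dataset = dataset.interleave(
    lambda f: tfio.IODataset.from_hdf5(
        f, dataset="/features",
        spec=tf.TensorSpec(shape=(None,), dtype=tf.float32)),  # assumed spec
    cycle_length=4, block_length=16,
    num_parallel_calls=tf.data.experimental.AUTOTUNE)
```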
Switched to graph mode for a ~4x speedup.
Still need to test out different dataset formats.
1cd9354
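For reference, "graph mode" here presumably means tracing the per-step work with `tf.function` so it runs as a compiled graph rather than eagerly; a minimal sketch under that assumption (`model`, `optimizer`, and the loss choice are placeholders):

```python
import tensorflow as tf

@tf.function  # traced once per input signature, then executed as a graph
def train_step(model, optimizer, x, y):
    with tf.GradientTape() as tape:
        y_pred = model(x, training=True)
        loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(y, y_pred))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```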
Reopening because there's a slow memory leak in graph mode.
Trying to eliminate variables and rule out things like the data pipeline...
This data loader in graph mode seems to work on a smaller dataset, with no repeats and the `epochs` argument of `tf.keras.Model.fit` controlling the length of training.
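As a usage sketch (`model` and `train_ds` are placeholders), a non-repeated dataset defines the epoch boundary by itself:

```python
# Without .repeat(), one pass over train_ds is one epoch,
# so `epochs` alone bounds the total amount of training.
model.fit(train_ds, epochs=10)
```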
Handle HDF5 input efficiently.

Options:
- Override `tf.keras.Model.train_step` to take advantage of the Keras input pipeline (a sketch follows this list)
- `tensorflow.io`
- `tf.function`
- Check the reference implementations for clues
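A minimal sketch of the first option, following the standard Keras custom `train_step` pattern so `Model.fit` still drives the input pipeline (the class name is hypothetical, and the `(x, y)` unpacking assumes the dataset yields feature/label pairs):

```python
import tensorflow as tf

class HDF5Model(tf.keras.Model):  # hypothetical name
    def train_step(self, data):
        x, y = data  # assumes the dataset yields (features, labels) pairs
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}
```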