Closed damienpontifex closed 5 years ago
@damienpontifex there may be some overlap, but the dataset we work on here is about the subclass of tf.data.Dataset and its C++ implementation for working with the MNIST data format, whereas a dataset in tensorflow-datasets refers to a data package that can be downloaded and consumed directly.
For example, in our case MNIST does not just refer to the gzip files that can be downloaded from http://yann.lecun.com/exdb/mnist/; MNIST is itself a legitimate file format.
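To make the "file format" point concrete, here is a minimal sketch of reading the header of an MNIST (IDX) file in plain Python. It follows the layout described on the MNIST page: two zero bytes, one type-code byte, one dimension-count byte, then one big-endian 32-bit size per dimension. The helper name `parse_idx_header` is hypothetical and only for illustration; it is not part of this repo or of TensorFlow.

```python
import struct

# Type codes from the IDX description on the MNIST page
# (third byte of the magic number).
IDX_DTYPES = {
    0x08: "uint8",
    0x09: "int8",
    0x0B: "int16",
    0x0C: "int32",
    0x0D: "float32",
    0x0E: "float64",
}


def parse_idx_header(buf):
    """Hypothetical helper: return (dtype_name, dims, data_offset)."""
    # Magic number: two zero bytes, a type code, and the number of dimensions.
    zero, dtype_code, ndim = struct.unpack(">HBB", buf[:4])
    if zero != 0:
        raise ValueError("not an IDX file: first two bytes must be zero")
    if dtype_code not in IDX_DTYPES:
        raise ValueError("unknown IDX type code: 0x%02x" % dtype_code)
    # Each dimension size is a 4-byte big-endian unsigned integer.
    dims = struct.unpack(">%dI" % ndim, buf[4:4 + 4 * ndim])
    return IDX_DTYPES[dtype_code], dims, 4 + 4 * ndim


# An image file holding two 28x28 images starts with magic 2051 (0x00000803).
sample = struct.pack(">IIII", 2051, 2, 28, 28) + bytes(2 * 28 * 28)
dtype_name, dims, offset = parse_idx_header(sample)
```

The pixel or label bytes start at `offset`; the downloadable MNIST files are gzip-compressed, so they would need to be decompressed before parsing like this.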
As mentioned in PR #111, the MNIST format is also used by Fashion-MNIST, Kuzushiji-MNIST, and EMNIST. It is also used by people who generate their own data in MNIST format so that they can reuse a data pipeline they have already tested.
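As a rough illustration of generating your own data in MNIST format, the sketch below encodes unsigned-byte data with an IDX header. The function name `encode_idx_uint8` is hypothetical and only covers the uint8 case used by the MNIST image and label files; it is not an API of this repo.

```python
import struct


def encode_idx_uint8(dims, data):
    """Hypothetical helper: encode uint8 values as IDX (MNIST format) bytes."""
    data = bytes(data)
    # The payload must contain exactly one byte per element.
    expected = 1
    for d in dims:
        expected *= d
    if len(data) != expected:
        raise ValueError("expected %d bytes, got %d" % (expected, len(data)))
    # Magic number: two zero bytes, type code 0x08 (unsigned byte), dimension count.
    header = struct.pack(">HBB", 0, 0x08, len(dims))
    # Dimension sizes as 4-byte big-endian unsigned integers.
    header += struct.pack(">%dI" % len(dims), *dims)
    return header + data


# Three labels and three tiny 2x2 "images", just to show the shapes involved.
labels = encode_idx_uint8((3,), [5, 0, 4])
images = encode_idx_uint8((3, 2, 2), [0] * 12)
```

Writing these bytes to disk (optionally through `gzip.open(path, "wb")`) should give files with the same layout as the original MNIST downloads, so an existing MNIST pipeline could in principle be pointed at them.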
Our readme doesn't provide a way to automatically download the MNIST data (unlike tensorflow-datasets). It is up to the user to have a file in MNIST format; they can then use MNISTDataset, which is a subclass of tf.data.Dataset and can be saved in the graph.
Thanks for the insights @yongtang
How does this repo decide what's in and what's better elsewhere in the TensorFlow ecosystem?
I ask this as I noticed the readme has a guide using
but tensorflow/datasets also has an mnist dataset.
Should the dataset data itself not live here, with the functionality in tensorflow/io focused on being a "collection of file systems and file formats"?