tensorflow / datasets

TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
https://www.tensorflow.org/datasets
Apache License 2.0

How to convert my tf.data.dataset into image and label arrays #2499

Open sameerp815 opened 4 years ago

sameerp815 commented 4 years ago

I created a tf.data.Dataset by following the instructions on the keras.io documentation site.

dataset = tf.keras.preprocessing.image_dataset_from_directory(
    directory,
    labels="inferred",
    label_mode="int",
    class_names=None,
    color_mode="rgb",
    batch_size=32,
    image_size=(32, 32),
    shuffle=True,
)

My file directory is organized into classes with jpg files inside.

I don't know how to convert the dataset into x_train and y_train arrays to use in my model, since model.fit doesn't seem to take in tf datasets.

I would appreciate some help in understanding how to take the dataset and create x_train and y_train.

Environment information: Windows 10

Conchylicultor commented 4 years ago

From the doc: https://www.tensorflow.org/api_docs/python/tf/keras/Model?version=nightly#fit

x Input data. It could be:

So you can pass a tf.data.Dataset directly to model.fit (at least in recent versions of TF).
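As a minimal sketch of that point, the snippet below passes a batched tf.data.Dataset straight to model.fit without building x_train and y_train first. The random arrays, the four classes, and the tiny model are all assumptions made up for illustration; they stand in for the image folders from the question and are not the original poster's setup.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the image folders: 64 RGB images of 32x32, 4 classes.
images = np.random.rand(64, 32, 32, 3).astype("float32")
labels = np.random.randint(0, 4, size=64).astype("int32")

# A batched dataset of (images, labels) tuples, the same structure that
# image_dataset_from_directory yields with label_mode="int".
dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(32)

# Deliberately tiny model, only here to show the fit call.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# The dataset already yields (inputs, targets), so no separate y argument is passed.
history = model.fit(dataset, epochs=1, verbose=0)
```

Because the dataset is already batched, batch_size is not passed to fit either.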

PrattJena commented 4 years ago

Can you provide code snippets along with the output, to clarify what problem you are facing?

ziegenbalg commented 4 years ago

I would like to know this as well. It seems like the keras tuners don't like the tf.data.Datasets yet. They're still expecting (x_train, y_train), (x_test, y_test). Is my thinking correct there? Essentially I'm loading my data using tf.keras.preprocessing.image_dataset_from_directory and would like to feed this into the tuner.

Thank you!

leedrake5 commented 3 years ago

This is where I am too. I am baffled by tf.data.Datasets. Clearly it is meant to be a data pipeline, but it isn't clear how to use it yet. Finding a basic tutorial that says "here's ImageDataGenerator, now here's how you can do the same thing with tf.data.Datasets" is very hard.

Conchylicultor commented 3 years ago

For how to use tfds in practice, you can have a look at our end-to-end Keras example: https://www.tensorflow.org/datasets/keras_example#step_2_create_and_train_the_model

Please let us know if something isn't clear.

fraluegut commented 1 year ago

Hello, I think that this is what you are looking for:

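# `break` stops each loop after the first batch, so every array
# holds at most batch_size samples, not the whole dataset.
for image_batch, labels_batch in train_ds:
    X_train = image_batch.numpy()
    y_train = labels_batch.numpy()
    break

for image_batch, labels_batch in val_ds:
    X_test = image_batch.numpy()
    y_test = labels_batch.numpy()
    break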
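Note that a loop that breaks after the first iteration only captures one batch. The sketch below, on a made-up dataset of 100 random images (an assumption for illustration, not the poster's data), compares the first batch with arrays built from every batch via `as_numpy_iterator()`.

```python
import numpy as np
import tensorflow as tf

# Hypothetical data: 100 RGB images of 32x32 with integer labels.
images = np.random.rand(100, 32, 32, 3).astype("float32")
labels = np.random.randint(0, 10, size=100).astype("int32")
train_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(32)

# First batch only: what a loop with an immediate `break` captures.
first_images, first_labels = next(iter(train_ds))
first_shape = tuple(first_images.shape)

# All batches: materialize the dataset once as NumPy tuples, then concatenate.
batches = list(train_ds.as_numpy_iterator())
X_train = np.concatenate([x for x, _ in batches], axis=0)
y_train = np.concatenate([y for _, y in batches], axis=0)

print(first_shape)    # shape of the first batch only
print(X_train.shape)  # shape covering the whole dataset
```

Loading everything into memory like this is only reasonable when the dataset is small enough to fit.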

xizhoushu commented 1 year ago

If your dataset is generated by tf.keras.utils.timeseries_dataset_from_array(), you can use the following method to extract x_train and y_train:

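import numpy as np

# Iterate once so every x stays paired with its y, even if the dataset reshuffles on each pass
x_batches, y_batches = [], []
for x, y in train_data:
    x_batches.append(x.numpy())
    y_batches.append(y.numpy())
# Use np.concatenate to merge the per-batch arrays into single NumPy arrays
x_train = np.concatenate(x_batches, axis=0)
y_train = np.concatenate(y_batches, axis=0)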
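One caveat when the dataset shuffles (for example with shuffle=True): each pass over it may return samples in a different order, so inputs and labels collected in separate passes can end up misaligned. The sketch below collects both in a single pass. The data is synthetic and chosen so that y is always 2 * x, an assumption that makes the pairing easy to verify.

```python
import numpy as np
import tensorflow as tf

# Hypothetical data where each label is exactly twice its input.
x = np.arange(1000, dtype="float32").reshape(-1, 1)
y = 2.0 * x[:, 0]

train_data = (
    tf.data.Dataset.from_tensor_slices((x, y))
    .shuffle(1000)  # reshuffles on every iteration by default
    .batch(32)
)

# Single pass: each x batch is stored together with its matching y batch.
x_batches, y_batches = [], []
for x_batch, y_batch in train_data:
    x_batches.append(x_batch.numpy())
    y_batches.append(y_batch.numpy())

x_train = np.concatenate(x_batches, axis=0)
y_train = np.concatenate(y_batches, axis=0)

# The pairing survives the shuffle because both arrays come from the same pass.
print(np.array_equal(y_train, 2.0 * x_train[:, 0]))
```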