rstudio / tfdatasets

R interface to TensorFlow Datasets API
https://tensorflow.rstudio.com/tools/tfdatasets/

Create a TensorFlow training set from a set of images with matching text files #75

Open: ThierryO opened this issue 4 years ago

ThierryO commented 4 years ago

My data consists of a set of PNG files, each with a matching text file (same name, different extension). The text file is comma-separated, has a header row, and contains only doubles (with "." as the decimal mark). The number of data rows (13) and columns (50) is fixed.

The code that reads the text file returns a RaggedTensor, but the model needs a dense Tensor of shape (13, 50). How can I get that?

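```r
library(keras)
library(tensorflow)
library(tfdatasets)

# Read the PNG that shares its name with the CSV file and return it as a
# 416 x 156 float32 matrix.
decode_img <- function(file_path) {
  file_path %>%
    tf$strings$regex_replace("\\.csv$", ".png") %>%  # only swap the extension
    tf$io$read_file() %>%
    tf$image$decode_png(channels = 1L) %>%
    tf$image$convert_image_dtype(dtype = tf$float32) %>%
    tf$image$transpose() %>%
    tf$reshape(c(416L, 156L))
}

# Read the comma-separated text file: split it into lines, keep the 13 data
# rows after the header (line 0), split each row on "," and convert to numbers.
# The result is still a RaggedTensor, not a dense (13, 50) tensor.
decode_tsv <- function(file_path) {
  file_path %>%
    tf$io$read_file() %>%
    tf$strings$split(sep = "\n") %>%
    tf$gather(tf$range(1L, 14L)) %>%
    tf$strings$split(sep = ",") %>%
    tf$strings$to_number()
}

# One (image, values) pair per file.
preprocess_path <- function(file_path) {
  list(
    decode_img(file_path),
    decode_tsv(file_path)
  )
}

# One dataset element per CSV file found in the subdirectories of train_dir.
train_stream_ds <- file_list_dataset(
  file_pattern = paste0(normalizePath(train_dir), "/*/*.csv")
) %>%
  dataset_map(
    preprocess_path,
    num_parallel_calls = tf$data$experimental$AUTOTUNE
  )

# Optionally shuffle, then batch and prefetch.
prepare <- function(ds, batch_size, shuffle_buffer_size) {
  if (shuffle_buffer_size > 0) {
    ds <- ds %>% dataset_shuffle(shuffle_buffer_size)
  }
  ds %>%
    dataset_batch(batch_size) %>%
    dataset_prefetch(buffer_size = tf$data$experimental$AUTOTUNE)
}
```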
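A possible way to get a dense tensor, sketched under the assumption that every file has exactly one header line followed by 13 rows of 50 values: convert the strings to numbers while they are still ragged, then call the RaggedTensor's `to_tensor()` method and reshape to `c(13L, 50L)`, which also gives the result the static shape Keras expects. The helper name `decode_csv_dense()` and the `example.csv` file are hypothetical, chosen only for this illustration.

```r
library(tensorflow)

# Hypothetical helper (not part of tfdatasets): read a CSV file with one
# header line and n_rows x n_cols doubles into a dense float32 tensor.
decode_csv_dense <- function(file_path, n_rows = 13L, n_cols = 50L) {
  text <- tf$io$read_file(file_path)
  # remove carriage returns (in case of Windows CRLF line endings) and the
  # trailing newline, so splitting on "\n" gives one element per line
  text <- tf$strings$strip(tf$strings$regex_replace(text, "\r", ""))
  lines <- tf$strings$split(text, sep = "\n")
  # skip the header (line 0) and keep data lines 1..n_rows
  data_lines <- tf$gather(lines, tf$range(1L, n_rows + 1L))
  # splitting every line on "," yields a RaggedTensor of strings
  cells <- tf$strings$split(data_lines, sep = ",")
  values <- tf$strings$to_number(cells, out_type = tf$float32)
  # ragged -> dense; tf$reshape() fixes the static shape and errors if the
  # element count differs from n_rows * n_cols
  tf$reshape(values$to_tensor(), c(n_rows, n_cols))
}

# Usage: write one small fake file to tempdir() and decode it eagerly.
m <- matrix(runif(13 * 50), nrow = 13, ncol = 50)
csv_path <- file.path(tempdir(), "example.csv")
write.csv(m, csv_path, row.names = FALSE)
x <- decode_csv_dense(csv_path)
print(x$shape)
```

In the pipeline above, `decode_csv_dense(file_path)` could stand in for `decode_tsv(file_path)` inside `preprocess_path()`; after `dataset_batch()` the second element of each batch should then have a fully known shape apart from the batch dimension.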
t-kalinowski commented 3 years ago

Hi, can you turn this into a reprex? (Add a few code snippets that create some small fake data in tempdir() matching the structure of your dataset.)
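A sketch of the kind of fake data requested, assuming the layout implied by the `"/*/*.csv"` pattern (one subdirectory level under `train_dir`) and PNGs of 156 x 416 pixels, which is what the transpose followed by a reshape to `c(416L, 156L)` in `decode_img()` implies. `make_fake_data()`, the `class_a` subdirectory and the `sample_XX` file names are made up for illustration; the PNGs are written with TensorFlow's own `tf$io$encode_png()`, so no extra image package is needed.

```r
library(tensorflow)

# Hypothetical generator: n_files pairs of <name>.png + <name>.csv under
# tempdir()/train/class_a, so that "<train_dir>/*/*.csv" matches them.
make_fake_data <- function(n_files = 3L, n_rows = 13L, n_cols = 50L,
                           height = 156L, width = 416L) {
  train_dir <- file.path(tempdir(), "train")
  sub_dir <- file.path(train_dir, "class_a")
  dir.create(sub_dir, recursive = TRUE, showWarnings = FALSE)
  for (i in seq_len(n_files)) {
    stem <- file.path(sub_dir, sprintf("sample_%02d", i))
    # CSV: one header line, then n_rows x n_cols doubles with "." as decimal
    values <- matrix(runif(n_rows * n_cols), nrow = n_rows, ncol = n_cols)
    write.csv(values, paste0(stem, ".csv"), row.names = FALSE)
    # PNG: single-channel uint8 image, height x width pixels
    pixels <- tf$random$uniform(c(height, width, 1L), maxval = 256L,
                                dtype = tf$int32)
    tf$io$write_file(paste0(stem, ".png"),
                     tf$io$encode_png(tf$cast(pixels, tf$uint8)))
  }
  train_dir
}

# Defines the train_dir that the pipeline code expects.
train_dir <- make_fake_data()
```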