rstudio / keras3

R Interface to Keras
https://keras3.posit.co/

Data augmentation issue keras #607

Closed mgendarme closed 5 years ago

mgendarme commented 5 years ago

Dear all,

Following up on my semantic segmentation project already discussed here (Two classes semantic segmentation with U-Net) and here (weighting the different objects I am trying to identify).

To add a bit of context: I want to segment cells or nuclei from microscopy images, and as labels I provide two masks (corresponding to my two classes): class one = the objects themselves (nuclei or cells); class two = the borders of the objects only.

As I have only a limited amount of data with ground-truth segmentations (48 images and their corresponding masks), I would like to use data augmentation to improve the segmentation accuracy without over-fitting (basically as in the original publication by Ronneberger et al.).

My model looks like this:

model <- unet(shape = c(WIDTH, HEIGHT, CHANNELS), nlevels = 3, nfilters = 16, dropouts = c(0.1, 0.1, 0.2, 0.3))
____________________________________________________________________________________________________________________________________________________
Layer (type)                                    Output Shape                     Param #           Connected to                                     
====================================================================================================================================================
input_1 (InputLayer)                            (None, 512, 512, 1)              0                                                                  
____________________________________________________________________________________________________________________________________________________
conv2d_1 (Conv2D)                               (None, 512, 512, 16)             160               input_1[0][0]                                    
____________________________________________________________________________________________________________________________________________________
activation_1 (Activation)                       (None, 512, 512, 16)             0                 conv2d_1[0][0]                                   
____________________________________________________________________________________________________________________________________________________
spatial_dropout2d_1 (SpatialDropout2D)          (None, 512, 512, 16)             0                 activation_1[0][0]                               
____________________________________________________________________________________________________________________________________________________
conv2d_2 (Conv2D)                               (None, 512, 512, 16)             2320              spatial_dropout2d_1[0][0]                        
____________________________________________________________________________________________________________________________________________________
activation_2 (Activation)                       (None, 512, 512, 16)             0                 conv2d_2[0][0]                                   
____________________________________________________________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)                  (None, 256, 256, 16)             0                 activation_2[0][0]                               
____________________________________________________________________________________________________________________________________________________
conv2d_3 (Conv2D)                               (None, 256, 256, 32)             4640              max_pooling2d_1[0][0]                            
____________________________________________________________________________________________________________________________________________________
activation_3 (Activation)                       (None, 256, 256, 32)             0                 conv2d_3[0][0]                                   
____________________________________________________________________________________________________________________________________________________
spatial_dropout2d_2 (SpatialDropout2D)          (None, 256, 256, 32)             0                 activation_3[0][0]                               
____________________________________________________________________________________________________________________________________________________
conv2d_4 (Conv2D)                               (None, 256, 256, 32)             9248              spatial_dropout2d_2[0][0]                        
____________________________________________________________________________________________________________________________________________________
activation_4 (Activation)                       (None, 256, 256, 32)             0                 conv2d_4[0][0]                                   
____________________________________________________________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)                  (None, 128, 128, 32)             0                 activation_4[0][0]                               
____________________________________________________________________________________________________________________________________________________
conv2d_5 (Conv2D)                               (None, 128, 128, 64)             18496             max_pooling2d_2[0][0]                            
____________________________________________________________________________________________________________________________________________________
activation_5 (Activation)                       (None, 128, 128, 64)             0                 conv2d_5[0][0]                                   
____________________________________________________________________________________________________________________________________________________
spatial_dropout2d_3 (SpatialDropout2D)          (None, 128, 128, 64)             0                 activation_5[0][0]                               
____________________________________________________________________________________________________________________________________________________
conv2d_6 (Conv2D)                               (None, 128, 128, 64)             36928             spatial_dropout2d_3[0][0]                        
____________________________________________________________________________________________________________________________________________________
activation_6 (Activation)                       (None, 128, 128, 64)             0                 conv2d_6[0][0]                                   
____________________________________________________________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)                  (None, 64, 64, 64)               0                 activation_6[0][0]                               
____________________________________________________________________________________________________________________________________________________
conv2d_7 (Conv2D)                               (None, 64, 64, 128)              73856             max_pooling2d_3[0][0]                            
____________________________________________________________________________________________________________________________________________________
activation_7 (Activation)                       (None, 64, 64, 128)              0                 conv2d_7[0][0]                                   
____________________________________________________________________________________________________________________________________________________
spatial_dropout2d_4 (SpatialDropout2D)          (None, 64, 64, 128)              0                 activation_7[0][0]                               
____________________________________________________________________________________________________________________________________________________
conv2d_8 (Conv2D)                               (None, 64, 64, 128)              147584            spatial_dropout2d_4[0][0]                        
____________________________________________________________________________________________________________________________________________________
activation_8 (Activation)                       (None, 64, 64, 128)              0                 conv2d_8[0][0]                                   
____________________________________________________________________________________________________________________________________________________
conv2d_transpose_1 (Conv2DTranspose)            (None, 128, 128, 64)             32832             activation_8[0][0]                               
____________________________________________________________________________________________________________________________________________________
concatenate_1 (Concatenate)                     (None, 128, 128, 128)            0                 conv2d_transpose_1[0][0]                         
                                                                                                   activation_6[0][0]                               
____________________________________________________________________________________________________________________________________________________
conv2d_9 (Conv2D)                               (None, 128, 128, 64)             73792             concatenate_1[0][0]                              
____________________________________________________________________________________________________________________________________________________
activation_9 (Activation)                       (None, 128, 128, 64)             0                 conv2d_9[0][0]                                   
____________________________________________________________________________________________________________________________________________________
spatial_dropout2d_5 (SpatialDropout2D)          (None, 128, 128, 64)             0                 activation_9[0][0]                               
____________________________________________________________________________________________________________________________________________________
conv2d_10 (Conv2D)                              (None, 128, 128, 64)             36928             spatial_dropout2d_5[0][0]                        
____________________________________________________________________________________________________________________________________________________
activation_10 (Activation)                      (None, 128, 128, 64)             0                 conv2d_10[0][0]                                  
____________________________________________________________________________________________________________________________________________________
conv2d_transpose_2 (Conv2DTranspose)            (None, 256, 256, 32)             8224              activation_10[0][0]                              
____________________________________________________________________________________________________________________________________________________
concatenate_2 (Concatenate)                     (None, 256, 256, 64)             0                 conv2d_transpose_2[0][0]                         
                                                                                                   activation_4[0][0]                               
____________________________________________________________________________________________________________________________________________________
conv2d_11 (Conv2D)                              (None, 256, 256, 32)             18464             concatenate_2[0][0]                              
____________________________________________________________________________________________________________________________________________________
activation_11 (Activation)                      (None, 256, 256, 32)             0                 conv2d_11[0][0]                                  
____________________________________________________________________________________________________________________________________________________
spatial_dropout2d_6 (SpatialDropout2D)          (None, 256, 256, 32)             0                 activation_11[0][0]                              
____________________________________________________________________________________________________________________________________________________
conv2d_12 (Conv2D)                              (None, 256, 256, 32)             9248              spatial_dropout2d_6[0][0]                        
____________________________________________________________________________________________________________________________________________________
activation_12 (Activation)                      (None, 256, 256, 32)             0                 conv2d_12[0][0]                                  
____________________________________________________________________________________________________________________________________________________
conv2d_transpose_3 (Conv2DTranspose)            (None, 512, 512, 16)             2064              activation_12[0][0]                              
____________________________________________________________________________________________________________________________________________________
concatenate_3 (Concatenate)                     (None, 512, 512, 32)             0                 conv2d_transpose_3[0][0]                         
                                                                                                   activation_2[0][0]                               
____________________________________________________________________________________________________________________________________________________
conv2d_13 (Conv2D)                              (None, 512, 512, 16)             4624              concatenate_3[0][0]                              
____________________________________________________________________________________________________________________________________________________
activation_13 (Activation)                      (None, 512, 512, 16)             0                 conv2d_13[0][0]                                  
____________________________________________________________________________________________________________________________________________________
spatial_dropout2d_7 (SpatialDropout2D)          (None, 512, 512, 16)             0                 activation_13[0][0]                              
____________________________________________________________________________________________________________________________________________________
conv2d_14 (Conv2D)                              (None, 512, 512, 16)             2320              spatial_dropout2d_7[0][0]                        
____________________________________________________________________________________________________________________________________________________
activation_14 (Activation)                      (None, 512, 512, 16)             0                 conv2d_14[0][0]                                  
____________________________________________________________________________________________________________________________________________________
conv2d_15 (Conv2D)                              (None, 512, 512, 2)              34                activation_14[0][0]                              
====================================================================================================================================================
Total params: 481,762
Trainable params: 481,762
Non-trainable params: 0
____________________________________________________________________________________________________________________________________________________

The compiled model:

model <- model %>%
      compile(loss = dice_coef_loss_bce_2Classes, #dice_coef_loss_bce,  
              optimizer = "adam", 
              metrics = custom_metric("dice_coef_loss_for_bce", dice_coef_loss_for_bce)
      )

With the corresponding metric:

dice_coef <- function(y_true, y_pred, smooth = 1.0) {
  y_true_f <- k_flatten(y_true)
  y_pred_f <- k_flatten(y_pred)
  intersection <- k_sum(y_true_f * y_pred_f)
  k_mean((2 * intersection + smooth) / (k_sum(y_true_f) + k_sum(y_pred_f) + smooth))
}
attr(dice_coef, "py_function_name") <- "dice_coef"

dice_coef_loss_for_bce <- function(y_true, y_pred){
  1 - dice_coef(y_true, y_pred)
}
attr(dice_coef_loss_for_bce, "py_function_name") <- "dice_coef_loss_for_bce"

dice_coef_loss_bce_2Classes <- function(y_true, y_pred, l_b_c = L_B_C, w_class_1 = W_CLASS_1, w_class_2 = W_CLASS_2){
  k_binary_crossentropy(y_true, y_pred) * l_b_c + 
    dice_coef_loss_for_bce(y_true[,,,1], y_pred[,,,1]) * w_class_1 +
    dice_coef_loss_for_bce(y_true[,,,2], y_pred[,,,2]) * w_class_2 
}
attr(dice_coef_loss_bce_2Classes, "py_function_name") <- "dice_coef_loss_bce_2Classes"
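
As a quick sanity check, the combined loss can be evaluated on small dummy tensors (a sketch; the global weight constants are placeholders set to 1 here, and the batch/spatial sizes are arbitrary):

L_B_C <- 1; W_CLASS_1 <- 1; W_CLASS_2 <- 1
y_true <- k_constant(array(as.numeric(sample(0:1, 2 * 8 * 8 * 2, replace = TRUE)),
                           dim = c(2, 8, 8, 2)))
y_pred <- k_constant(array(runif(2 * 8 * 8 * 2), dim = c(2, 8, 8, 2)))
k_eval(k_mean(dice_coef_loss_bce_2Classes(y_true, y_pred)))  # should return a single number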

My input data looks as follows:

> str(X)
 num [1:48, 1:512, 1:512, 1] 0.0839 0.0888 0.3018 0.2063 0.0786 ...
> str(Y)
 num [1:48, 1:512, 1:512, 1:2] 1 0 0 1 0 1 1 1 1 1 ...

I would like to use an 80%/20% split for my training and validation sets, respectively.

Below is my attempt to augment the data:

VALIDATION_SPLIT = 0.2
BATCH_SIZE = 32
EPOCHS = 30
SAMPLE_SIZE = 48
param_aug <- list(rotation_range = 4,
                  horizontal_flip = TRUE,
                  vertical_flip = TRUE,
                  validation_split = VALIDATION_SPLIT)

image_generator <- image_data_generator(param_aug)
image_generator %>% fit_image_data_generator(X, augment = T)
early_stopping <- callback_early_stopping(patience = 8)

history <- model %>%
  fit_generator(generator = flow_images_from_data(X, 
                                                  Y,
                                                  image_generator,
                                                  batch_size = BATCH_SIZE,
                                                  shuffle = T,
                                                  subset = "training"),
                steps_per_epoch = as.integer(round(SAMPLE_SIZE / BATCH_SIZE, 0)),
                epochs = EPOCHS,
                validation_data = flow_images_from_data(X, 
                                                        Y,
                                                        image_generator,
                                                        batch_size = BATCH_SIZE,
                                                        shuffle = T,
                                                        subset = "validation"),
                validation_steps = as.integer(round(SAMPLE_SIZE / BATCH_SIZE, 0)),
                verbose = 1,
                callbacks = list(early_stopping)
                )

However, when I run this I get the following error message (with traceback):

 Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: Training and validation subsets have different number of classes after the split. If your numpy arrays are sorted by the label, you might want to shuffle them. 
15.
stop(structure(list(message = "ValueError: Training and validation subsets have different number of classes after the split. If your numpy arrays are sorted by the label, you might want to shuffle them.", 
    call = py_call_impl(callable, dots$args, dots$keywords), 
    cppstack = structure(list(file = "", line = -1L, stack = c("/home/gendarme/R/x86_64-pc-linux-gnu-library/3.5/reticulate/libs/reticulate.so(Rcpp::exception::exception(char const*, bool)+0x7a) [0x7f62ace911ba]", 
    "/home/gendarme/R/x86_64-pc-linux-gnu-library/3.5/reticulate/libs/reticulate.so(Rcpp::stop(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x27) [0x7f62ace91317]",  ... 
14.
__init__ at image.py#1611
13.
flow at image.py#916
12.
(structure(function (...) 
{
    dots <- py_resolve_dots(list(...))
    result <- py_call_impl(callable, dots$args, dots$keywords) ... 
11.
do.call(generator$flow, args) 
10.
flow_images_from_data(X, Y, image_generator, batch_size = BATCH_SIZE, 
    shuffle = T, subset = "validation") 
9.
fit_generator(., generator = flow_images_from_data(X, Y, image_generator, 
    batch_size = BATCH_SIZE, shuffle = T, subset = "training"), 
    steps_per_epoch = as.integer(round(SAMPLE_SIZE/BATCH_SIZE, 
        0)), epochs = EPOCHS, validation_data = flow_images_from_data(X,  ... 
8.
function_list[[k]](value) 
7.
withVisible(function_list[[k]](value)) 
6.
freduce(value, `_function_list`) 
5.
`_fseq`(`_lhs`) 
4.
eval(quote(`_fseq`(`_lhs`)), env, env) 
3.
eval(quote(`_fseq`(`_lhs`)), env, env) 
2.
withVisible(eval(quote(`_fseq`(`_lhs`)), env, env)) 
1.
model %>% fit_generator(generator = flow_images_from_data(X, 
    Y, image_generator, batch_size = BATCH_SIZE, shuffle = T, 
    subset = "training"), steps_per_epoch = as.integer(round(SAMPLE_SIZE/BATCH_SIZE, 
    0)), epochs = EPOCHS, validation_data = flow_images_from_data(X,  ... 

I do not know how to properly structure the data so that this would work. I have seen other people doing this in Python but did not manage to replicate it in R.

My biggest concern, and also the main problem, is the following: I would like to augment X and apply the same spatial transformations to Y (rotations, flips, crops), but no intensity- or texture-based transformations to Y (though I would like to apply those to X). This is also why, in my example above, I used only geometry-based transformations.
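
Schematically, what I am after is two generator configurations whose spatial arguments match, with intensity options only on the image side (a sketch; brightness_range is just an illustration and may require a newer keras version):

image_datagen <- image_data_generator(
  rotation_range   = 4,
  horizontal_flip  = TRUE,
  vertical_flip    = TRUE,
  brightness_range = c(0.9, 1.1)   # intensity change: image only
)
mask_datagen <- image_data_generator(
  rotation_range  = 4,             # same spatial settings as the image side
  horizontal_flip = TRUE,
  vertical_flip   = TRUE           # but no intensity options for the masks
)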

Does someone know how to solve this problem?

Many thanks in advance.

Cheers,

Mathieu

skeydan commented 5 years ago

Hi,

can you please provide the structure and content (by example) of X and Y?

Without seeing the model definition and the data, what I'd conclude from the error message is that because the amount of data is so small (48, if I'm correct?) and you do a 4:1 validation split, sometimes the validation subset does not contain all (2, I guess?) classes.
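
One way to make the split deterministic would be to shuffle and split the arrays yourself before building any generators (a sketch, assuming X and Y as posted above):

set.seed(42)
idx       <- sample(dim(X)[1])            # random permutation of the 48 samples
n_val     <- ceiling(0.2 * length(idx))   # hold out 20% for validation
val_idx   <- idx[seq_len(n_val)]
train_idx <- idx[-seq_len(n_val)]

X_train <- X[train_idx, , , , drop = FALSE]
Y_train <- Y[train_idx, , , , drop = FALSE]
X_val   <- X[val_idx, , , , drop = FALSE]
Y_val   <- Y[val_idx, , , , drop = FALSE]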

By the way, I don't see how the StackOverflow link addresses this problem. There, the setup is quite different: they are streaming files from directories, not doing a random split on in-memory data.

Perhaps you could switch to the streaming method too? Then it would be pretty clear what's in the validation set.
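
Streaming would look roughly like this (a sketch with hypothetical paths; class_mode = NULL because the masks, not inferred class labels, are the targets):

train_images <- flow_images_from_directory(
  "data/train/images",               # hypothetical directory layout
  generator   = image_data_generator(rotation_range = 4,
                                     horizontal_flip = TRUE,
                                     vertical_flip = TRUE),
  target_size = c(512, 512),
  color_mode  = "grayscale",
  class_mode  = NULL,
  batch_size  = 16L,
  seed        = 1L
)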

mgendarme commented 5 years ago

Hi Sigrid, thanks for the reply. I updated the post with many more details to give more context. If example images are required, I can post those as well.

In the Python example they did stream data from a directory instead of from memory. I may have naively assumed that the only difference between flow_images_from_directory and flow_images_from_data was the location from which the data are fetched (disk vs. memory).

Don't I generate new images on the fly while using flow_images_from_data(X, Y, image_generator, ...)?

Many thanks for your help.

Cheers

skeydan commented 5 years ago

Thanks for the clarifications. Now that you say it's mainly about data augmentation, I think we should not investigate that specific error message first (which originally led me to suggest avoiding it via the flow-from-directory approach used in the SO example), but concentrate on the data augmentation...

I think that, the way you pass in the data, the targets are probably not being transformed.

In the Keras docs, there is an example of data augmentation for segmentation - search for "Example of transforming images and masks together.".

Actually I'm copying it here for further reference:

# we create two instances with the same arguments
data_gen_args = dict(featurewise_center=True,
                     featurewise_std_normalization=True,
                     rotation_range=90,
                     width_shift_range=0.1,
                     height_shift_range=0.1,
                     zoom_range=0.2)
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)

# Provide the same seed and keyword arguments to the fit and flow methods
seed = 1
image_datagen.fit(images, augment=True, seed=seed)
mask_datagen.fit(masks, augment=True, seed=seed)

image_generator = image_datagen.flow_from_directory(
    'data/images',
    class_mode=None,
    seed=seed)

mask_generator = mask_datagen.flow_from_directory(
    'data/masks',
    class_mode=None,
    seed=seed)

# combine generators into one which yields image and masks
train_generator = zip(image_generator, mask_generator)

model.fit_generator(
    train_generator,
    steps_per_epoch=2000,
    epochs=50)

I suggest porting this to R and seeing if that works correctly.

mgendarme commented 5 years ago

Thanks a lot Sigrid! I think I had seen that bit of information at some point. I tried it, and here is how I think it should look:

# we create two instances with the same arguments
# 1. Define your data augmentation
data_gen_args = list(rotation_range = 4,
                     horizontal_flip = TRUE,
                     vertical_flip = TRUE)

image_datagen = image_data_generator(data_gen_args)
mask_datagen = image_data_generator(data_gen_args)

# Provide the same seed and keyword arguments to the fit and flow methods
# 2. Fit the augmentation
seed = 1
image_datagen %>% fit_image_data_generator(X, 
                                           augment = T, 
                                           seed = seed) # seed seems to be the problem
mask_datagen %>% fit_image_data_generator(Y,
                                          augment = T,
                                          seed = seed)  # seed seems to be the problem

# 3. Set up your generators using flow_images_from_data()
image_generator = flow_images_from_data(X, 
                                        image_datagen,
                                        seed = seed)

mask_generator = flow_images_from_data(Y,
                                       mask_datagen,
                                       seed = seed)

# combine generators into one which yields image and masks
train_generator = list(image_generator, mask_generator)

# 4. Train your model with fit_generator()
early_stopping <- callback_early_stopping(patience = 8)
model %>% fit_generator(train_generator,
                        steps_per_epoch = 16,
                        epochs = 32,
                        verbose = 1,
                        callbacks = list(early_stopping))

Here are the issues I am running into after running:

image_datagen %>% fit_image_data_generator(X, 
                                           augment = T, 
                                           seed = seed)
 Error in py_call_impl(callable, dots$args, dots$keywords) : 
  TypeError: Cannot cast array from dtype('float64') to dtype('int64') according to the rule 'safe' 

I did search for that specific issue but could not come up with a solution.

As a side note, I did find this as well and am starting to investigate it more deeply. It could be another solution to the same problem, though if so, it is a lot of work for a "simple" dual image/mask tandem augmentation task.

Cheers

skeydan commented 5 years ago

Can you change seed <- 1 to seed <- 1L? In R, 1 is a double; the underlying Python code expects an integer seed, which the L suffix provides.

This snippet works for me:

seed = 1L

X <- k_random_uniform(shape = c(10, 256, 256, 3)) %>% k_eval()
image_datagen %>% fit_image_data_generator(
  X,
  augment = T,
  seed = seed
)

mgendarme commented 5 years ago

Thanks Sigrid, that solved the first issue. Now it gets stuck when I run:

image_generator = flow_images_from_data(X, 
                                        image_datagen,
                                        seed = seed)

Providing me with:

Error in dim(x) <- length(x) : invalid first argument

Though the dimensions of X did not change from fit_image_data_generator to flow_images_from_data.

The structure of X at this point looks like:

> str(X)
num [1:48, 1:512, 1:512, 1] 0.0839 0.0888 0.3018 0.2063 0.0786 ...

Cheers

skeydan commented 5 years ago

Do you think you could post a complete chunk of code I could run to reproduce this? Ideally something I could run with random data, as in my last comment? By the way, does it fail in the same way if you use random data like I did?

mgendarme commented 5 years ago

I managed to solve the problem, and it was pretty easy: the second positional argument of flow_images_from_data is y, so the generator has to be passed by name. This:

image_generator = flow_images_from_data(X, 
                                        image_datagen,
                                        seed = seed)

should be turned into this:

image_generator = flow_images_from_data(X, 
                                        generator = image_datagen,
                                        seed = seed)

The aim now is to feed the fit_generator function. This is what I have at the moment:

data_gen_args = list(rotation_range = 4,
                     horizontal_flip = TRUE,
                     vertical_flip = TRUE)

image_datagen = image_data_generator(data_gen_args)
mask_datagen = image_data_generator(data_gen_args)

# Provide the same seed and keyword arguments to the fit and flow methods
# 2. Fit the augmentation
seed = 1L
image_datagen %>% fit_image_data_generator(X, 
                                           augment = T, 
                                           seed = seed)
mask_datagen %>% fit_image_data_generator(Y,
                                          augment = T,
                                          seed = seed)

# 3. Set up generators using flow_images_from_data() and combine them into one which yields images and masks

iter <- 0

train_generator = function(X,
                           Y,
                           total_iter){
  Xaug <- X
  Yaug <- Y
  function(){
    iter <<- iter + 1
    if (iter <= total_iter) {
      Xaug = flow_images_from_data(Xaug,
                                generator = image_datagen,
                                seed = seed)

      Yaug = flow_images_from_data(Yaug,
                                generator = mask_datagen,
                                seed = seed)

      result <- list(Xaug,
                     Yaug)

      return(result)
    } else
      NULL
  }
}

# for testing I set total_iter to 2 
train_iterator = py_iterator(train_generator(X = X, Y = Y, total_iter = 2))

# 4. Train your model with fit_generator()
early_stopping <- callback_early_stopping(patience = 8)
model %>% fit_generator(train_iterator,
                        steps_per_epoch = 16,
                        epochs = 32,
                        verbose = 1,
                        callbacks = list(early_stopping))

My problem is at step 3, I guess. I do get that I have to provide an iterator function that yields a list containing (inputs, targets). However, when I feed my model with train_iterator, it seems to compute endlessly (or at least I never get an error message telling me that something went wrong), despite the stopping rule applied in train_generator. I am obviously doing something wrong at step 3 but do not know what exactly. Any idea where the problem might be?

Cheers

skeydan commented 5 years ago

To me this looks like you have a generator (flow_images_from_data) inside a generator (train_generator). How about trying the simpler approach from the Python documentation?
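
For reference, a rough R translation of that simpler pattern could look like the following (an untested sketch; it assumes X, Y, image_datagen and mask_datagen as defined earlier, and the name paired_generator is made up). Because both flows share the same seed, each call yields an image batch and a mask batch with matching spatial transformations:

seed <- 1L

image_flow <- flow_images_from_data(X,
                                    generator = image_datagen,
                                    batch_size = 16L,
                                    seed = seed)
mask_flow <- flow_images_from_data(Y,
                                   generator = mask_datagen,
                                   batch_size = 16L,
                                   seed = seed)

# A plain R function: fit_generator() calls it once per batch, and each call
# returns list(inputs, targets) built from the two synchronized flows.
paired_generator <- function() {
  list(generator_next(image_flow), generator_next(mask_flow))
}

model %>% fit_generator(paired_generator,
                        steps_per_epoch = 3L,
                        epochs = 30L)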

mgendarme commented 5 years ago

Hi Sigrid, actually I am planning two steps ahead, as I want to use a custom-made image generator that would not be limited by the features implemented right now (and that would also apply intensity-based transformations to the image but not to the mask). I have seen that the problem I mentioned earlier was already reported somewhere else but not answered; I get the same "freeze" as the person who wrote the StackOverflow post. I also get this problem when trying to run the code with the original data from the Kaggle challenge. In case it helps, here is my session info:

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=de_DE.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_DE.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2   doMC_1.3.5       iterators_1.0.10 foreach_1.4.4    reticulate_1.10  EBImage_4.24.0   forcats_0.3.0    stringr_1.3.1    dplyr_0.7.7     
[10] purrr_0.2.5      readr_1.1.1      tidyr_0.8.2      tibble_1.4.2     ggplot2_3.1.0    tidyverse_1.2.1  keras_2.2.0.9001

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0          locfit_1.5-9.1      lubridate_1.7.4     lattice_0.20-38     fftwtools_0.9-8     png_0.1-7           assertthat_0.2.0   
 [8] zeallot_0.1.0       digest_0.6.18       R6_2.3.0            tiff_0.1-5          cellranger_1.1.0    plyr_1.8.4          backports_1.1.2    
[15] httr_1.3.1          pillar_1.3.0        tfruns_1.4          rlang_0.3.0.1       lazyeval_0.2.1      readxl_1.1.0        rstudioapi_0.8     
[22] whisker_0.3-2       Matrix_1.2-15       htmlwidgets_1.3     RCurl_1.95-4.11     munsell_0.5.0       broom_0.5.0         compiler_3.5.1     
[29] modelr_0.1.2        pkgconfig_2.0.2     BiocGenerics_0.28.0 base64enc_0.1-3     tensorflow_1.10     htmltools_0.3.6     tidyselect_0.2.5   
[36] codetools_0.2-15    crayon_1.3.4        withr_2.1.2         bitops_1.0-6        grid_3.5.1          nlme_3.1-137        jsonlite_1.6       
[43] gtable_0.2.0        magrittr_1.5        scales_1.0.0        cli_1.0.1           stringi_1.2.4       xml2_1.2.0          generics_0.0.1     
[50] tools_3.5.1         glue_1.3.0          hms_0.4.2           jpeg_0.1-8          abind_1.4-5         yaml_2.2.0          colorspace_1.3-2   
[57] rvest_0.3.2         bindr_0.1.1         haven_1.1.2 

Any idea where this could be coming from?

Cheers

skeydan commented 5 years ago

Hi,

let's do this step by step then.

Regarding the code from SO, try removing the py_iterator, which is not needed in current versions.

E.g. (based on that code)

t <- mikes.custom.generator.function()
t

play.network %>% fit_generator(             
  t,
  steps_per_epoch = 1,
  epochs = 2
)

By the way, here

https://blogs.rstudio.com/tensorflow/posts/2018-11-05-naming-locating-objects/

are several examples of (simple) R generators.

mgendarme commented 5 years ago

Hi Sigrid,

Thanks a lot for your help, removing the py_iterator did solve the problem. What I have now looks like this:

custom_generator <- function(data, # tibble containing encoded X and Y
                             shuffle,
                             batch_size) {
  i <- 1
  function() {
    if (shuffle) {
      indices <- sample(1:nrow(data), size = batch_size)
    } else {
      if (i + batch_size >= nrow(data))
        i <<- 1
      indices <- c(i:min(i + batch_size - 1, nrow(data)))
      i <<- i + length(indices)
    }
    custom_augmentation(data) # geometry-based operations on X and Y, intensity-based operations on X only
    list(X, Y)
  }
}

train_generator <- custom_generator(data = train_input,
                                    shuffle = TRUE,
                                    batch_size = BATCH_SIZE)

val_generator <- custom_generator(data = val_input,
                                  shuffle = FALSE,
                                  batch_size = BATCH_SIZE)

early_stopping <- callback_early_stopping(patience = 8)

history <- model %>% 
  fit_generator(
    generator = train_generator,
    epochs = 100L, 
    steps_per_epoch = as.integer(nrow(train_input) / BATCH_SIZE), 
    validation_data = val_generator,
    validation_steps = as.integer(nrow(val_input) / BATCH_SIZE),
    verbose = 1L,
    callbacks = list(early_stopping)
    )
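
The custom_augmentation() helper is not shown here; purely as an illustration of the idea (the same geometry applied to image and mask, intensity changes to the image only), a hypothetical version for a single image/mask pair could look like:

# Hypothetical sketch: x and y are single (H, W, C) arrays.
augment_pair <- function(x, y) {
  if (runif(1) < 0.5) {                      # horizontal flip: x and y
    x <- x[, dim(x)[2]:1, , drop = FALSE]
    y <- y[, dim(y)[2]:1, , drop = FALSE]
  }
  if (runif(1) < 0.5) {                      # vertical flip: x and y
    x <- x[dim(x)[1]:1, , , drop = FALSE]
    y <- y[dim(y)[1]:1, , , drop = FALSE]
  }
  x <- x * runif(1, min = 0.9, max = 1.1)    # intensity jitter: x only
  list(x = x, y = y)
}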

I still have two questions regarding this: 1/ Why do I have to set shuffle = FALSE for the val_generator? When I tried with some test parameters and shuffle = TRUE, it broke and returned this error:

> history <- model %>% 
+   fit_generator(
+     generator = train_generator,
+     epochs = 100L, 
+     steps_per_epoch = 2L, #as.integer(nrow(train_input) / BATCH_SIZE), 
+     validation_data = val_generator,
+     validation_steps = 1L, #as.integer(nrow(val_input) / BATCH_SIZE),
+     verbose = 1L,
+     callbacks = list(early_stopping)
+     )
Epoch 1/100
1/2 [==============>...............] - ETA: 28s - loss: 0.7782 - dice_coef_loss_for_bce: 0.6753Error occurred in generator: cannot take a sample larger than the population when 'replace = FALSE'
 Error in py_call_impl(callable, dots$args, dots$keywords) : 
  StopIteration:

2/ In the unet_linux example, a py_iterator was used on top of the custom generator.

train_iterator <- py_iterator(train_generator(images_dir = images_dir,
                                              masks_dir = masks_dir,
                                              samples_index = train_index,
                                              batch_size = batch_size))

Why is that, if the parameters of fit_generator control how many epochs should be run? My assumption was that the generator performs the operations and that py_iterator enables iterating over the generator repeatedly. That's why I started with a combination of both; obviously I was wrong.

As a point 3/, I wanted to thank you very much for the link to the R blog; it is very interesting! I am very much looking forward to your future articles!

Cheers

skeydan commented 5 years ago

That's nice to hear, thank you!

Reg. 1), given the text of the error message, it's very hard to imagine it could be anything other than batch_size being bigger than nrow(data), for whatever reason... Or indices not being used later, due to some copy-paste issue? I'd try putting print() statements everywhere and see what indices is, etc.
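
For instance, R's sample() fails with exactly that message as soon as size exceeds the population (placeholder numbers):

nrow_data  <- 9     # e.g. a validation set with fewer rows than one batch
batch_size <- 16
sample(seq_len(nrow_data), size = batch_size)
# Error: cannot take a sample larger than the population when 'replace = FALSE'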

Reg. 2), I think that might be because the example was created before the py_iterator handling was taken care of internally.

mgendarme commented 5 years ago

Hi Sigrid, thank you for the detail.

I was trying over the past two days to run a pretty heavy training step (virtual size of the augmented data set: 128,000 images). The problem is that my RStudio session is getting unstable and crashes at random times, making it impossible to complete the training step. My hyperparameters for training the model are the following:

BATCH_SIZE = 16L                  
EPOCHS = 100L
STEPS_PER_EPOCHS = 80L
VALIDATION_STEPS = as.integer(STEPS_PER_EPOCHS / 5L)

Training the model encompasses this:

train_generator <- custom_generator(data = train_input,
                                    shuffle = TRUE,
                                    batch_size = BATCH_SIZE)

val_generator <- custom_generator(data = val_input,
                                  shuffle = FALSE,
                                  batch_size = BATCH_SIZE)

early_stopping <- callback_early_stopping(patience = 8L)

history <- model %>% 
  fit_generator(
    generator = train_generator,
    epochs = EPOCHS, 
    steps_per_epoch = STEPS_PER_EPOCHS, #as.integer(nrow(train_input) / BATCH_SIZE), 
    validation_data = val_generator,
    validation_steps = VALIDATION_STEPS, #as.integer(nrow(val_input) / (BATCH_SIZE / 2)),
    verbose = 1L,
    callbacks = list(early_stopping)
    )

Any idea where this could be coming from?

Cheers

skeydan commented 5 years ago

Try batch_size = 1?
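
For a sense of scale (rough arithmetic, float32 forward activations only, ignoring gradients and workspace):

BATCH_SIZE <- 16
# a single 512 x 512 x 16 feature map for one batch:
BATCH_SIZE * 512 * 512 * 16 * 4 / 1024^3   # ~0.25 GB
# a U-Net at this resolution keeps many such maps alive at once,
# so activation memory grows roughly linearly with batch size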

mgendarme commented 5 years ago

Weirdly enough, if I use these parameters (for quick testing):

BATCH_SIZE = 16L                  
EPOCHS = 30L
STEPS_PER_EPOCHS = 2L
VALIDATION_STEPS = 1L

it does run through without issues.

Also, all the training steps I had run before without data augmentation were done with:

BATCH_SIZE = 32L                  
EPOCHS = 60L # up to 120
VALIDATION_SPLIT = 0.2

and back then I had no problems.

If I decreased batch_size to 1, wouldn't I be "underfeeding" my model at that point?

Cheers

skeydan commented 5 years ago

I think you need to experiment and find the best trade-off between speed and memory usage on your system (for this task).

mgendarme commented 5 years ago

Thanks a lot Sigrid, you confirmed my thoughts. I will try to do just that.