rstudio / keras3

R Interface to Keras
https://keras3.posit.co/

Error calling flow_images_from_dataframe with multiple y columns #761

Closed eafpres closed 5 years ago

eafpres commented 5 years ago

I am attempting to use flow_images_from_dataframe with binary vector labels. Here is the data:

str(image_subset)
'data.frame': 862 obs. of 11 variables:
 $ file_paths: chr "C:/EAF LLC/aa-Analytics and BI/.../sentences_filtered/a01/a01-000u/a01-000u-s00-00.png" ...
 $ writer_0 : chr "1" "1" "1" "1" ...
 $ writer_1 : chr "0" "0" "0" "0" ...
 $ writer_2 : chr "0" "0" "0" "0" ...
 $ writer_3 : chr "0" "0" "0" "0" ...
 $ writer_4 : chr "0" "0" "0" "0" ...
 $ writer_5 : chr "0" "0" "0" "0" ...
 $ writer_6 : chr "0" "0" "0" "0" ...
 $ writer_7 : chr "0" "0" "0" "0" ...
 $ writer_8 : chr "0" "0" "0" "0" ...
 $ writer_9 : chr "0" "0" "0" "0" ...

The call is:

image_gen <- image_data_generator()
data_gen <- flow_images_from_dataframe(image_subset,
                                       x_col = "file_paths",
                                       y_col = y_cols,
                                       generator = image_gen,
                                       target_size = c(target_x, target_y),
                                       color_mode = "grayscale",
                                       class_mode = "binary",
                                       batch_size = batch_size)

y_cols is the list of the label columns:

y_cols
[1] "writer_0" "writer_1" "writer_2" "writer_3" "writer_4" "writer_5" "writer_6" "writer_7"
[9] "writer_8" "writer_9"

Originally the 1s & 0s were integer. I was getting this error:

Error in py_call_impl(callable, dots$args, dots$keywords) :
  TypeError: If class_mode="binary", y_col="['writer_0', 'writer_1', 'writer_2', 'writer_3', 'writer_4', 'writer_5', 'writer_6', 'writer_7', 'writer_8', 'writer_9']" column values must be strings.

Detailed traceback:
  File "C:\Users\eafpres\ANACON~1\envs\KERAS-~1\lib\site-packages\keras_preprocessing\image\image_data_generator.py", line 666, in flow_from_dataframe
    drop_duplicates=drop_duplicates
  File "C:\Users\eafpres\ANACON~1\envs\KERAS-~1\lib\site-packages\keras_preprocessing\image\dataframe_iterator.py", line 120, in __init__
    self._check_params(df, x_col, y_col, classes)
  File "C:\Users\eafpres\ANACON~1\envs\KERAS-~1\lib\site-packages\keras_preprocessing\image\dataframe_iterator.py", line 172, in _check_params
    .format(self.class_mode, y_col))

So I changed the types to character, as you can see above, but the error persists.

I read some issues from when this check was implemented in Python; in the past, binary mode accepted integer labels. Some comments in those issues said the error persisted even after converting to character (in Python). However, all of those issues are closed, and I do not see any reference to the problem here.

The call seems to work if there is ONLY one y_col, and it has character labels in it. I don't understand why it won't work in the binary form; in fact, trying ANY class_mode with y_col pointing to multiple columns generates the same error.

dfalbel commented 5 years ago

You have to set the class_mode to "other" and then it will work with the numeric columns. The default class_mode is "categorical" and expects strings or list columns.
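If the label columns were already converted to character, they would need to go back to numeric before switching to class_mode = "other". A minimal base R sketch, assuming a toy stand-in for image_subset (the file names and the three label columns here are hypothetical, not the real data):

```r
# Toy stand-in for image_subset; file names and values are made up for illustration.
image_subset <- data.frame(
  file_paths = c("a.png", "b.png", "c.png"),
  writer_0 = c("1", "0", "0"),
  writer_1 = c("0", "1", "0"),
  writer_2 = c("0", "0", "1"),
  stringsAsFactors = FALSE
)

# Pick up every label column by its name prefix.
y_cols <- grep("^writer_", names(image_subset), value = TRUE)

# Convert the character labels back to numeric, column by column.
image_subset[y_cols] <- lapply(image_subset[y_cols], as.numeric)

# View the targets as one numeric matrix: one row per image, one column per label.
y_matrix <- as.matrix(image_subset[y_cols])
str(image_subset)
```

The converted data frame can then be passed to flow_images_from_dataframe with y_col = y_cols and class_mode = "other".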

eafpres commented 5 years ago

Thank you, will try it. But I converted to character and it still throws the error; as if when it’s passed to Python it’s not the correct type. Seems like an actual issue?

dfalbel commented 5 years ago

The docs of flow_images_from_dataframe are very confusing, but here is the relevant part:

class_mode: one of "categorical", "binary", "sparse", "input", "other" or None. Default: "categorical". Mode for yielding the targets:

What I understand is that when class_mode is "categorical" you can pass the name of a single string column or of a list-column. When class_mode is "other" it will concatenate all the columns named in y_col.
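For the "categorical" route, one possible approach is to collapse the one-hot columns into a single string column and pass that column's name as y_col. This is only a sketch in base R under the assumption that each row has exactly one 1; the data frame and the new column name writer are hypothetical:

```r
# Toy one-hot labels; file names and values are made up for illustration.
image_subset <- data.frame(
  file_paths = c("a.png", "b.png", "c.png"),
  writer_0 = c(1, 0, 0),
  writer_1 = c(0, 1, 0),
  writer_2 = c(0, 0, 1),
  stringsAsFactors = FALSE
)

y_cols <- c("writer_0", "writer_1", "writer_2")

# For each row, find the position of the column holding the 1.
hot_index <- max.col(as.matrix(image_subset[y_cols]), ties.method = "first")

# Store the matching column name as a single character label.
image_subset$writer <- y_cols[hot_index]
print(image_subset$writer)
```

With a column like this, y_col = "writer" and class_mode = "categorical" should match the single string column case described above.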

eafpres commented 5 years ago

Okay, that’s very subtle. Perhaps the error message could be improved?

Thank you as always for great support.

I’ll confirm I can run it and close if so.

eafpres commented 5 years ago

This worked:

data_gen <- flow_images_from_dataframe(image_subset,
                                       x_col = "file_paths",
                                       y_col = y_cols,
                                       generator = image_gen,
                                       target_size = c(target_x, target_y),
                                       color_mode = "grayscale",
                                       class_mode = "other",
                                       batch_size = batch_size,
                                       shuffle = FALSE)