keras-team / keras-preprocessing

Utilities for working with image data, text data, and sequence data.

Unable to use flow_from_dataframe - y_col must be str,list,tuple #286

Open obiii opened 4 years ago

obiii commented 4 years ago

Hi,

I am trying to train a multi-task CNN using flow_from_dataframe. The dataframe columns already hold strings, but dtypes reports "object" no matter how I try to convert them to string. It seems pandas reports the object dtype even for str columns.

The dataframe has these columns:

Image       PFRType  FuelType
image1.jpg  1-3      NG

Image       object
PFRType     object
FuelType    object
dir         object
dtype: object
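On the dtype question: an `object` dtype only says the column holds arbitrary Python objects, which is how pandas has traditionally stored strings. It does not mean the values are not `str`. A minimal sketch for checking the element types directly is below. The toy frame is an assumption (the second row is made up for illustration), and `dtype=object` is forced so the result does not depend on the pandas version's default string dtype.

```python
import pandas as pd

# Toy frame with the columns from the question; the second row is invented.
df = pd.DataFrame(
    {
        "Image": ["image1.jpg", "image2.jpg"],
        "PFRType": ["1-3", "4-6"],
        "FuelType": ["NG", "Oil"],
    },
    dtype=object,  # force object storage so the example is version-independent
)

# The column-level dtype is object...
column_dtype = df["PFRType"].dtype

# ...but every individual element is still a plain Python str.
element_types = set(df["PFRType"].map(type))

print(column_dtype, element_types)
```

If `element_types` contains anything other than `str` (for example `float` from missing values), that column is worth cleaning before handing it to the generator.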

And I get this error: If class_mode="sparse", y_col="['PFRType', 'FuelType']" column values must be strings.
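The message is consistent with a per-value `isinstance` check being applied to whatever `y_col` selects. The sketch below is a hypothetical re-creation of such a check, not the library's actual code: `values_are_strings` is a made-up helper and the sample rows are invented. It shows why a list of column names could fail the check even though every cell is a `str`.

```python
import pandas as pd

# Invented sample rows; dtype=object mirrors the dtypes shown in the question.
df = pd.DataFrame(
    {"PFRType": ["1-3", "4-6"], "FuelType": ["NG", "Oil"]},
    dtype=object,
)


def values_are_strings(frame, y_col):
    # Hypothetical stand-in for the generator's validation, not library code.
    # Series.apply visits each value; DataFrame.apply visits each column.
    return all(frame[y_col].apply(lambda x: isinstance(x, str)))


# One column name selects a Series, so the lambda sees individual strings.
single_ok = values_are_strings(df, "PFRType")

# A list of names selects a DataFrame, so the lambda sees whole columns
# (pandas Series objects), which are never instances of str.
pair_ok = values_are_strings(df, ["PFRType", "FuelType"])

print(single_ok, pair_ok)
```

If the real check works along these lines, no amount of string conversion will help, because the failure comes from passing a list to `y_col`, not from the cell values. The Keras documentation describes `class_mode='sparse'` as expecting a single column of string labels; for several target columns it lists `class_mode='multi_output'` with `y_col` given as a list, in versions that support that mode.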

Here is the code for the generator:

from keras.preprocessing.image import ImageDataGenerator

# trainLabel is the dataframe shown above
trainGen = ImageDataGenerator()
trainGenDf = trainGen.flow_from_dataframe(trainLabel,
                                         directory='../MTLData/train/',
                                         x_col="Image",
                                         y_col=['PFRType', 'FuelType'],
                                         class_mode='sparse',
                                         target_size=(224, 224),
                                         batch_size=32)

I am using Keras version 2.3.1. Can someone please help?

HanClinto commented 2 years ago

I know this is a very old question on a defunct message board, but given that this still shows up in search results (and I was having a similar issue), the solution I found was to first combine my multiple columns into a single new column in my dataframe whose values are themselves lists or tuples.

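# Select the columns with a list of names (double brackets); a tuple would be looked up as a single column key
dataframe['combined_classes'] = dataframe[['PFRType', 'FuelType']].apply(lambda x: x.tolist(), axis=1)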
trainGen = ImageDataGenerator()
trainGenDf = trainGen.flow_from_dataframe(dataframe,
                                         directory = '../MTLData/train/',
                                         x_col = "Image",
                                         y_col='combined_classes',
                                         class_mode='sparse',
                                         target_size=(224,224),
                                         batch_size=32)
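To see what the combined column looks like before it reaches the generator, here is a minimal pandas-only sketch. The sample rows are invented, and `to_numpy().tolist()` is used as an alternative to the row-wise `apply`, which should give the same list-per-row result. It also shows that indexing with a tuple raises `KeyError` on a frame with ordinary (non-MultiIndex) columns.

```python
import pandas as pd

# Invented sample rows using the column names from the question.
dataframe = pd.DataFrame(
    {
        "Image": ["image1.jpg", "image2.jpg"],
        "PFRType": ["1-3", "4-6"],
        "FuelType": ["NG", "Oil"],
    }
)

# A tuple is treated as ONE column key, so this lookup fails.
try:
    dataframe[("PFRType", "FuelType")]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# A list of names selects both columns; each row becomes one Python list.
dataframe["combined_classes"] = dataframe[["PFRType", "FuelType"]].to_numpy().tolist()

first_labels = dataframe["combined_classes"].iloc[0]
print(tuple_lookup_failed, first_labels)
```

One caveat: the Keras documentation describes list or tuple label values as an input for `class_mode='categorical'` (multi-label), while `'sparse'` is described as taking string labels. If the combined column is still rejected with `class_mode='sparse'`, switching the mode to `'categorical'` may be what makes this workaround run on a given keras-preprocessing version.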

I'm sure you're not still working on this, but wanted to share my workaround anyways in case anyone else was looking for the answer like I was.