flow from dataframe with multiple column names fed into y_col generates TYPE ERROR

drodriguez3 commented 4 years ago

I am using flow_from_dataframe for a multi-label classification problem with 14 possible labels. All column names are placed in a list as strings, for example:

columns = ["No Finding", "Enlarged Cardiomediastinum", "Cardiomegaly", "Lung Opacity", "Lung Lesion","Edema", "Consolidation", "Pneumonia", "Atelectasis", "Pneumothorax", "Pleural Effusion", "Pleural Other", "Fracture", "Support Devices"]

The list name (columns) is then fed into y_col for example:

train_generator=datagen.flow_from_dataframe( dataframe=df[:178731], directory='/home/admin1/Downloads/', x_col='Path', y_col=columns, batch_size=batch_size, seed=42, shuffle=True, target_size=(224, 224))

I'm getting this error:

TypeError: If class_mode="categorical", y_col="['No Finding', 'Enlarged Cardiomediastinum', 'Cardiomegaly', 'Lung Opacity', 'Lung Lesion', 'Edema', 'Consolidation', 'Pneumonia', 'Atelectasis', 'Pneumothorax', 'Pleural Effusion', 'Pleural Other', 'Fracture', 'Support Devices']" column values must be type string, list or tuple.

I have already tried the solution previously proposed, but the error continues:

df['No Finding'] = df['No Finding'].astype(str)
df['Enlarged Cardiomediastinum'] = df['Enlarged Cardiomediastinum'].astype(str)
df['Cardiomegaly'] = df['Cardiomegaly'].astype(str)
df['Lung Opacity'] = df['Lung Opacity'].astype(str)
df['Lung Lesion'] = df['Lung Lesion'].astype(str)
df['Edema'] = df['Edema'].astype(str)
df['Consolidation'] = df['Consolidation'].astype(str)
df['Pneumonia'] = df['Pneumonia'].astype(str)
df['Atelectasis'] = df['Atelectasis'].astype(str)
df['Pneumothorax'] = df['Pneumothorax'].astype(str)
df['Pleural Effusion'] = df['Pleural Effusion'].astype(str)
df['Pleural Other'] = df['Pleural Other'].astype(str)
df['Fracture'] = df['Fracture'].astype(str)
df['Support Devices'] = df['Support Devices'].astype(str)

It only works when I feed a single column name to y_col. I'm using keras 2.2.4, and I have already uninstalled keras.preprocessing and installed the GitHub version. It seems that flow_from_dataframe does not support a list of column names for y_col with the default class_mode of categorical, which I am using because this is a multi-label classification problem. I suspect the type issue stems from pandas reporting the converted columns as dtype object rather than string, while the keras preprocessing dataframe iterator code only allows string, list or tuple. Below is my code:

df=pd.read_csv('/home/admin1/Downloads/CheXpert-v1.0/train.csv')

df = df.replace(np.nan, 0)
df['No Finding'].head()

df['No Finding'] = df['No Finding'].astype(str)
df['Enlarged Cardiomediastinum'] = df['Enlarged Cardiomediastinum'].astype(str)
df['Cardiomegaly'] = df['Cardiomegaly'].astype(str)
df['Lung Opacity'] = df['Lung Opacity'].astype(str)
df['Lung Lesion'] = df['Lung Lesion'].astype(str)
df['Edema'] = df['Edema'].astype(str)
df['Consolidation'] = df['Consolidation'].astype(str)
df['Pneumonia'] = df['Pneumonia'].astype(str)
df['Atelectasis'] = df['Atelectasis'].astype(str)
df['Pneumothorax'] = df['Pneumothorax'].astype(str)
df['Pleural Effusion'] = df['Pleural Effusion'].astype(str)
df['Pleural Other'] = df['Pleural Other'].astype(str)
df['Fracture'] = df['Fracture'].astype(str)
df['Support Devices'] = df['Support Devices'].astype(str)
df['Age'] = df['Age'].astype(str)

df.dtypes

columns=["No Finding", "Enlarged Cardiomediastinum", "Cardiomegaly", "Lung Opacity", "Lung Lesion","Edema", "Consolidation", "Pneumonia", "Atelectasis", "Pneumothorax", "Pleural Effusion", "Pleural Other", "Fracture", "Support Devices"]

datagen=ImageDataGenerator(rescale=1./255.)
test_datagen=ImageDataGenerator(rescale=1./255.)

train_generator=datagen.flow_from_dataframe( dataframe=df[:178731], directory='/home/admin1/Downloads/', x_col='Path', y_col=columns, batch_size=batch_size, seed=42, shuffle=True, target_size=(224, 224))
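For anyone trying to reproduce the dtype suspicion above without the CheXpert data, here is a minimal pandas-only sketch. It assumes the library's check is roughly "apply isinstance(x, (str, list, tuple)) to df[y_col]"; that is an assumption based on the wording of the error message, not a quote of the keras-preprocessing source. The small dataframe and the shortened label list are made up for illustration.

```python
import pandas as pd

# Shortened, hypothetical label list and data, for illustration only.
columns = ["Edema", "Pneumonia"]
df = pd.DataFrame({
    "Path": ["a.jpg", "b.jpg"],
    "Edema": [1.0, 0.0],
    "Pneumonia": [0.0, 1.0],
})

# Convert the label columns to strings, as in the original post.
df = df.astype({name: str for name in columns})

# Assumed shape of the check: values must be str, list or tuple.
types = (str, list, tuple)

# Single column: Series.apply runs element-wise, so every value is a str.
per_value = df["Edema"].apply(lambda x: isinstance(x, types))

# List of columns: DataFrame.apply hands each whole column (a Series)
# to the function, and a Series is never a str, list or tuple.
per_column = df[columns].apply(lambda x: isinstance(x, types))

print(per_value.all())   # True for the single-column case
print(per_column.any())  # False for the list-of-columns case
```

If the check really works this way, converting the columns with astype(str) cannot help, because the check never sees the individual values when y_col is a list.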

drodriguez3 commented 4 years ago

I have added object to the type list in the preprocessing file but now I get a KEY ERROR: KeyError: ['No Finding', 'Enlarged Cardiomediastinum', 'Cardiomegaly', 'Lung Opacity', 'Lung Lesion', 'Edema', 'Consolidation', 'Pneumonia', 'Atelectasis', 'Pneumothorax', 'Pleural Effusion', 'Pleural Other', 'Fracture', 'Support Devices']

This seems to be coming from the _filter_classes function, which is attempting to dropna, but I have already done fillna(0), so I am not sure why this is happening.

rragundez commented 4 years ago

Please add a minimal reproducible example of the problem, meaning code that I can copy, paste and run to see the issue you mentioned.

What is the output of df[columns].dtypes?

jaypanc commented 2 years ago

Same error. I changed all my columns to string type, but it shows the same error.

HanClinto commented 2 years ago

I know this is an old question on an inactive board, but given that this still shows up in Google results, I figured this is the best place to put the answer I found when struggling with the same problem.

The problem is that Keras is expecting a list or tuple type, not an ndarray. So we need to make a new column that is the correct type:

columns = ["No Finding", "Enlarged Cardiomediastinum", "Cardiomegaly", "Lung Opacity", "Lung Lesion","Edema", "Consolidation", "Pneumonia", "Atelectasis", "Pneumothorax", "Pleural Effusion", "Pleural Other", "Fracture", "Support Devices"]

df["CombinedColumns"] = df[columns].apply(lambda x: x.tolist(), axis=1)

train_generator=datagen.flow_from_dataframe(
dataframe=df[:178731],
directory='/home/admin1/Downloads/',
x_col='Path',
y_col="CombinedColumns",
batch_size=batch_size,
seed=42,
shuffle=True,
target_size=(224, 224))

Once we do this, it should get past the error.
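One thing worth checking before training: if class_mode="categorical" treats each element of the list as a class label (my understanding, not verified against every version), then a list of raw 0/1 values would produce classes named after those values rather than the 14 findings. A possible variant, sketched below under that assumption, stores the names of the positive labels in each row instead. The column name "Labels", the tiny dataframe, and the choice to count only a value of 1 as positive are all illustrative assumptions.

```python
import pandas as pd

# Shortened, hypothetical label list and data, for illustration only.
columns = ["Edema", "Pneumonia", "Fracture"]
df = pd.DataFrame({
    "Path": ["a.jpg", "b.jpg", "c.jpg"],
    "Edema": [1.0, 0.0, 0.0],
    "Pneumonia": [1.0, 0.0, -1.0],
    "Fracture": [0.0, 1.0, 0.0],
})

# For each row, keep the names of the columns whose value is exactly 1.
# Treating -1 (uncertain) and 0 as "not positive" is an assumption here.
df["Labels"] = pd.Series(
    [[name for name in columns if row[name] == 1]
     for _, row in df[columns].iterrows()],
    index=df.index,
)

print(df[["Path", "Labels"]])
```

The "Labels" column could then be passed as y_col in place of "CombinedColumns". Newer keras-preprocessing releases also document class_mode="raw", which accepts a list of columns directly; whether it is available depends on the installed version.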

keras-team / keras-preprocessing

flow from dataframe with multiple column names fed into y_col generates TYPE ERROR #266