keras-team / keras-preprocessing

Utilities for working with image data, text data, and sequence data.

flow_from_dataframe() does not seem to follow steps_per_epoch in fit_generator() #199

Open DollarAkshay opened 5 years ago

DollarAkshay commented 5 years ago

Screenshot 1 (image not preserved; captured 2019-04-18 at 10:19:00 pm)

Screenshot 2 (image not preserved; captured 2019-04-18 at 10:18:45 pm)

I have about 190451 training examples in train_df, and BATCH_SIZE is set to 32.

When calling fit_generator() I specified steps_per_epoch=1, but it seems to be ignored completely. I have tried other small values as well, and the number of steps per epoch is always 5951 (as seen in Screenshot 2).

It looks like the model is training on the entire training set rather than honoring the steps_per_epoch I set. What am I doing wrong?
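For reference, the reported 5951 steps match the dataset size divided by the batch size with the remainder dropped. The check below is a minimal sketch that assumes the 190451 and 32 figures quoted above are exact; the variable names are illustrative only. It shows where the number could come from, not what the original notebook actually did.

```python
import math

# Figures quoted in the post (assumed exact for this check).
n_samples = 190451
batch_size = 32

# Floor division counts only full batches.
full_batches = n_samples // batch_size

# Ceiling division also counts the final partial batch.
all_batches = math.ceil(n_samples / batch_size)

# Samples left over after the last full batch.
leftover = n_samples - full_batches * batch_size

print(full_batches, all_batches, leftover)
```

Since 5951 is the floor value, one thing worth checking is whether a value computed as len(train_df) // BATCH_SIZE is being passed to fit_generator() somewhere instead of the literal 1.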

Extra Info :

rragundez commented 5 years ago

Could you please provide a minimal reproducible example? Thanks!

rragundez commented 5 years ago

I made this example:

import os
import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from keras.models import Sequential, Model
from keras.layers import Flatten, Dense, Input, Reshape

from keras_preprocessing.image import ImageDataGenerator

pixel_val = 1
filenames = []
for i in range(20):
    filename = '/tmp/{}.jpg'.format(i)
    plt.imsave(filename, pixel_val * np.random.uniform(size=(3, 3, 3)))
    filenames.append(filename)

df_binary = pd.DataFrame({'filename': filenames}).sample(frac=1).reset_index(drop=True)
classes = ['dog', 'cat'] * 10
df_binary['class'] = classes
df_binary['weights'] = np.random.uniform(size=len(df_binary))  # one sample weight per row
generator = ImageDataGenerator(rescale=1/255).flow_from_dataframe(
    df_binary,
    weight_col='weights',
    batch_size=3,
    target_size=(3, 3),
    shuffle=True,
    class_mode='binary',
)
labels_to_classes = {v:k for k,v in generator.class_indices.items()}

model = Sequential()
model.add(Flatten(input_shape=(3, 3, 3)))
model.add(Dense(10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit_generator(generator, epochs=2, steps_per_epoch=1, workers=4, use_multiprocessing=True)

For me it works as expected:

Found 20 validated image filenames belonging to 2 classes.
Epoch 1/2
3/3 [==============================] - 0s 67ms/step - loss: 0.2997 - acc: 0.7778
Epoch 2/2
3/3 [==============================] - 0s 2ms/step - loss: 0.4515 - acc: 0.5729

Also, steps_per_epoch is handled by Keras itself, not by the keras-preprocessing repo: https://github.com/keras-team/keras/blob/a6c8042121371b5873773ca767f28cdf5689d5e4/keras/engine/training_generator.py#L180
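To illustrate that division of responsibility, here is a simplified sketch in plain Python. It is not the actual Keras code: endless_batches and run_epochs are hypothetical stand-ins for the data iterator and the training loop. The point is only that the iterator yields batches indefinitely, while the loop that consumes it decides how many to draw per epoch.

```python
def endless_batches(n_samples, batch_size):
    """Hypothetical iterator: yields lists of sample indices forever, wrapping around."""
    start = 0
    while True:
        stop = min(start + batch_size, n_samples)
        yield list(range(start, stop))
        # Wrap back to the beginning once the data is exhausted.
        start = 0 if stop >= n_samples else stop


def run_epochs(generator, epochs, steps_per_epoch):
    """Hypothetical training loop: draws exactly steps_per_epoch batches per epoch."""
    steps_run = []
    for _ in range(epochs):
        steps = 0
        while steps < steps_per_epoch:
            next(generator)  # a real loop would train on this batch
            steps += 1
        steps_run.append(steps)
    return steps_run


# 20 samples in batches of 3, mirroring the sizes used in the example above.
steps_run = run_epochs(endless_batches(20, 3), epochs=2, steps_per_epoch=1)
print(steps_run)
```

In this sketch the iterator never sees steps_per_epoch at all, which is why a mismatch in step count would point at the arguments passed to the training call rather than at the generator.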

rragundez commented 5 years ago

@DollarAkshay Did you check the minimal reproducible example? Can I close the issue?