Closed fahadakhan96 closed 5 years ago
I guess you have a lot of files in directory ('./').
Here's how flow_from_dataframe works: ... directory.
And as I mentioned in #93, the current flow_from_dataframe does not support relative paths.
So could you check whether the following steps work?
git clone -b fix_found_0_images_bug https://github.com/smurak/keras-preprocessing.git
import keras
from keras_preprocessing import image
...
train_imggen = image.ImageDataGenerator(...)
Set directory to "./MURA-v1.1":
train_loader = train_imggen.flow_from_dataframe(traindf, './MURA-v1.1', ...)
OR change "path" to absolute paths and set directory to None:
train_loader = train_imggen.flow_from_dataframe(traindf, None, ...)
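A minimal sketch of the second option, assuming the file paths are stored as strings relative to some base directory. The helper name to_absolute_paths and the sample paths are hypothetical and only for illustration; nothing here comes from the library itself.

```python
import os


def to_absolute_paths(paths, base_dir="."):
    """Return absolute versions of the given paths (hypothetical helper).

    Relative paths are resolved against base_dir; paths that are already
    absolute are only normalized.
    """
    base = os.path.abspath(base_dir)
    return [os.path.normpath(os.path.join(base, p)) for p in paths]


# Illustrative, made-up relative paths in the style of the dataset folder.
relative = ["MURA-v1.1/train/a.png", "./MURA-v1.1/train/b.png"]
absolute = to_absolute_paths(relative)

# Report any path that does not exist on disk, since missing files are
# one possible reason for a generator to find fewer images than expected.
missing = [p for p in absolute if not os.path.isfile(p)]
print(len(absolute), "paths,", len(missing), "missing")
```

With a DataFrame, the same helper could be applied to the "path" column, for example traindf["path"] = to_absolute_paths(traindf["path"]), before calling flow_from_dataframe(traindf, None, ...).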
Thanks, @smurak!
Your fix worked! Didn't need Step 4, though.
Please feel free to close this issue.
Thanks, @smurak! Great.
It is working now with absolute paths! Step 4 is kind of not working on my system.
Here is my code using absolute path: @Vijayabhaskar96 @smurak
train_df = pd.DataFrame(train_img_data)
train_df.columns = ['id', 'label']
test_df = pd.DataFrame(test_img_data)
test_df.columns = ['id', 'label']
print(train_df['id'][0])
print('******************************************************')
datagen = ImageDataGenerator(rescale=1./255)
train_generator = datagen.flow_from_dataframe(train_df, None,
                                              x_col='id',
                                              y_col='label',
                                              has_ext=True,
                                              batch_size=args.batch_size,
                                              seed=42,
                                              shuffle=True,
                                              class_mode="sparse",
                                              target_size=(224, 224),
                                              color_mode='rgb',
                                              interpolation='nearest')
After running the above code, I got the following error:
/home/yaurehman2/Documents/Newwork/REPLY_ATTACK_FACE_Mod_corr/train/0.jpg
/home/yaurehman2/anaconda3/envs/virtual-tf2/lib/python3.5/site-packages/keras_preprocessing/image.py:2059: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.df[x_col] = self.df[x_col].astype(str)
Traceback (most recent call last):
File "tutorial_pd.py", line 394, in
Update: @Vijayabhaskar96 @smurak
After downloading the update by @smurak, my code is working now on absolute paths.
However, I've found one more problem: it cannot deal with duplicate file names. For example, to balance my data, I duplicate some file names in the training set. However, flow_from_dataframe only reports the actual number of files in the training set. It also trains on the actual number of files in the directory and not on the modified number of files.
As an example, my training data contains two classes: class 1 with 6,000 samples and class 2 with 30,000.
To balance the two classes, I repeat class 1 5 times, so my new balanced training set has 60,000 samples.
However, flow_from_dataframe shows that it only found 36,000 samples with 2 classes.
My batch size is 32, so it runs 36000/32 = 1125 steps, whereas it should be 60000/32 = 1875.
Here is the output:
Live samples are 6000 , attack samples are 30000
The difference is :5
Balanced data samples: 60000
Found 36000 images belonging to 2 classes.
Epoch 1/1
224/1125 [====>.........................] - ETA: 1:38 - loss: 0.3755 - acc: 0.8602
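The step counts quoted in this comment can be checked with a few lines of arithmetic. This is only a sketch of the calculation, assuming the number of steps per epoch is the sample count divided by the batch size, rounded up; the function name steps_per_epoch is made up for the example.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


live, attack, batch_size = 6000, 30000, 32

# Repeat the smaller class so that it matches the larger one.
repeat = attack // live              # 5
balanced = live * repeat + attack    # 60000 rows after oversampling
unique_files = live + attack         # 36000 distinct files on disk

print(steps_per_epoch(unique_files, batch_size))  # 1125
print(steps_per_epoch(balanced, batch_size))      # 1875
```

The 1125 in the progress bar therefore matches the 36000 unique files, not the 60000 balanced rows.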
@smurak's fix was temporary; it has since been fixed and updated properly. You should be fine if you have installed the latest GitHub version instead of the pip version. For the duplicates, set drop_duplicates=False.
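To see why the count falls back to 36000 when duplicates are dropped, here is a small stand-alone illustration. It does not reproduce the library's internals: the file names are made up, and the de-duplication is done in plain Python to mimic the effect of removing repeated rows.

```python
# Made-up file names: 6000 "live" files and 30000 "attack" files.
live_files = ["live_%d.jpg" % i for i in range(6000)]
attack_files = ["attack_%d.jpg" % i for i in range(30000)]

# Oversample the smaller class by listing each live file 5 times.
balanced = live_files * 5 + attack_files

# Dropping duplicate names (keeping first occurrences, in order)
# undoes the oversampling.
deduplicated = list(dict.fromkeys(balanced))

print(len(balanced))      # 60000
print(len(deduplicated))  # 36000
```

Keeping the repeated rows, which is what drop_duplicates=False asks for, preserves the 60000 balanced samples.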
Thank you for your prompt response and guidance, @Vijayabhaskar96
It's working now!
Here is the updated code for generator:
train_generator = datagen.flow_from_dataframe(
    dataframe=train_df,
    directory=None,
    x_col='id',
    y_col='label',
    has_ext=True,
    batch_size=args.batch_size,
    seed=42,
    shuffle=True,
    class_mode="sparse",
    target_size=(224, 224),
    color_mode='rgb',
    interpolation='nearest',
    drop_duplicates=False
)
Here is the output as a result of the above code:
Live samples are 6000 , attack samples are 30000
The difference is :5
Balanced data samples: 60000
Found 60000 images belonging to 2 classes.
Found 36000 images belonging to 2 classes.
Epoch 1/1
298/1875 [===>..........................] - ETA: 2:48 - loss: 0.3638 - acc: 0.8965
@Dref360 this issue should be closed
I'm working on the MURA dataset by Stanford. I'm trying to load the dataset using Keras's ImageDataGenerator. The data is in the following hierarchy:
The study1_positive folder contains the images. ImageDataGenerator.flow_from_directory cannot be used with this folder structure, so I tried using the flow_from_dataframe method. However, when run, the code keeps executing and doesn't stop.
Following is the format of the Pandas DataFrame that I'm passing to the flow_from_dataframe method:
I've also tried changing the labels to 'abnormal' and 'normal' in place of 1 and 0, respectively.
Below is the code: