ggous closed this issue 2 years ago.
Thanks for reporting this; I will investigate. Based on your settings and the size of your dataset, you may be fine using plain model.fit(X_train, y_train), in the sense that the dataset generated with the two augmentations will not alter the statistics that much compared to omitting them altogether.
At first glance, I couldn't find any memory leaks or parts that could be optimized with respect to memory usage. augment=True makes a number of copies of the input data, depending on the rounds parameter (default 1). This is probably the cause of the memory running out, but I don't see a way of circumventing it. You could try calling fit on a subset of your training data, e.g. the first 1000 elements (or whatever amount you believe represents your data sufficiently well):
n_elems = 1000
img_data_gen.fit(X_train[:n_elems], augment=True)
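If the first elements of X_train happen to be correlated (e.g. sorted by class or capture session), a random subset may represent the data better than the first n. A minimal sketch with stand-in data (the array shape is a placeholder, and the img_data_gen call is only indicated in a comment):

```python
import numpy as np

# Stand-in data; in the thread X_train holds ~2.7 GB of real images.
X_train = np.random.rand(2000, 16, 16, 3).astype("float32")

rng = np.random.default_rng(123)
n_elems = 1000
# Sample without replacement so the subset spans the whole dataset
# rather than just the first files, which may be correlated.
idx = rng.choice(len(X_train), size=min(n_elems, len(X_train)), replace=False)
X_subset = X_train[idx]
print(X_subset.shape)  # (1000, 16, 16, 3)
# img_data_gen.fit(X_subset, augment=True)  # fit the statistics on the subset
```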
Out of interest, you could check the memory consumption before calling fit() with:
import os, psutil
...
img_data_gen = ImageDataAugmentor(
    featurewise_center=True,
    featurewise_std_normalization=True,
    augment=augmentations_albu(),
    input_augment_mode='image',
    seed=123,
    validation_split=0.2,
)
mask_data_gen = ImageDataAugmentor(
    augment=augmentations_albu(),
    input_augment_mode='mask',
    seed=123,
    validation_split=0.2,
)
process = psutil.Process(os.getpid())
print("Memory usage:", process.memory_info().rss/1e9, "GB") # in gigabytes
img_data_gen.fit(X_train, augment=True)
...
What I'm wondering is whether there is already a lot of other stuff in system memory before calling img_data_gen.fit(X_train, augment=True).
Digging deeper, it seems to me that it could be featurewise_center=True and especially featurewise_std_normalization=True that consume the lion's share of the memory. Please see e.g. https://github.com/numpy/numpy/issues/13199.
As mentioned in my previous reply, I would recommend using a smaller representative subset of your dataset for finding the normalization factors. Nevertheless, I will do some fixes to the code with better garbage handling.
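If the float64 temporaries from the NumPy issue above are the culprit, one possible workaround (my own sketch, not part of the library) is to compute the featurewise statistics in chunks, so no full-size float64 copy of the array is ever materialised. Note that the generator may reduce over different axes than this per-pixel version:

```python
import numpy as np

# Stand-in data; in the thread X_train holds ~2.7 GB of real images.
X_train = np.random.rand(2000, 16, 16, 3).astype("float32")

def chunked_mean_std(x, chunk=256):
    """Two-pass mean/std over axis 0, accumulating in float64 one chunk
    at a time instead of converting the whole array at once."""
    n = len(x)
    total = np.zeros(x.shape[1:], dtype=np.float64)
    for i in range(0, n, chunk):
        total += x[i:i + chunk].sum(axis=0, dtype=np.float64)
    mean = total / n
    sq = np.zeros(x.shape[1:], dtype=np.float64)
    for i in range(0, n, chunk):
        d = x[i:i + chunk].astype(np.float64) - mean
        sq += (d * d).sum(axis=0)
    std = np.sqrt(sq / n)
    return mean, std

mean, std = chunked_mean_std(X_train)
```

The peak extra memory here is one float64 chunk of 256 images rather than a float64 copy of the entire dataset.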
OK! Thanks for the suggestions!
Using n_elements = 1500 works great.
Now I tried featurewise_std_normalization=False as you suggested, and it works with all the data! So that is the problem!
If I want to avoid featurewise_center and featurewise_std_normalization, can I use, for example:
inputs = Input((512, 512, 3))
norm = Lambda(lambda x: x / 255.0)(inputs)
x = Conv2D(filters=3,
           kernel_size=(3, 3),
           kernel_initializer='he_normal',
           padding='same')(norm)
But if I use this, then the normalization (x / 255) will also be applied to the masks, right? And we don't want that. How can I avoid it?
Thanks!
--- EDIT -----
Memory usage with all data:
Memory usage: 7.926837248 GB
My system was clear; nothing else was running. The memory gradually climbed up to 100%.
If you want to do 1./255. normalization, you can either do:
img_data_gen = ImageDataAugmentor(
    rescale=1/255.,
    featurewise_center=True,
    featurewise_std_normalization=True,
    augment=augmentations_albu(),
    input_augment_mode='image',
    seed=123,
    validation_split=0.2,
)
or add the ToFloat augmentation to your images. I recommend the former; simply leave the rescaling out of your mask generator.
Notice, however, that 1/255. normalization is not the same as the mean-0-std-1 normalization that featurewise_center and featurewise_std_normalization perform.
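To see the difference on toy numbers (not the thread's data):

```python
import numpy as np

img = np.array([0.0, 64.0, 128.0, 255.0], dtype=np.float32)

rescaled = img / 255.0                         # squashed into [0, 1]
standardized = (img - img.mean()) / img.std()  # shifted/scaled to mean 0, std 1

print(rescaled.min(), rescaled.max())  # 0.0 1.0
print(standardized.mean(), standardized.std())  # approximately 0 and 1
```

Rescaling only changes the range; standardization also centres the data and equalises its spread, which is what the featurewise options compute from the training set.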
Nice, thanks!
Yes, I know it's not the same.
I will just use rescale (with the rest set to False):
img_data_gen = ImageDataAugmentor(
    rescale=1/255.,
    featurewise_center=False,
    featurewise_std_normalization=False,
    augment=augmentations_albu(),
    input_augment_mode='image',
    seed=123,
    validation_split=0.2,
)
Thanks for your help! I am closing the issue!
Hello,
I get a memory crash when trying to use these files:
X_train: 2.7 GB, y_train: 898 MB
but if I just use Keras model.fit(X_train, y_train), it runs fine! This is the code I am using:
which results in a memory crash (I have 32 GB of RAM).