Closed mosheliv closed 5 years ago
Forgot to mention: TF v1.10, Keras 2.2.4.
You are running out of GPU memory. There's no direct fix, other than to:
- Select a smaller model
- Select a smaller batch size
- Select a smaller image input size
- Buy a GPU with larger amounts of memory
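As a rough illustration of why batch size and image size are the main levers here, the sketch below estimates the memory of a single float32 feature map. The function name `feature_map_bytes` and the 64-channel example are assumptions made purely for illustration; real GPU usage also includes weights, gradients, optimizer state, and workspace memory.

```python
def feature_map_bytes(batch, height, width, channels, bytes_per_value=4):
    """Bytes needed to hold one dense activation tensor (float32 by default)."""
    return batch * height * width * channels * bytes_per_value


# Hypothetical 64-channel feature map at the sizes discussed in this thread.
base = feature_map_bytes(batch=2, height=256, width=256, channels=64)
half_batch = feature_map_bytes(batch=1, height=256, width=256, channels=64)
half_side = feature_map_bytes(batch=2, height=128, width=128, channels=64)

print(base / 2**20)       # size of the baseline tensor in MiB
print(base / half_batch)  # halving the batch halves the memory
print(base / half_side)   # halving each image side quarters it
```

Memory per tensor grows linearly with the batch size and quadratically with the image side, which is why shrinking the input is usually the cheapest thing to try first.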
Thank you for your reply. The card is a Titan X with 12GB. It should have run very well on it. The batch size is now 2. I have run Densenet201 on it with larger batch sizes so I think there is a problem here with the implementation.
That is another issue. ResNeXt is not optimized on Keras and Tensorflow since there are no group convolution ops.
From what I understand, this should only slow it down, not consume more memory on the GPU.
Due to the fact that group conv doesn't exist in TF, I had to create and then slice multiple convs each with their own batch norm layer. This in turn does cost more memory, albeit not on a scale that it exceeds GPU memory. There must be some other issue.
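To make the memory argument concrete, here is a back-of-the-envelope sketch that counts the activation values kept for one grouped-convolution block. It is not taken from the repository: the function `activation_values`, the tensor sizes, and the assumption that every intermediate (slice, conv output, batch-norm output, concatenation) is retained for backpropagation are all hypothetical simplifications.

```python
def activation_values(height, width, channels, groups, emulated):
    """Count activation values kept for one grouped-conv block (batch size 1)."""
    per_group = height * width * (channels // groups)
    full = height * width * channels
    if not emulated:
        # Assumed native op: one conv output plus one batch-norm output.
        return 2 * full
    # Assumed emulation: each group keeps its input slice, its conv output
    # and its batch-norm output, and the groups are then concatenated.
    return groups * 3 * per_group + full


# Hypothetical block: 64x64 feature map, 128 channels, cardinality 32.
native = activation_values(64, 64, 128, groups=32, emulated=False)
sliced = activation_values(64, 64, 128, groups=32, emulated=True)
print(sliced / native)
```

Under these assumptions the sliced version keeps roughly twice as many activation values as a native grouped op would. That is a noticeable overhead, but on its own it would not be expected to exhaust a 12GB card.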
I suggest using another model.
Again, thank you for your patience. I have created a super simple dummy program. It seems that the maximum I can get to with any of your models is 128x128 (batch size 1). I still think there is something wrong here, but I will not pursue it further. Here is the code for the mini program with 256x256 (which fails):
import numpy as np
import pandas as pd
import keras
from keras.preprocessing.image import ImageDataGenerator
import senets.se_resnext
from keras.models import Sequential, Model, load_model
from keras.layers import Activation, Dense, Multiply, Input
from keras.callbacks import ModelCheckpoint
from keras import metrics
from keras.optimizers import Adam
from keras import backend as K
import warnings
warnings.filterwarnings("ignore")
class DataGenerator:
    def __init__(self):
        self.image_generator = ImageDataGenerator(rescale=1. / 255,
                                                  vertical_flip=True,
                                                  horizontal_flip=True,
                                                  rotation_range=180,
                                                  fill_mode='reflect')

    def create_train(self, dataset_info, batch_size, shape, augument=True):
        assert shape[2] == 3
        while True:
            random_indexes = np.random.choice(len(dataset_info), batch_size)
            batch_images1 = np.empty((batch_size, shape[0], shape[1], shape[2]))
            batch_labels = np.zeros((batch_size, 28))
            for i, idx in enumerate(random_indexes):
                image1 = self.load_image(
                    dataset_info[idx]['path'], shape)
                batch_images1[i] = image1
                batch_labels[i][dataset_info[idx]['labels']] = 1
            yield batch_images1, batch_labels

    def load_image(self, path, shape):
        # Dummy image: the path is ignored and a constant 256x256x3 array is returned.
        image1 = np.stack((
            np.ones((256,256)),
            np.ones((256,256)),
            np.ones((256,256)),
        ), -1)
        return image1.astype(np.float)
train_datagen = DataGenerator()
train_dataset_info = []
for i in range(0, 1000):
    train_dataset_info.append({
        'path':str(i),
        'labels':np.array([5])})
train_dataset_info = np.array(train_dataset_info)
valid_dataset_info = []
for i in range(1000, 1200):
    valid_dataset_info.append({
        'path':str(i),
        'labels':np.array([6])})
valid_dataset_info = np.array(valid_dataset_info)
print(train_dataset_info.shape, valid_dataset_info.shape)
def create_model(input_shape, n_out):
    inp_mask = Input(shape=input_shape)
    pretrain_model_mask = senets.se_resnext.SEResNext(input_shape=(256,256,3),
                                                      include_top=False,
                                                      weights=None,
                                                      pooling='max')
    pretrain_model_mask.name='seresnext50_mask'
    x = pretrain_model_mask(inp_mask)
    out = Dense(n_out, activation='sigmoid')(x)
    model = Model(inputs=inp_mask, outputs=[out])
    return model
keras.backend.clear_session()
model = create_model(
    input_shape=(256,256,3),
    n_out=28)
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['acc'])
model.summary()
epochs = 10
batch_size = 1
checkpointer = ModelCheckpoint(
    './SEResNext50_{epoch:02d}-{val_loss:.4f}.model',
    verbose=2,
    save_best_only=False)
# create train and valid datagens
train_generator = train_datagen.create_train(
    train_dataset_info, batch_size, (256,256,3))
validation_generator = train_datagen.create_train(
    valid_dataset_info, batch_size, (256,256,3))
K.set_value(model.optimizer.lr, 0.0001)
# train model
history = model.fit_generator(
    train_generator,
    steps_per_epoch=1000,
    validation_data=validation_generator,
    validation_steps=20,
    epochs=epochs,
    verbose=1,
    callbacks=[checkpointer])
Hmm, at first glance the script seems fine. The problem is most probably due to an inefficient implementation of the ResNeXt model.
Could I ask you to replace the ResNeXt model with the SE-ResNet model and check whether you can manage larger batch sizes?
Yes, it is much, much better. I can do 512x512 with a batch size of 12 easily. So, according to my limited knowledge, this is 16 * 12 times the ResNeXt size * batch (128 * 128, 1). This is curious, as I don't think a slightly unoptimized net can be that much worse. Anyway, it is up to you whether you want to look into it. I will give the SE-ResNet model a go and see how much better it performs compared to the regular ResNet.
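The comparison above can be checked with a few lines of arithmetic. This is only a sketch of the input-pixel count per batch using the figures quoted in this comment; the variable names are made up for illustration, and pixels per batch is just a proxy for activation memory, not a measurement.

```python
# Largest setting reported to work with SE-ResNeXt: 128x128 images, batch size 1.
resnext_pixels = 128 * 128 * 1
# Setting reported to work easily with SE-ResNet: 512x512 images, batch size 12.
seresnet_pixels = 512 * 512 * 12

size_ratio = (512 * 512) // (128 * 128)    # image-area factor
ratio = seresnet_pixels // resnext_pixels  # area factor times batch factor
print(size_ratio, ratio)
```

The image area differs by a factor of 16 and the batch size by a factor of 12, so the SE-ResNet run processes 192 times as many input pixels per step.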
Thank you again!
Interesting. I'll take a look into it, but I don't want to continue work on ResNeXt until there is official support for grouped convolutions, as hacky solutions aren't to my liking.
I can truly sympathize, but your GitHub repository is actually really popular because it is practically the only one. Anyway, thank you very much for taking the time to answer my questions. SE-ResNet seems to train very well even without the pre-trained weights.
I get this error while using SEResNext as a drop in replacement to Keras resnet50:
While this is a pretty big input compared to yours (512x512 compared to the 32x32 default at imagenet), it has no problem with the Keras implementation of ResNet. It looks like there is an extra 512-sized dimension in the tensor, which causes it to eat all the memory.
any help will be greatly appreciated!
Regards, Moshe