titu1994 / keras-squeeze-excite-network

Implementation of Squeeze and Excitation Networks in Keras
MIT License

OOM error #12

Closed mosheliv closed 5 years ago

mosheliv commented 5 years ago

I get this error while using SEResNext as a drop-in replacement for Keras ResNet50:

2018-12-13 21:39:05.642530: W tensorflow/core/framework/op_kernel.cc:1275] OP_REQUIRES failed at conv_ops.cc:398 : Resource exhausted: OOM when allocating tensor with shape[12,512,512,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "rnxt50.py", line 262, in <module>
    callbacks=[checkpointer])
  File "/home/m/.local/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/m/.local/lib/python3.5/site-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/m/.local/lib/python3.5/site-packages/keras/engine/training_generator.py", line 217, in fit_generator
    class_weight=class_weight)
  File "/home/m/.local/lib/python3.5/site-packages/keras/engine/training.py", line 1217, in train_on_batch
    outputs = self.train_function(ins)
  File "/home/m/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/home/m/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/home/m/.local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1382, in __call__
    run_metadata_ptr)
  File "/home/m/.local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[12,1024,512,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[Node: resneXt50_mask/conv2d_2/convolution = Conv2D[T=DT_FLOAT, _class=["loc:@training/Adam/gradients/AddN_201"], data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](resneXt50_mask/activation_1/Relu, conv2d_2/kernel/read)]]

While this is a pretty big input compared to yours (512x512 vs. the 32x32 ImageNet default), the Keras implementation of ResNet has no problem with it. It looks like there is an extra 512-sized dimension in the tensor, which causes it to eat all the memory.

The code I use is:

def create_model(input_shape, n_out):
    inp_mask = Input(shape=input_shape)
    # Backbone: ResNeXt without the classifier head, global-average pooled
    pretrain_model_mask = ResNeXt.resnext.ResNext(input_shape=(512, 512, 3),
        include_top=False,
        weights=None,
        pooling='avg')
    pretrain_model_mask.name = 'resneXt50_mask'
    x = pretrain_model_mask(inp_mask)
    out = Dense(n_out, activation='sigmoid')(x)
    model = Model(inputs=inp_mask, outputs=[out])

    return model

model = create_model(
    input_shape=(512,512,3),
    n_out=28)

model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['acc', f1])

train_generator = train_datagen.create_train(
    train_dataset_info, batch_size, (512,512,3))
validation_generator = train_datagen.create_train(
    valid_dataset_info, batch_size, (512,512,3))
K.set_value(model.optimizer.lr, 0.0001)
# train model
history = model.fit_generator(
    train_generator,
    steps_per_epoch=len(train_df)//batch_size,
    validation_data=validation_generator,
    validation_steps=len(valid_df)//batch_size//10,
    epochs=epochs,
    verbose=1,
    callbacks=[checkpointer])

Any help would be greatly appreciated!

Regards, Moshe

mosheliv commented 5 years ago

Forgot to mention: TF v1.10, Keras 2.2.4.

titu1994 commented 5 years ago

You are running out of GPU memory. There's no direct fix, other than to:
- Select a smaller model
- Select a smaller batch size
- Select a smaller image input size
- Buy a GPU with larger amounts of memory
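
For example, with the script from the first post, the middle two options are just parameter changes (a sketch only; the values below are illustrative, and create_model and batch_size are the names used in the code above):

# Example: smaller batch size and smaller image input size
# (note the backbone's input_shape inside create_model is hardcoded to
# (512, 512, 3) in the snippet above, so it would need to shrink too):
batch_size = 2
model = create_model(input_shape=(256, 256, 3), n_out=28)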

mosheliv commented 5 years ago

Thank you for your reply. The card is a Titan X with 12 GB, so it should have run fine. The batch size is now 2. I have run DenseNet201 on it with larger batch sizes, so I think there is a problem with this implementation.

titu1994 commented 5 years ago

That is a separate issue. ResNeXt is not well optimized in Keras and TensorFlow, since there are no grouped convolution ops.

mosheliv commented 5 years ago

From what I understand, this should only slow it down, not consume more memory on the GPU.

titu1994 commented 5 years ago

Because grouped convolutions don't exist in TF, I had to slice the input and create multiple convolutions, each with its own batch norm layer. This does cost more memory, although not on a scale that should exceed GPU memory. There must be some other issue.

I suggest using another model.
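
(For context, the emulation described above looks roughly like the sketch below. This is a minimal illustration only; the function name and exact layer arrangement are assumptions rather than the repository's actual code, and channels-last layout is assumed.)

from keras.layers import BatchNormalization, Concatenate, Conv2D, Lambda

def grouped_conv_sketch(x, filters, cardinality=32, strides=1):
    # Split the input channels into `cardinality` groups, give each group
    # its own Conv2D (and, as described above, its own batch norm),
    # then merge the groups back together.
    group_channels = filters // cardinality
    groups = []
    for c in range(cardinality):
        start, end = c * group_channels, (c + 1) * group_channels
        group = Lambda(lambda z, s=start, e=end: z[:, :, :, s:e])(x)
        group = Conv2D(group_channels, (3, 3), padding='same',
                       strides=strides, use_bias=False)(group)
        group = BatchNormalization()(group)
        groups.append(group)
    # Every slice/conv/BN trio keeps its own intermediate activations,
    # which is where the extra memory cost comes from.
    return Concatenate()(groups)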

mosheliv commented 5 years ago

Again, thank you for your patience. I have created a super simple dummy program. It seems that the maximum I can get to with any of your models is 128x128 (batch size 1). I still think there is something wrong here, but I will not pursue it further. Here is the code for the mini program with 256x256 (which fails):

import numpy as np
import pandas as pd

import keras
from keras.preprocessing.image import ImageDataGenerator
import senets.se_resnext
from keras.models import Sequential, Model, load_model
from keras.layers import Activation, Dense, Multiply, Input
from keras.callbacks import ModelCheckpoint
from keras import metrics
from keras.optimizers import Adam  
from keras import backend as K

import warnings
warnings.filterwarnings("ignore")

class DataGenerator:
    def __init__(self):
        self.image_generator = ImageDataGenerator(rescale=1. / 255,
                                     vertical_flip=True,
                                     horizontal_flip=True,
                                     rotation_range=180,
                                     fill_mode='reflect')
    def create_train(self, dataset_info, batch_size, shape, augument=True):
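        # Yield endless random batches of dummy images and 28-way multi-hot labels.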
        assert shape[2] == 3
        while True:
            random_indexes = np.random.choice(len(dataset_info), batch_size)
            batch_images1 = np.empty((batch_size, shape[0], shape[1], shape[2]))
            batch_labels = np.zeros((batch_size, 28))
            for i, idx in enumerate(random_indexes):
                image1= self.load_image(
                    dataset_info[idx]['path'], shape)
                batch_images1[i] = image1
                batch_labels[i][dataset_info[idx]['labels']] = 1
            yield batch_images1, batch_labels

    def load_image(self, path, shape):
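        # Ignore path/shape and return a constant 256x256x3 dummy image,
        # so the reproduction does not depend on any real files.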
        image1 = np.stack((
            np.ones((256,256)), 
            np.ones((256,256)), 
            np.ones((256,256)), 
            ), -1)
        return image1.astype(np.float)

train_datagen = DataGenerator()

train_dataset_info = []
for i in range(0, 1000):
    train_dataset_info.append({
        'path':str(i),
        'labels':np.array([5])})
train_dataset_info = np.array(train_dataset_info)

valid_dataset_info = []
for i in range(1000, 1200):
    valid_dataset_info.append({
        'path':str(i),
        'labels':np.array([6])})
valid_dataset_info = np.array(valid_dataset_info)
print(train_dataset_info.shape, valid_dataset_info.shape)

def create_model(input_shape, n_out):
    inp_mask = Input(shape=input_shape)
    pretrain_model_mask = senets.se_resnext.SEResNext( input_shape = (256,256,3),
        include_top=False, 
        weights=None,    
        pooling='max')
    pretrain_model_mask.name='seresnext50_mask'

    x = pretrain_model_mask(inp_mask)
    out = Dense(n_out, activation='sigmoid')(x)
    model = Model(inputs=inp_mask, outputs=[out])

    return model

keras.backend.clear_session()

model = create_model(
    input_shape=(256,256,3), 
    n_out=28)

model.compile(
    loss='binary_crossentropy', 
    optimizer='adam',
    metrics=['acc'])

model.summary()

epochs = 10
batch_size = 1
checkpointer = ModelCheckpoint(
    './SEResNext50_{epoch:02d}-{val_loss:.4f}.model', 
    verbose=2, 
    save_best_only=False)

# create train and valid datagens
train_generator = train_datagen.create_train(
    train_dataset_info, batch_size, (256,256,3))
validation_generator = train_datagen.create_train(
    valid_dataset_info, batch_size, (256,256,3))
K.set_value(model.optimizer.lr, 0.0001)
# train model
history = model.fit_generator(
    train_generator,
    steps_per_epoch=1000,
    validation_data=validation_generator,
    validation_steps=20,
    epochs=epochs, 
    verbose=1,
    callbacks=[checkpointer])

titu1994 commented 5 years ago

Hmm, the script at first glance looks fine. It is most probably due to an inefficient implementation of the ResNeXt model.

Could I ask you to replace the ResNeXt model with the SEResNet model and see whether you can manage larger batch sizes?
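
(For reference, the swap amounts to changing one constructor in create_model() from the script above. This is a sketch under the assumption that the repository's se_resnet module is importable as senets.se_resnet and that SEResNet50 accepts the same arguments; adjust to your local package layout.)

import senets.se_resnet

# Same arguments as the SEResNext call above, different backbone:
pretrain_model_mask = senets.se_resnet.SEResNet50(
    input_shape=(256, 256, 3),
    include_top=False,
    weights=None,
    pooling='max')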

mosheliv commented 5 years ago

Yes, it is much, much better. I can do 512x512 with a batch size of 12 easily. So, to my limited knowledge, that is roughly 16 * 12 times the largest ResNeXt workload (128x128, batch size 1). This is curious, as I don't think a slightly unoptimized net should be that much worse. Anyway, up to you if you want to look into it. I will give the SEResNet model a go and see how much better it performs compared to the regular ResNet.
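
(Back-of-envelope for that ratio, in case it helps; the numbers come straight from the two configurations above:)

# Pixels per batch with SEResNet vs. the largest ResNeXt config that fit:
se_resnet_load = 512 * 512 * 12   # 512x512 images, batch size 12
resnext_load = 128 * 128 * 1      # 128x128 images, batch size 1
print(se_resnet_load / resnext_load)   # 192.0, i.e. 16 * 12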

Thank you again!

titu1994 commented 5 years ago

Interesting. I'll take a look into it, but I don't want to continue work on ResNeXt until there is official support for grouped convolutions, as hacky solutions aren't to my liking.

mosheliv commented 5 years ago

I can truly sympathize, but your GitHub repository is really popular largely because it is practically the only one. Anyway, thank you very much for taking the time to answer my questions. SE-ResNet seems to train very well even without the pre-trained weights.