Closed stevenveenma closed 3 years ago
Can you post a script to reproduce?
Ok, the complete script is quite long, so I'll provide you with the model and the GridSearchCV setup:
```python
from keras.models import Sequential
from keras.layers import (Conv2D, BatchNormalization, Activation,
                          MaxPooling2D, Dropout, Flatten, Dense)
from keras.optimizers import SGD
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def gridmodel(learn_rate=0.01, momentum=0):
    model = Sequential()
    model.add(Conv2D(32, (3, 3), padding='same', input_shape=input_shape, use_bias=False))
    model.add(BatchNormalization())
    model.add(Activation("relu"))
    model.add(Conv2D(32, (3, 3), use_bias=False))
    model.add(BatchNormalization())
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(64, (3, 3), padding='same', use_bias=False))
    model.add(BatchNormalization())
    model.add(Activation("relu"))
    model.add(Conv2D(64, (3, 3), use_bias=False))
    model.add(BatchNormalization())
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Conv2D(64, (3, 3), padding='same', use_bias=False))
    model.add(BatchNormalization())
    model.add(Activation("relu"))
    model.add(Conv2D(64, (3, 3), use_bias=False))
    model.add(BatchNormalization())
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    # model = multi_gpu_model(model, gpus=2)
    optimizer = SGD(lr=learn_rate, momentum=momentum)
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

learn_rate = [0.00001, 0.0001, 0.001, 0.01, 0.1]
momentum = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
param_grid = dict(learn_rate=learn_rate, momentum=momentum)
model = KerasClassifier(build_fn=gridmodel, epochs=10, batch_size=30, verbose=0)
grid = GridSearchCV(cv=2, estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(x, y)
```
I tested this using both one and two GPUs (see `multi_gpu_model`, which has been commented out in the model), but both fail now. When I run this on a smaller (one-dimensional) grid it succeeds most of the time. But of course I want to use a larger grid to get a better understanding of the parameters.
Don't know if this adds anything, but here is the output of nvidia-smi while it's training:
```
Wed Nov 21 08:27:12 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:03:00.0 Off |                  N/A |
| 50%   83C    P2   153W / 250W |  11742MiB / 12189MiB |     73%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN X (Pascal)    Off  | 00000000:04:00.0  On |                  N/A |
| 23%   30C    P8    11W / 250W |  11628MiB / 12181MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      4409      C   /home/anaconda3/bin/python                 11731MiB |
|    1      1260      G   /usr/lib/xorg/Xorg                           184MiB |
|    1      2996      G   compiz                                      69MiB  |
|    1      4409      C   /home/anaconda3/bin/python                 11371MiB |
+-----------------------------------------------------------------------------+
```
Can you try adding K.clear_session() before model = Sequential()?
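For clarity, the suggestion is to call `clear_session()` at the top of the build function, so that every grid point starts from a fresh TensorFlow graph instead of accumulating old ones on the GPU. A minimal sketch (the architecture is trimmed to two layers as a stand-in; the imports use the `tensorflow.keras` namespace, whereas the original script used standalone Keras):

```python
from tensorflow.keras import backend as K
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

def gridmodel(learn_rate=0.01, momentum=0):
    # Free the graph left over from the previous grid point before building
    # a new model; without this, every candidate model stays resident and
    # memory usage grows with the size of the grid.
    K.clear_session()
    model = Sequential()
    model.add(Dense(8, activation='relu', input_shape=(4,)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy',
                  optimizer=SGD(learning_rate=learn_rate, momentum=momentum),
                  metrics=['accuracy'])
    return model
```

Since GridSearchCV calls the build function once per parameter combination and fold, placing the call here is the only hook the scikit-learn wrapper exposes.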
Tried it, but this causes the kernel to die.
I just ran into the same issue. Does anybody have any idea?
I am applying grid search using keras.wrappers.scikit_learn.KerasClassifier on a CNN. My operating system is Ubuntu 16.04; I use TensorFlow from a Python Jupyter notebook with a GPU. In general this works well on small grids, but when I increase the grid I run into an error:

```
Delete the underlying status object from memory otherwise it stays alive as
there is a reference to status from the traceback due to
ResourceExhaustedError: OOM when allocating tensor.....
```
I found this topic https://stackoverflow.com/questions/42047497/keras-out-of-memory-when-doing-hyper-parameter-grid-search where the suggestion is to call clear_session() between the different models. But this is not possible when using GridSearchCV. It is said there that Keras should take care of clear_session(): 'If you are facing problem please submit an issue at Keras github' (Indraforyou).
Is clear_session() indeed implemented in keras.wrappers.scikit_learn.KerasClassifier, and what options do I have to get this working?
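One option that sidesteps the wrapper entirely is to evaluate each parameter combination in its own subprocess, since GPU memory is reliably released only when the owning process exits. A rough sketch using only the standard library (`train_and_score` is a hypothetical placeholder where you would build, fit, and evaluate the Keras model; the dummy score here just makes the sketch self-contained):

```python
import itertools
from multiprocessing import Process, Queue

def train_and_score(params, queue):
    # Placeholder: build, fit and evaluate the Keras model here.
    # Because this runs in a child process, any GPU memory it allocates
    # is returned to the driver when the process exits.
    queue.put((params, sum(params.values())))  # dummy "score"

def grid_search(param_grid):
    keys = list(param_grid)
    results = []
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        q = Queue()
        p = Process(target=train_and_score, args=(params, q))
        p.start()
        result = q.get()  # read before join() to avoid a full-pipe deadlock
        p.join()
        results.append(result)
    return results
```

This loses GridSearchCV's built-in cross-validation bookkeeping, so you would have to handle the CV splits and result aggregation yourself, but it guarantees a clean GPU between grid points.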