ikedaosushi / tech-news

DailyTechNews

Kernel keeps dying when training #14682

Open Jaywhisker opened 5 years ago

Jaywhisker commented 5 years ago

Hello, I am new to programming and this is my first model training. I would like to run my model, but Jupyter keeps telling me that the kernel has died and needs to be restarted. Please help, thank you!

```python
from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout
from keras.layers.convolutional import Conv2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD
from keras import optimizers
import cv2
import numpy as np

model = Sequential()
model.add(ZeroPadding2D((1, 1), input_shape=(224, 224, 3)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1, 1)))

model.add(ZeroPadding2D((1, 1)))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(11, activation='softmax'))

model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Adam(lr=0.001),
              metrics=['accuracy'])

history = model.fit_generator(
    train_generator,                                     # 973 training images
    steps_per_epoch=train_generator.samples // bs + 1,
    epochs=100,
    validation_data=val_generator,                       # 253 validation images
    validation_steps=val_generator.samples // bs + 1,
    callbacks=[bestValidationCheckpointer]
)
```
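One small detail worth noting in the `fit_generator` call above: `samples // bs + 1` adds an extra (empty-ish) step whenever the sample count divides evenly by the batch size. A ceiling division handles both cases. A minimal sketch, assuming `bs` is the batch-size variable from the snippet (`steps_for` is a hypothetical helper name):

```python
import math

def steps_for(samples: int, batch_size: int) -> int:
    # Ceiling division: the smallest number of batches that covers all samples.
    return math.ceil(samples / batch_size)

print(steps_for(973, 32))  # 973 = 30*32 + 13 -> 31 steps
print(steps_for(960, 32))  # divides evenly -> 30 steps, not 31
```

With `bs = 32` both formulas give 31 for the 973 training images, so this is not the cause of the crash, just a correctness nit.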

I've tried lowering the epochs to 20, but the kernel still dies.

Jaywhisker commented 5 years ago

I managed to get it running and now receive an error instead. Does anyone know how to fix this?

```
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[861184,4096] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu
	 [[node training/Adam/zeros_26 (defined at /home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:702) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
```
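The traceback points at the likely culprit: `training/Adam/zeros_26` is Adam allocating state for a weight matrix of shape `[861184, 4096]`, i.e. the first `Dense(4096)` layer fed by `Flatten`. The model downsamples far less than the VGG16 it resembles (three pooling stages instead of five, and the third block appears to be missing a convolution), so the flattened feature map is enormous. Lowering `epochs` cannot help, since memory use is set by the model and batch size, not by how long you train. Rough back-of-envelope arithmetic, using the shape from the error message and assuming float32 weights with Adam's two moment slots per weight:

```python
fan_in, fan_out = 861_184, 4_096   # shape reported in the OOM traceback
bytes_per_float = 4                # float32

weights = fan_in * fan_out * bytes_per_float   # one Dense weight matrix
adam_total = weights * 3                       # weights + Adam's m and v slots

print(f"one weight matrix: {weights / 2**30:.1f} GiB")
print(f"with Adam state:   {adam_total / 2**30:.1f} GiB")
```

That single layer needs on the order of 13 GiB for its weights alone, roughly tripled by Adam, which easily exhausts CPU RAM. Common fixes (general advice, not from this thread): add the missing pooling/conv stages so the spatial size shrinks before `Flatten` (VGG16 flattens only 7×7×512 = 25,088 features), replace `Flatten` with `GlobalAveragePooling2D`, or use a much smaller first `Dense` layer.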