rohanrao619 / Social_Distancing_with_AI

Monitor people violating Social Distancing or not wearing Face Masks in public through CCTV footage.
MIT License

ResourceExhaustedError: OOM when allocating tensor with shape[16,64,112,112] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc #7

Closed: zeeshanfarooqi closed this issue 3 years ago

zeeshanfarooqi commented 3 years ago

@rohanrao619

I am facing the issue below while training on the dataset. I have tried different values of steps_per_epoch, but the error remains the same.

ResourceExhaustedError Traceback (most recent call last)
in ()
      3 mask_classifier.fit(x=Train_Data_Generator,
      4 steps_per_epoch=(100),
----> 5 epochs=n_epochs)

6 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58 ctx.ensure_initialized()
     59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60 inputs, attrs, num_outputs)
     61 except core._NotOkStatusException as e:
     62 if name is not None:

ResourceExhaustedError: OOM when allocating tensor with shape[16,64,112,112] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model/conv1_conv/Conv2D (defined at :5) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[Op:__inference_train_function_12364]

Function call stack:
train_function

rohanrao619 commented 3 years ago

Yeah, I know. Colab is not that generous with resources nowadays. Try switching between the GPU, TPU, and plain CPU runtimes. Otherwise, your only option for now is to lower the batch size. Of course, you can also run locally on a more powerful machine.
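
For anyone wondering why changing steps_per_epoch made no difference: that setting only controls how many batches are drawn per epoch, not how large each batch is, so the memory needed per training step stays the same. The rough sketch below estimates the size of the single tensor named in the error, reading the first dimension of shape[16,64,112,112] as the batch size (an interpretation of the log, not something confirmed in this thread). The helper name `tensor_bytes` is made up for illustration.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # the error reports "type float", i.e. 32-bit floats


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Memory needed for one dense tensor of the given shape (hypothetical helper)."""
    return prod(shape) * bytes_per_element


# Shape copied from the error message; first dimension assumed to be the batch size.
oom_shape = (16, 64, 112, 112)

# The footprint of this activation scales linearly with the batch size.
for batch_size in (16, 8, 4):
    shape = (batch_size,) + oom_shape[1:]
    mib = tensor_bytes(shape) / 2**20
    print(f"batch_size={batch_size}: {mib:.2f} MiB for this one activation")
```

At a batch size of 16 this one tensor is about 49 MiB, and a full training step holds many such activations plus gradients, which is why halving the batch size is usually the quickest fix. Where exactly the batch size is set depends on how Train_Data_Generator was built; if it comes from Keras's `ImageDataGenerator.flow_from_directory` (an assumption about the notebook), it would be that call's `batch_size` argument.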