I get the following error when I try to run training:
2022-12-30 17:56:38.489463: I tensorflow/tsl/framework/bfc_allocator.cc:1110] Stats:
Limit: 2745761792
InUse: 2684493312
MaxInUse: 2724110336
NumAllocs: 1130
MaxAllocSize: 1255140864
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2022-12-30 17:56:38.489593: W tensorflow/tsl/framework/bfc_allocator.cc:492] ************************************************************************************************xxxx
2022-12-30 17:56:38.489637: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at tile_ops.cc:199 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[4,176,176,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "/home/nsssayom/Dev/deepLearn/LSENet/train.py", line 220, in <module>
model.fit_generator(generate_train_arrays_from_file(train[:num_train], batch_size),
File "/home/nsssayom/Dev/deepLearn/LSENet/env/lib/python3.10/site-packages/keras/engine/training.py", line 2604, in fit_generator
return self.fit(
File "/home/nsssayom/Dev/deepLearn/LSENet/env/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/home/nsssayom/Dev/deepLearn/LSENet/env/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Graph execution error:
OOM when allocating tensor with shape[4,176,176,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model/concatenate_9/concat-0-2-TransposeNCHWToNHWC-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[Op:__inference_train_function_9846]
2022-12-30 17:56:38.590754: W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
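For reference, here is a rough back-of-the-envelope check of the numbers in the log. It is only a sketch and assumes that "type float" means 4-byte float32; the helper name `tensor_bytes` is made up for illustration. It compares the size of the two tensors named in the OOM messages with the gap between `Limit` and `InUse` reported by the allocator.

```python
from math import prod

BYTES_PER_FLOAT32 = 4  # assumption: "type float" in the log is float32


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the size in bytes of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_element


# Values copied from the bfc_allocator stats in the log above.
limit = 2745761792
in_use = 2684493312
free_bytes = limit - in_use  # upper bound; fragmentation can leave less usable space

# Shapes copied from the two OOM messages in the log above.
for shape in [(4, 176, 176, 128), (4, 176, 176, 256)]:
    needed = tensor_bytes(shape)
    print(f"shape {shape}: needs {needed} bytes, "
          f"at most {free_bytes} bytes free, fits: {needed <= free_bytes}")
```

Under that assumption, neither tensor fits in what is left of the roughly 2.7 GB the allocator is allowed to use. The leading dimension of both shapes is 4, which looks like the batch size.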