lliusha1155 / LSENet

Location and Seasonality Enhanced Network for Multi-Class Ocean Front Detection

Training doesn't run #5

Open nsssayom opened 1 year ago

nsssayom commented 1 year ago

I get the following error when I try to run training:

2022-12-30 17:56:38.489463: I tensorflow/tsl/framework/bfc_allocator.cc:1110] Stats: 
Limit:                      2745761792
InUse:                      2684493312
MaxInUse:                   2724110336
NumAllocs:                        1130
MaxAllocSize:               1255140864
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2022-12-30 17:56:38.489593: W tensorflow/tsl/framework/bfc_allocator.cc:492] ************************************************************************************************xxxx
2022-12-30 17:56:38.489637: W tensorflow/core/framework/op_kernel.cc:1830] OP_REQUIRES failed at tile_ops.cc:199 : RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[4,176,176,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/home/nsssayom/Dev/deepLearn/LSENet/train.py", line 220, in <module>
    model.fit_generator(generate_train_arrays_from_file(train[:num_train], batch_size),
  File "/home/nsssayom/Dev/deepLearn/LSENet/env/lib/python3.10/site-packages/keras/engine/training.py", line 2604, in fit_generator
    return self.fit(
  File "/home/nsssayom/Dev/deepLearn/LSENet/env/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/nsssayom/Dev/deepLearn/LSENet/env/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Graph execution error:

OOM when allocating tensor with shape[4,176,176,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
         [[{{node model/concatenate_9/concat-0-2-TransposeNCHWToNHWC-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
 [Op:__inference_train_function_9846]
2022-12-30 17:56:38.590754: W tensorflow/core/kernels/data/generator_dataset_op.cc:108] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
         [[{{node PyFunc}}]]
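
For scale, the numbers in the log above can be checked with a little arithmetic. The sketch below is only a rough estimate: it assumes `type float` means 4-byte float32 and that `Limit` minus `InUse` approximates the free memory at the moment of failure (the allocator may be more fragmented than that). The helper name `tensor_bytes` is made up for illustration and is not part of LSENet or TensorFlow.

```python
from math import prod

# Assumption: "type float" in the log is 32-bit, i.e. 4 bytes per element.
BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the size in bytes of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_element


# Values copied from the bfc_allocator stats in the log.
limit = 2_745_761_792
in_use = 2_684_493_312
free = limit - in_use  # rough upper bound on what could still be allocated

# The two tensor shapes named in the OOM messages.
for shape in [(4, 176, 176, 128), (4, 176, 176, 256)]:
    need = tensor_bytes(shape)
    verdict = "fits" if need <= free else "does not fit"
    print(f"shape {shape}: {need:,} bytes needed, {free:,} bytes free -> {verdict}")
```

Under these assumptions the smaller tensor alone needs slightly more than what is left of the roughly 2.56 GiB the allocator was given, so the failure looks like the GPU running out of memory rather than a bug in the call itself. If the leading `4` in the shapes is the batch size (a guess, since `train.py` is not shown here), a smaller batch would shrink these tensors proportionally.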

rubiyet commented 1 year ago

Facing the same issue!