google / automl

Google Brain AutoML
Apache License 2.0

Training EfficientDet-D0 on a custom dataset crashes: TypeError: 'NoneType' object is not callable #1209

Open Pentar0o opened 7 months ago

Pentar0o commented 7 months ago

Hi, here is the problem:

penta@dell-r740:~$ python3 automl/efficientdet/tf2/train.py --train_file_pattern=train_tf.tfrecord --val_file_pattern=val_tf.tfrecord --model_name=efficientdet-d0 --model_dir=/tmp/efficientdet-d0-finetune --pretrained_ckpt=efficientdet-d0 --batch_size=24 --eval_samples=1024 --num_examples_per_epoch=33707 --num_epochs=50 --hparams=voc_config.yaml --strategy=gpus

2024-02-13 15:12:30.743921: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-02-13 15:12:30.792061: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-13 15:12:30.792093: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-13 15:12:30.793253: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-02-13 15:12:30.800038: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-02-13 15:12:34.323310: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:236] Using CUDA malloc Async allocator for GPU: 0
2024-02-13 15:12:34.325068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13764 MB memory: -> device: 0, name: Tesla T4, pci bus id: 0000:5e:00.0, compute capability: 7.5
2024-02-13 15:12:34.325313: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:236] Using CUDA malloc Async allocator for GPU: 1
2024-02-13 15:12:34.327016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 13764 MB memory: -> device: 1, name: Tesla T4, pci bus id: 0000:af:00.0, compute capability: 7.5
2024-02-13 15:12:34.327235: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:236] Using CUDA malloc Async allocator for GPU: 2
2024-02-13 15:12:34.329015: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 13764 MB memory: -> device: 2, name: Tesla T4, pci bus id: 0000:d8:00.0, compute capability: 7.5
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2')
I0213 15:12:34.331413 140653502825600 mirrored_strategy.py:423] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2')
I0213 15:12:34.387622 140653502825600 train.py:198] All devices: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU')]
I0213 15:12:34.390356 140653502825600 efficientnet_builder.py:215]
global_params= GlobalParams(batch_norm_momentum=0.99, batch_norm_epsilon=0.001, dropout_rate=0.2, data_format='channels_last', num_classes=1000, width_coefficient=1.0, depth_coefficient=1.0, depth_divisor=8, min_depth=None, survival_prob=0.0, relu_fn=functools.partial(<function activation_fn at 0x7feb559ce320>, act_type='swish'), batch_norm=<class 'utils.SyncBatchNormalization'>, use_se=True, local_pooling=None, condconv_num_experts=None, clip_projection_output=False, blocks_args=['r1_k3_s11_e1_i32_o16_se0.25', 'r2_k3_s22_e6_i16_o24_se0.25', 'r2_k5_s22_e6_i24_o40_se0.25', 'r3_k3_s22_e6_i40_o80_se0.25', 'r3_k5_s11_e6_i80_o112_se0.25', 'r4_k5_s22_e6_i112_o192_se0.25', 'r1_k3_s11_e6_i192_o320_se0.25'], fix_head_stem=None, grad_checkpoint=False)

Traceback (most recent call last):
  File "/home/penta/automl/efficientdet/tf2/train.py", line 325, in <module>
    app.run(main)
  File "/home/penta/.local/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/penta/.local/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/penta/automl/efficientdet/tf2/train.py", line 255, in main
    model = train_lib.EfficientDetNetTrain(config=config)
  File "/home/penta/automl/efficientdet/tf2/train_lib.py", line 474, in __init__
    super().__init__(*args, **kwargs)
  File "/home/penta/automl/efficientdet/tf2/efficientdet_keras.py", line 819, in __init__
    self.backbone = backbone_factory.get_model(
  File "/home/penta/automl/efficientdet/backbone/backbone_factory.py", line 80, in get_model
    return efficientnet_model.Model(blocks_args, global_params, model_name)
  File "/home/penta/automl/efficientdet/backbone/efficientnet_model.py", line 633, in __init__
    self._build()
  File "/home/penta/automl/efficientdet/backbone/efficientnet_model.py", line 644, in _build
    self._stem = Stem(self._global_params, self._blocks_args[0].input_filters)
  File "/home/penta/automl/efficientdet/backbone/efficientnet_model.py", line 520, in __init__
    self._bn = global_params.batch_norm(
TypeError: 'NoneType' object is not callable
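For anyone reading the trace: the failing statement is self._bn = global_params.batch_norm(...) in Stem.__init__, so the batch_norm field of the GlobalParams namedtuple is None by the time the backbone is built, even though the builder log above prints it as utils.SyncBatchNormalization. A minimal sketch of the failure pattern (illustrative stand-ins only, not the repo's actual classes):

import collections

# Hypothetical stand-ins for GlobalParams and Stem, just to show why Python
# raises this exact TypeError when the batch_norm field is left as None.
GlobalParams = collections.namedtuple('GlobalParams', ['batch_norm'])

class Stem:
    def __init__(self, global_params):
        # Mirrors the call at efficientnet_model.py line 520: the field is
        # invoked as a layer constructor, and None is not callable.
        self._bn = global_params.batch_norm()

Stem(GlobalParams(batch_norm=None))  # TypeError: 'NoneType' object is not callable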
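To help bisect, the model build can be reproduced outside train.py in a few lines. This is only a sketch under assumptions: it guesses the import layout (run from the efficientdet directory with it on PYTHONPATH), it reuses the same voc_config.yaml passed via --hparams above, and it omits the flag overrides and MirroredStrategy scope that train.py applies, which may change which batch-norm class is selected.

import hparams_config
from tf2 import train_lib

# Base efficientdet-d0 hparams, then the overrides from the yaml file that
# the command above passes via --hparams.
config = hparams_config.get_efficientdet_config('efficientdet-d0')
config.override('voc_config.yaml')

# This is the call that fails at train.py line 255 in the traceback; running
# it directly shows whether the crash depends on the yaml overrides at all.
model = train_lib.EfficientDetNetTrain(config=config)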