google / automl

Google Brain AutoML
Apache License 2.0

Since EfficientDet requires TensorFlow > 2.8, we can't train with CUDA anymore #1146

Open fitoule opened 2 years ago

fitoule commented 2 years ago

I have only one NVIDIA GPU, and I was training with TensorFlow 2.5.2 because of the bug with GPU and multiprocessing.

Training worked with TensorFlow up to 2.5.2, but EfficientDet now requires TF > 2.8, so I am stuck. I think I have to find the code from before "determinism" was added.

fsx950223 commented 2 years ago

  1. Migrate to tf2.
  2. Set num_epochs=1 and num_examples_per_epoch=(original num_epochs) * num_examples.
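
The arithmetic in the second suggestion can be sketched as follows. This is a minimal illustration, not code from the repository: the helper names `single_epoch_overrides` and `as_override_string` are hypothetical, the numbers are made up, and how the resulting values are passed to the training script (flags or a config override) is an assumption to check against your own setup.

```python
def single_epoch_overrides(num_epochs: int, num_examples: int) -> dict:
    """Fold `num_epochs` passes over `num_examples` images into one long epoch."""
    if num_epochs < 1 or num_examples < 1:
        raise ValueError("num_epochs and num_examples must be positive")
    return {
        "num_epochs": 1,
        # One "epoch" now covers every example of the original schedule.
        "num_examples_per_epoch": num_epochs * num_examples,
    }


def as_override_string(overrides: dict) -> str:
    """Render the overrides as comma-separated key=value pairs."""
    return ",".join(f"{key}={value}" for key, value in overrides.items())


if __name__ == "__main__":
    # Hypothetical schedule: 50 epochs over a 10,000-image training set.
    print(as_override_string(single_epoch_overrides(50, 10_000)))
    # prints: num_epochs=1,num_examples_per_epoch=500000
```

The total number of training examples seen stays the same; only the epoch boundaries change.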

fitoule commented 2 years ago

Do you mean I need to use the code under efficientdet/tf2/train.py, or migrate efficientdet/main.py myself?

Thank you

exx8 commented 2 years ago

@fitoule you mentioned a memory leak. I am facing a memory leak too. Can you give more info?

mateusz-wozny commented 8 months ago

I faced the same problem. I used traineval mode with TensorFlow 2.10 (then 2.13), and in both cases there was a memory leak after the first epoch. Training was fine, but during evaluation CocoCallback probably causes the memory leak. I commented out this line (https://github.com/google/automl/blob/master/efficientdet/tf2/train_lib.py#L220) and everything is fine.
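
As an alternative to editing the library file, the same effect could be had by filtering the callback list before training. The sketch below is only an illustration: it assumes the linked line adds the COCO evaluation callback to a plain list of callbacks, and both `without_callbacks` and the stand-in classes are hypothetical names, not part of the repository.

```python
def without_callbacks(callbacks, excluded_names):
    """Return the callbacks whose class name is not in `excluded_names`."""
    excluded = set(excluded_names)
    return [cb for cb in callbacks if type(cb).__name__ not in excluded]


# Stand-in classes so the sketch runs without TensorFlow installed.
# In a real run these would be the callback objects built by the training code.
class CocoCallback:
    pass


class CheckpointCallback:
    pass


if __name__ == "__main__":
    callbacks = [CheckpointCallback(), CocoCallback()]
    kept = without_callbacks(callbacks, {"CocoCallback"})
    print([type(cb).__name__ for cb in kept])
    # prints: ['CheckpointCallback']
```

Either way, dropping the callback presumably means no COCO metrics are reported during training, so evaluation would have to be run separately.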