Open fitoule opened 2 years ago
Do you mean I need to use the code under efficientdet/tf2/train.py, or should I migrate efficientdet/main.py myself?
Thank you.
@fitoule you mentioned a memory leak. I am facing a memory leak too. Can you give more info?
I ran into the same problem. I used traineval mode with TensorFlow 2.10 (then 2.13), and in both cases there was a memory leak after the first epoch. Training itself was fine, so the leak probably comes from CocoCallback during evaluation. I commented out this line (https://github.com/google/automl/blob/master/efficientdet/tf2/train_lib.py#L220) and everything is fine.
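If it helps anyone confirm which step leaks, here is a rough sketch of how memory growth could be recorded once per epoch. It uses only the standard library; `measure_epochs`, `grows_every_epoch` and the `run_epoch` argument are hypothetical names and not part of efficientdet. Note that `tracemalloc` only sees allocations made through Python's allocator, so native TensorFlow or GPU memory would need a process-level reading instead.

```python
import tracemalloc


def measure_epochs(run_epoch, epochs):
    """Call run_epoch(epoch) for each epoch and record traced memory after each call.

    run_epoch is a hypothetical callable standing in for one train/eval epoch.
    Returns a list of byte counts, one per epoch.
    """
    tracemalloc.start()
    readings = []
    try:
        for epoch in range(epochs):
            run_epoch(epoch)
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current)
    finally:
        tracemalloc.stop()
    return readings


def grows_every_epoch(readings, min_step=0):
    """Return True if each reading exceeds the previous one by more than min_step bytes."""
    if len(readings) < 3:
        return False  # too few samples to call it a trend
    return all(b - a > min_step for a, b in zip(readings, readings[1:]))
```

Running it once with the evaluation callback enabled and once with it disabled, then comparing the two lists of readings, should show whether the callback is responsible for the growth.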
I have only one NVIDIA GPU. I was training with TensorFlow 2.5.2 because of the bug with GPU and multiprocessing.
TF 2.8, no child process => works, but with a memory leak :(
TF 2.8, child process => CUDA error on the first epoch because the GPU has already been taken by the main process: https://github.com/google/automl/issues/855
TF 2.5.2, child process => no longer works since the determinism fix
It worked with TensorFlow up to 2.5.2, but efficientdet now requires TF > 2.8, so I am stuck. I think I have to find the code from before the "determinism" change.
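For readers unfamiliar with the "child process" approach mentioned above: the general idea is that work done in a separate process has its memory returned to the OS when that process exits, so a leak cannot accumulate across epochs. Below is a minimal standard-library sketch of that idea only. It is not the efficientdet code, `run_isolated` and `EPOCH_SNIPPET` are hypothetical, and it does not by itself avoid the CUDA error described above when the parent process already holds the GPU.

```python
import json
import subprocess
import sys


def run_isolated(snippet, timeout=60):
    """Run a Python snippet in a fresh interpreter and return its JSON-decoded stdout.

    Everything the snippet allocates is released when the child process exits.
    """
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if the child fails
    )
    return json.loads(result.stdout)


# Hypothetical stand-in for one epoch of work; a real run would build the
# model and run training/evaluation here instead.
EPOCH_SNIPPET = """
import json
data = bytearray(10_000_000)  # stands in for per-epoch allocations
print(json.dumps({"allocated_bytes": len(data)}))
"""

if __name__ == "__main__":
    for epoch in range(3):
        print(epoch, run_isolated(EPOCH_SNIPPET))
```

The parent process here never touches the GPU, which is the assumption that breaks in the single-GPU case reported above.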