Open alxhoff opened 5 months ago
@SuryanarayanaY I was able to replicate the issue reported here. Thank you!
Being discussed in Keras. #19058
@SuryanarayanaY I posted a similar thing on Keras as I was not sure if it was a TF problem or a Keras problem
Any news @SuryanarayanaY? I just have a research paper waiting on this bugfix >.<
Issue type
Bug
Have you reproduced the bug with TensorFlow Nightly?
Yes
Source
source
TensorFlow version
2.15
Custom code
Yes
OS platform and distribution
Manjaro Linux
Mobile device
No response
Python version
3.11.5
Bazel version
No response
GCC/compiler version
No response
CUDA/cuDNN version
12.3.52
GPU model and memory
No response
Current behavior?
Hello,
I am working on a TinyML NAS framework, i.e. throughout the execution of my code, hundreds if not thousands of models are created and trained. I have come across a problem that starves my system of memory after a couple of days of execution. By using `tracemalloc` I have been able to see that the main contributor appears to be the symbolic tensor created when creating a Conv2D layer. Maybe I am missing something basic in terms of garbage collection in my code, but over time the demo code will eventually consume all system memory.
I have also tried `tf.keras.backend.clear_session()` and `gc.collect()`, but neither helps.
Any help would be appreciated.
Cheers
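For anyone following along, here is a minimal sketch of the kind of `tracemalloc` measurement described above. It is not the original reproduction script: `measure_growth`, `leaky_build`, and `_registry` are hypothetical names, and `leaky_build` is a standard-library stand-in that leaks on purpose so the sketch runs without TensorFlow.

```python
import gc
import tracemalloc


def measure_growth(build_fn, iterations=50, top=3):
    """Call build_fn repeatedly and return the source lines whose
    allocated memory changed the most between two snapshots."""
    tracemalloc.start()
    gc.collect()
    before = tracemalloc.take_snapshot()

    for _ in range(iterations):
        obj = build_fn()
        del obj        # drop our own reference
        gc.collect()   # force a collection, as tried in the report

    after = tracemalloc.take_snapshot()
    tracemalloc.stop()

    # compare_to() sorts the differences by absolute size change, largest first
    return after.compare_to(before, "lineno")[:top]


# Hypothetical stand-in for a model factory: it keeps a hidden reference
# to every object it creates, so nothing is ever released.
_registry = []


def leaky_build():
    layer = bytearray(100_000)
    _registry.append(layer)
    return layer


if __name__ == "__main__":
    for stat in measure_growth(leaky_build):
        print(stat)
```

In the setting described above, `build_fn` would instead construct a Keras model containing a Conv2D layer, optionally followed by `tf.keras.backend.clear_session()`. If the top entries keep growing with the number of iterations even after `gc.collect()`, something is still holding references to the created objects.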
Standalone code to reproduce the issue
Relevant log output