Closed: spikedoanz closed this issue 4 months ago.
@spikedoanz I was hoping this would be a SynthSeg issue 😁. Unfortunately, @sergeyplis mentioned that this was not an issue with the original SynthSeg but only with the one within nobrainer. Let's see what we can do about it.
Clearly, the error states that the problem is with the random spatial deformation layer. But before we go that far, could you try rerunning your code with the following snippet at the top of your script? Also, do you have the memory utilization curves for the GPU?
physical_devices = tf.config.list_physical_devices("GPU")
try:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
except:
    pass
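A small side note: the same allow-growth behavior can also be requested through an environment variable (plain TensorFlow, nothing nobrainer-specific), as long as it is set before TensorFlow initializes the GPU:

import os

# Must be set before TensorFlow initializes the GPU device.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"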
Okay, I just noticed: the memory is indeed doubling every 100 examples or so. When I first started the code, it was at ~4k, jumped to 8k, and stayed there for a bit before going up to 16k, and I am sure this will keep going. I tried running the one in SynthSeg but have some mismatch with CUDA/cuDNN not detecting the GPU, so it will take a bit to find the right config again.
I guess I spoke too soon earlier. I generated close to 1200 samples with no growth in memory usage from the last snapshot (see above). So, is it fair to call this a non-issue now?
I'll test this again with the extra lines you added
Okay, it lasted longer this time (2044 samples), but it still self-terminated at the end. Memory usage also jumped 5 GB -> 9 GB -> 12 GB -> 15 GB and so on throughout.
Is it possible for you to jot down the intervals (or sample idx) at which these growths happened?
Overall, CPU memory slowly climbs from 4 GB -> 18 GB, though with some dips here and there.
At 52 seconds (sample 36), GPU memory doubles from ~4 GB to ~8 GB (the first number in each log line is time since the script started):
52.03 CPU: 4633.57 MB. GPU: 4405.00 MB
1/1 [==============================] - 1s 795ms/step
53.65 CPU: 5976.65 MB. GPU: 8503.00 MB
1/1 [==============================] - 1s 510ms/step
At 700 seconds (sample 536), GPU memory doubles again:
698.85 CPU: 10029.86 MB. GPU: 8503.00 MB
1/1 [==============================] - 1s 503ms/step
700.13 CPU: 10125.91 MB. GPU: 8503.00 MB
1/1 [==============================] - 0s 458ms/step
701.39 CPU: 10269.03 MB. GPU: 8503.00 MB
1/1 [==============================] - 1s 568ms/step
702.76 CPU: 10636.59 MB. GPU: 16695.00 MB
1/1 [==============================] - 1s 502ms/step
704.06 CPU: 10796.60 MB. GPU: 16695.00 MB
1/1 [==============================] - 1s 500ms/step
705.41 CPU: 9996.18 MB. GPU: 16695.00 MB
At 1300 seconds, exactly at sample 1000, the program self-terminates, despite not being out of memory:
1/1 [==============================] - 0s 494ms/step
1294.58 CPU: 17998.53 MB. GPU: 16695.00 MB
1/1 [==============================] - 0s 496ms/step
1295.84 CPU: 18143.18 MB. GPU: 16695.00 MB
1/1 [==============================] - 0s 494ms/step
1297.08 CPU: 18255.30 MB. GPU: 16695.00 MB
See full log for more details: nobrainer-synthseg.log
Thanks. Can you please point me to the snippet of code to get these numbers (from CPU and GPU), so I can run the same on my end as well?
from time import time
import os
import psutil
import GPUtil
import tensorflow as tf
from nobrainer.processing.brain_generator import BrainGenerator

physical_devices = tf.config.list_physical_devices("GPU")
try:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
except:
    pass


def get_memory_usage():
    # Get CPU memory usage
    process = psutil.Process(os.getpid())
    cpu_mem = process.memory_info().rss / 1024 / 1024  # in MB
    # Get GPU memory usage
    gpus = GPUtil.getGPUs()
    gpu_mem = 0
    if gpus:
        gpu = gpus[0]  # Assuming you're using the first GPU
        gpu_mem = gpu.memoryUsed
    return cpu_mem, gpu_mem


# Get and print memory usage
cpu_usage, gpu_usage = get_memory_usage()
print(f"CPU Memory Usage: {cpu_usage:.2f} MB")
print(f"GPU Memory Usage: {gpu_usage:.2f} MB")

brain_generator = BrainGenerator(
    "example.nii.gz",
    randomise_res=False,
)

start = time()
while True:
    img, lab = brain_generator.generate_brain()
    # Get and print memory usage
    cpu_usage, gpu_usage = get_memory_usage()
    print(f"{(time()-start):.2f} CPU: {cpu_usage:.2f} MB. GPU: {gpu_usage:.2f} MB")
Sorry to bother you again, could you please generate a similar log for standalone synthseg as well? I am having trouble detecting the GPUs on openmind with the original virtual environment. While I managed to do so on my machine at LCN, the generation is excruciatingly slow (stuck at generation). I reckon it may have to do with the updated drivers but I could be wrong. 🤷♂️
PS: I am afraid I won't be of much help if I cannot set things up to reproduce the issue on my end. 😐
I think I found the source of the leak. The keyword is "generator". :) Garbage collection takes a hit with generators for an obvious reason: state preservation. The predict method in tf/keras creates a data generator with one data item each time it is called and doesn't release that memory. So, changing this
[image, labels] = self.labels_to_image_model.predict(model_inputs)
yield image, labels
to
[image, labels] = self.labels_to_image_model(model_inputs)
yield image.numpy(), labels.numpy()
will do the job.
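To make the change concrete, here is a minimal standalone sketch of the fixed sampling step (the function name and signature are mine for illustration; only labels_to_image_model and model_inputs come from the snippet above):

import tensorflow as tf


def sample_pair(labels_to_image_model: tf.keras.Model, model_inputs):
    # predict(model_inputs) wraps the inputs in a fresh one-item data
    # pipeline on every call, and that memory is never released.
    # Calling the model directly runs it eagerly (the TF2 default) and
    # returns EagerTensors, which we convert to numpy before returning.
    image, labels = labels_to_image_model(model_inputs)
    return image.numpy(), labels.numpy()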
See the two attached memory-usage plots.
You will notice that the GPU memory jumps after one sample, but otherwise it is consistent from there on, and more importantly there is no increase in the CPU memory. It is pretty stable, unlike what was seen before.
For more info, please refer to https://stackoverflow.com/questions/64199384/tf-keras-model-predict-results-in-memory-leak
PS: My experimentation is not extensive but please test this rigorously and close this issue when you are happy with the results. If you concur with my observations, I will reach out to Benjamin or Eugenio about this leak. I wonder what implications it has for training.
Update: Edited the previous comment to add yield image.numpy(), labels.numpy() as well.
Another update: I only tested the code with one input. I don't know where things will break if a bunch of label maps (self.batch_size > 1) are provided. Probably, your use case entails providing a list of label maps. Once you test that, we can reach out to B and E.
The one drawback of the fix is lost time: it is a bit slower than the original code because we convert the eager tensors to numpy every time.
PS: Ignore the 01 in the y-axis tick labels. Read it as time from 00 to 4 hours of running the code.
Okay, two things:
- I managed to detect a GPU with CUDA 10.1 and cuDNN 7.6.4 (for CUDA 10.1) on openmind. I did this so I could generate samples in the original SynthSeg environment using TF 2.2.0, and as I mentioned earlier, the generation is stuck.
- I used the current nobrainer environment (TF 2.15 and the latest and greatest CUDA/cuDNN as provided by pip install tensorflow[and-cuda]) and observed a similar memory growth with the standalone SynthSeg.
Point 2, coupled with how I integrated SynthSeg, (tentatively) points to TF (not nobrainer) as the source of the memory leak. Only a successful run of point 1 (running the standalone SynthSeg on a GPU with old drivers) can prove or disprove this with certainty.
Once again, are you 100% sure this leak wasn't observed with the older version of TF 2.2?
Thoughts from anyone reading this are appreciated. :)
I'll have to find a way to replicate this test with OG SynthSeg to give you a conclusive answer, since I'm having issues installing SynthSeg on a fresh environment on my end also.
But for what it's worth, during my benchmarks using SynthSeg for Wirehead, it successfully ran for > 24 hours regularly with no issues
Indeed, I too am stuck with env/GPU issues trying to get this running in TF 2.2 with the OG SynthSeg. So, unfortunately and respectfully, I will have to disregard your finding until I reproduce and see the issue myself. :)
Updated plots with another version: explicit garbage collection (gc.collect()) after image, labels = next(brain_generator) (here). If there are concerns about the green line (CPU for the gc version) causing OOM, you may want to try another gc.collect() after the model.predict call and see how that performs.
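For completeness, a minimal sketch of the loop behind the "gc" curves, assuming the same single-label-map setup as the reproduction script earlier in the thread (here I pull samples with generate_brain() rather than next()):

import gc

from nobrainer.processing.brain_generator import BrainGenerator

brain_generator = BrainGenerator("example.nii.gz", randomise_res=False)

while True:
    image, labels = brain_generator.generate_brain()
    # Explicit collection after every sample, as described above.
    gc.collect()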
It works!! Here are 8 generators running in parallel on one A40, using the upstreamed 'synthseg' branch of nobrainer plus gc.collect() inserted into the generation loop.
- CPU memory does leak, but it peaks at 35 GB before cleaning up after itself and going back down to 25 GB.
- GPU memory is stable.
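For anyone who wants to try the same thing, here is a rough sketch of one way to run several generators on a single GPU. The process layout, the spawn start method, and the single example label map are assumptions for illustration, not necessarily the exact Wirehead setup:

import multiprocessing as mp


def run_generator(label_map: str) -> None:
    # Import inside the worker so each process initializes TensorFlow and
    # CUDA independently.
    import gc

    import tensorflow as tf
    from nobrainer.processing.brain_generator import BrainGenerator

    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        tf.config.experimental.set_memory_growth(gpus[0], True)

    brain_generator = BrainGenerator(label_map, randomise_res=False)
    while True:
        img, lab = brain_generator.generate_brain()
        gc.collect()  # the collection that keeps each worker from OOMing


if __name__ == "__main__":
    mp.set_start_method("spawn")  # don't fork an already-initialized CUDA context
    workers = [
        mp.Process(target=run_generator, args=("example.nii.gz",))
        for _ in range(8)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()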
side note: what timezone is this work being done in 🤔? I see 7:00 pm in the snapshot. :)
I don't know, actually. This cluster is supposedly in the same time zone as me, but now I'm unsure lol
- 35 GB sounds like a lot, but after how many hours/samples was that? And how many times did you observe gc.collect() cleaning up after itself?
- Any news on this issue in the OG SynthSeg (TF 2.2)? That's the thing I care about most now. 😁
Looks like everyone is happy with where things stand with the fix. I am closing this for now.
@spikedoanz Benjamin and Eugenio are going to incorporate explicit garbage collection in their code. So, can you please confirm whether you added a second gc.collect(), or whether one in the generate_brain() function, as initially suggested, would suffice? If you did add the second one, did you time it so we know whether there is an obvious benefit?
@hvgazula I did some informal measurements with and without the second gc.collect(). "Time diff" is the measured end-to-end time between samples.
without: ~1.25 seconds / sample
Time diff: 1.2285664081573486
1/1 [==============================] - 1s 508ms/step
Time diff: 1.2833566665649414
1/1 [==============================] - 1s 501ms/step
Time diff: 1.2467491626739502
1/1 [==============================] - 1s 503ms/step
Time diff: 1.2522339820861816
1/1 [==============================] - 1s 501ms/step
Time diff: 1.2513880729675293
1/1 [==============================] - 0s 500ms/step
with: ~1.38 seconds / sample
1/1 [==============================] - 1s 508ms/step
Time diff: 1.3852481842041016
1/1 [==============================] - 0s 498ms/step
Time diff: 1.371424913406372
1/1 [==============================] - 1s 504ms/step
Time diff: 1.413076639175415
1/1 [==============================] - 0s 499ms/step
Time diff: 1.3753676414489746
1/1 [==============================] - 1s 504ms/step
Time diff: 1.402883529663086
1/1 [==============================] - 0s 497ms/step
TLDR: about a 10% decrease in throughput, for a bit of extra stability
The first gc.collect() stopped SynthSeg from OOMing overall, and the second gc.collect() decreases memory variance. Since I've been running 8-10 instances of SynthSeg on a single node recently, the extra stability is definitely helpful. (If I remove the second gc.collect(), about half of my SynthSeg instances sometimes die randomly due to sudden memory spikes.)
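For reference, a wall-clock wrapper along these lines reproduces this kind of per-sample measurement (a minimal sketch, not the exact benchmarking script; it assumes the same single-label-map setup as earlier in the thread):

import gc
import time

from nobrainer.processing.brain_generator import BrainGenerator

brain_generator = BrainGenerator("example.nii.gz", randomise_res=False)

prev = time.time()
while True:
    img, lab = brain_generator.generate_brain()
    gc.collect()  # the loop-level collection whose overhead is being measured
    now = time.time()
    print(f"Time diff: {now - prev}")
    prev = now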
Okay, I will go ahead with your recommendation then: gc.collect() at two different places.
Description
When using the BrainGenerator from the nobrainer library, I'm encountering an Out of Memory (OOM) error. The error occurs during the generate_brain() method call, running in a loop for over 30 minutes.
Environment
Script using BrainGenerator
Error