googlecolab / colabtools

Python libraries for Google Colaboratory
Apache License 2.0

Colab hasn't been freeing system memory when it's supposed to #3412

Open Daviljoe193 opened 1 year ago

Daviljoe193 commented 1 year ago

Describe the current behavior Ever since early January (when Colab was upgraded from Ubuntu 18.04 to 20.04), nearly every ML-type notebook has been exhibiting a strange behavior: system RAM is not cleared once it's no longer in use, specifically for Stable Diffusion and Whisper. This gets further compounded when switching models, eventually leading to no RAM being usable and the cell exiting with a ^C. Below (Camenduru's Stable Diffusion notebook for Analog Diffusion, since it's all I have a screen recording of, both pre- and post-20.04 rollout) is a notebook showing this exact issue. Take note of the RAM indicator in the top right; it's NOT supposed to remain filled.

https://user-images.githubusercontent.com/67191631/218920053-1a4acf89-9e97-4f11-918d-8e998e8917d0.mp4
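For anyone who wants to quantify this instead of watching the indicator, here's a rough, standard-library-only way to watch the runtime's resident memory from inside a cell. The 512 MiB `bytearray` is just a stand-in for loading a model checkpoint, and whether the "after" number actually drops depends entirely on the allocator's behavior (which is the whole point of this issue):

```python
import gc

def rss_mib():
    """Resident set size of this process in MiB, read from /proc (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) // 1024  # /proc reports kB
    return -1

before = rss_mib()
blob = bytearray(512 * 1024 * 1024)   # stand-in for a large model load
during = rss_mib()
del blob                              # drop the only reference...
gc.collect()                          # ...and force a collection
after = rss_mib()
print(f"before={before} MiB, during={during} MiB, after={after} MiB")
```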

Describe the expected behavior The notebook should run as normal, and system RAM should be freed once it's no longer in use. Take note of the RAM indicator in the top right; that's what's SUPPOSED to happen.

https://user-images.githubusercontent.com/67191631/218919753-612ada2f-b6d9-4fe7-8c86-76dcd4d91b89.mp4

What web browser you are using Seems browser-agnostic, though I've tested with Firefox, Microsoft Edge, and KDE's Falkon.

Additional context This issue seems to happen in nearly any notebook that uses the GPU. For me, the two pain points are @aadnk's notebook for OpenAI Whisper and any Stable Diffusion notebook; it doesn't matter whether it uses Automatic1111's frontend via @Camenduru's notebook, @TheLastBen's notebook, or even InvokeAI (sorry, it's Russian, and needs an ngrok token). Every one of these has the same issue that wasn't present pre-20.04 rollout: RAM gets used up, but then doesn't get freed until the cell stops. This issue has NOT been fixed, despite issue #3363 being closed, as mentioned by @remybonnav at the end of that thread...

https://github.com/googlecolab/colabtools/issues/3363#issuecomment-1424795538

This is not fixed. When loading one model after the other, the RAM still reaches over 12 GB and crashes (models are only 2 GB each). This never happened 3 or 4 weeks ago, using https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast_stable_diffusion_AUTOMATIC1111.ipynb

And @Omenizer reported the same on this thread on TheLastBen's GitHub, relating to merging two or more models...

Still got RAM issues/crashes too, basically after every merge it will ^C after trying to load the new model :(
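As an aside: if (big if, this is just my guess at the root cause) the problem is glibc's malloc keeping freed heap pages resident instead of returning them to the OS, you can sometimes nudge it to hand them back without swapping allocators at all. A speculative sketch, not something I've verified against these notebooks:

```python
import ctypes
import gc

def release_freed_memory():
    """Collect garbage, then ask glibc malloc to return free heap pages to the OS.

    malloc_trim is glibc-specific; on a non-glibc libc this returns None.
    """
    gc.collect()
    try:
        libc = ctypes.CDLL("libc.so.6")
    except OSError:
        return None
    return libc.malloc_trim(0)  # 1 if some memory was released, 0 otherwise

# e.g. call this right after `del model` when switching checkpoints
print(release_freed_memory())
```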

So far, the only workaround I've come up with requires rolling back a good few dependencies, in a pretty haphazard way. The below snippet should be run before everything else (especially because of the dpkg with a wildcard), and it makes things work as they used to.

```
# Fetch the older (18.04-era) google-perftools/tcmalloc packages
!wget http://launchpadlibrarian.net/367274644/libgoogle-perftools-dev_2.5-2.2ubuntu3_amd64.deb
!wget https://launchpad.net/ubuntu/+source/google-perftools/2.5-2.2ubuntu3/+build/14795286/+files/google-perftools_2.5-2.2ubuntu3_all.deb
!wget https://launchpad.net/ubuntu/+source/google-perftools/2.5-2.2ubuntu3/+build/14795286/+files/libtcmalloc-minimal4_2.5-2.2ubuntu3_amd64.deb
!wget https://launchpad.net/ubuntu/+source/google-perftools/2.5-2.2ubuntu3/+build/14795286/+files/libgoogle-perftools4_2.5-2.2ubuntu3_amd64.deb
# libunwind is needed by the perftools packages
!apt install -qq libunwind8-dev
# Installs every .deb in the working directory, hence "run this before everything else"
!dpkg -i *.deb
# Preload tcmalloc in place of glibc malloc for everything launched afterwards
%env LD_PRELOAD=libtcmalloc.so
```
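To sanity-check that the preload actually took effect (as far as I can tell, %env only affects processes launched after it, not the already-running kernel), you can peek at which allocator libraries are mapped into a process. A quick debugging helper of my own, nothing official:

```python
def loaded_allocators():
    """List alternative malloc implementations mapped into this process (Linux)."""
    with open("/proc/self/maps") as f:
        maps = f.read()
    found = sorted({name for name in ("tcmalloc", "jemalloc") if name in maps})
    return found or ["glibc malloc (default)"]

print(loaded_allocators())
```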
Daviljoe193 commented 1 year ago

As of commits 49f3a73 on @Camenduru's notebooks and 65fe78e on @TheLastBen's notebooks, my workaround has been applied. This doesn't change the fact that the workaround shouldn't be needed in the first place, but it does mean those notebooks are no longer viable for reproducing the issue. As of right now, @aadnk's OpenAI Whisper notebook still doesn't have the workaround, so it can still be used to reproduce this issue.