evolu8 closed this 1 year ago
I ran into the same issue. To resolve it, you can use a smaller batch size or more than one GPU.
Thanks @frank-xwang. I'm still digging into this a bit. Some of my images have higher information density, so when stored as JPEG they take up about twice the disk space, despite having the same dimensions, channels and bit depth.
I'm surprised that this set needs to run at proportionally smaller batch sizes, but I can't see any other reason. Like I say, still digging...
Sure enough, with a smaller batch size and a smaller 'dim', things run without CUDA OOM.
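For anyone checking the same thing, here is a rough back-of-the-envelope sketch of input memory per batch. Once a JPEG is decoded, its in-memory size depends only on dimensions, channels and dtype, not on how well the file compressed, so on-disk size by itself should not change the size of the input tensor. The helper name `batch_bytes` and the 3x224x224 float32 shape are assumptions for illustration, not values taken from STEGO.

```python
def batch_bytes(batch_size, channels, height, width, bytes_per_value=4):
    """Bytes needed to hold one decoded image batch as a dense tensor."""
    return batch_size * channels * height * width * bytes_per_value


# Hypothetical numbers: 3-channel 224x224 crops stored as float32 (4 bytes each).
for bs in (32, 16, 8):
    mib = batch_bytes(bs, 3, 224, 224) / 2**20
    print(f"batch_size={bs}: {mib:.1f} MiB for the input tensor alone")
```

This only counts the input tensor. Activations and any intermediate buffers are usually much larger, but they also scale with batch size rather than with file size on disk.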
I had the same issue with precompute_knns.py and solved it by reducing the hardcoded batch size here: https://github.com/mhamilton723/STEGO/blob/452ba7b65b441e1eee0a21a58b8c110b0bd72555/src/precompute_knns.py#L81 (256 -> a smaller number)
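If you would rather not guess the smaller number, one option is to halve the batch size automatically whenever an out-of-memory error shows up. The sketch below is not how precompute_knns.py works; `run_with_backoff` and `fake_step` are hypothetical names, and it assumes the OOM surfaces as a `RuntimeError` whose message contains "out of memory", which is how PyTorch normally reports CUDA OOM.

```python
def run_with_backoff(step, batch_size, min_batch_size=1):
    """Call step(batch_size), halving batch_size after each out-of-memory error."""
    while True:
        try:
            return batch_size, step(batch_size)
        except RuntimeError as err:
            # Assumption: OOM is a RuntimeError with "out of memory" in its message.
            is_oom = "out of memory" in str(err).lower()
            if not is_oom or batch_size <= min_batch_size:
                raise  # unrelated error, or nothing left to shrink
            batch_size = max(min_batch_size, batch_size // 2)


def fake_step(batch_size):
    # Stand-in for the real workload: pretends anything above 64 does not fit.
    if batch_size > 64:
        raise RuntimeError("CUDA out of memory (simulated)")
    return f"ok with {batch_size}"


used, result = run_with_backoff(fake_step, 256)
print(used, result)
```

In a real run, `step` would wrap whatever builds the data loader and runs the model at the given batch size, and it may also help to free cached GPU memory between attempts.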
Thank you @Shershebnev, @frank-xwang, and @evolu8 for your suggestions. Closing this out for now.
Running training results in the following:
Running precompute_knns then fails because potsdam had not been unzipped. I unzipped it and ran again, and it failed with OOM: