Open zutsusemi opened 1 year ago
I ran the sample on kits, but it requires far more memory than a 3090's 24 GB. How much memory do we need to run this code?
Hi, the code requires about 35 GB of memory. We tested with two 3090s or one A40.
So how do you "connect" the 2 3090? Or did you somehow "split" the model into 2 parts?
I split the model into two parts and assign each part to a different GPU.
I tried to run just the encoder on a 3090. The size of the input is 160x160x160. I didn't interpolate the input to 512x512x512, but instead changed the patch_size to 5x5x5 (so the number of patches stays the same). It still ran out of memory. I'm wondering: do you split the model along those lines, i.e. the encoder part versus the rest? Or is there some other split method?
Actually, the split is a little troublesome. Since the major memory cost comes from the image encoder, I had to split the encoder itself across multiple GPUs. The encoder is composed of multiple blocks; we put some blocks on the first GPU and the rest on the second. As you can see in the code (image_encoder.py, line 156), we use two for loops to handle blocks[:6] and blocks[6:12] separately. That is how I split the encoder.
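To make this concrete, here is a minimal sketch of that block-level split. The class and device names are hypothetical (not the repo's actual code); it only illustrates the idea described above: place `blocks[:6]` on the first GPU, `blocks[6:12]` on the second, and move the activations between devices inside `forward`.

```python
import torch
import torch.nn as nn

class SplitEncoder(nn.Module):
    """Hypothetical sketch: a ViT-style encoder split across two GPUs.

    The first 6 transformer blocks live on dev0, the remaining 6 on
    dev1. The intermediate activation tensor is transferred between
    devices in the forward pass.
    """

    def __init__(self, blocks, dev0="cuda:0", dev1="cuda:1"):
        super().__init__()
        self.dev0, self.dev1 = dev0, dev1
        # first half of the blocks on the first GPU
        self.first_half = nn.ModuleList(blocks[:6]).to(dev0)
        # second half of the blocks on the second GPU
        self.second_half = nn.ModuleList(blocks[6:12]).to(dev1)

    def forward(self, x):
        x = x.to(self.dev0)
        for blk in self.first_half:   # first for loop: blocks[:6]
            x = blk(x)
        # move activations to the second GPU before the rest
        x = x.to(self.dev1)
        for blk in self.second_half:  # second for loop: blocks[6:12]
            x = blk(x)
        return x
```

Note this is plain model (pipeline-style) parallelism: the two GPUs run sequentially, so it reduces per-GPU memory but does not speed up a single forward pass.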