Closed: ret-1 closed this issue 12 months ago
At that time (an early stage of development, with very different models/architecture), we got OOM with standard supervision but not with point supervision. We adopted point supervision from Mask2Former.
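For context, Mask2Former-style point supervision evaluates the mask loss only at a small set of sampled points instead of at every pixel. A minimal NumPy sketch of the idea (uniform sampling for simplicity; Mask2Former actually importance-samples uncertain points, and `num_points` here is an illustrative parameter, not the repo's setting):

```python
import numpy as np

def bce(logits, target, eps=1e-8):
    """Binary cross-entropy averaged over the given elements."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return float(-np.mean(target * np.log(p + eps)
                          + (1 - target) * np.log(1 - p + eps)))

def full_mask_loss(logits, target):
    """Standard supervision: BCE over every pixel of the H x W mask."""
    return bce(logits, target)

def point_supervised_loss(logits, target, num_points=128, rng=None):
    """Point supervision: BCE only at num_points sampled pixels.
    Uniform sampling here; Mask2Former samples uncertain points."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = logits.shape
    ys = rng.integers(0, h, num_points)
    xs = rng.integers(0, w, num_points)
    return bce(logits[ys, xs], target[ys, xs])
```

In an autograd framework, the savings would come from the backward pass only keeping intermediates for the sampled points instead of a full per-pixel loss map.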
I re-ran main_training with your config (batch_size changed from 16 to 8) on four 3090s, but the two loss functions seem to use nearly the same amount of memory.
Point supervision is the first run; XMem is the second.
If it's convenient, could you try replacing the loss function with the original XMem one and see whether it reduces memory on your machine?
Thanks a lot!
It did save memory during initial development (we were using bipartite matching with cost matrices). You might be right that it does not save memory with the latest model. I will have to check this and update the paper if necessary. I cannot do this right now since I am quite busy with other things, but thank you for letting me know.
I briefly checked -- with the cache cleared (batch size 1), the full version uses 264–350 MB during loss computation and the point supervision version uses 41–66 MB.
So with the cache enabled, the extra memory cost of the full loss might be hidden by other temporary allocations elsewhere in the network.
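For anyone wanting to reproduce this kind of measurement on GPU, the usual recipe is `torch.cuda.empty_cache()` to clear the cache, `torch.cuda.reset_peak_memory_stats()` before the loss, then `torch.cuda.max_memory_allocated()` after it. Below is a schematic CPU stand-in using Python's `tracemalloc`, with bytearrays standing in for float32 tensors; the sizes are illustrative, not the numbers reported above:

```python
import tracemalloc

def peak_mb(fn):
    """Peak memory (MB) allocated while fn runs.
    CPU stand-in for torch.cuda.reset_peak_memory_stats() followed by
    torch.cuda.max_memory_allocated() around the loss computation."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 2**20

H, W = 480, 854  # illustrative frame size
K = 12544        # illustrative number of sampled points

# A full-mask loss materializes per-pixel intermediates (H*W floats);
# a point-sampled loss only materializes K of them.
full_peak = peak_mb(lambda: bytearray(4 * H * W))  # roughly 1.6 MB
point_peak = peak_mb(lambda: bytearray(4 * K))     # roughly 0.05 MB
```

With caching enabled, PyTorch reuses freed blocks, which is exactly why a difference of this size can disappear into the allocator's pool during normal training.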
Thank you so much for your cooperation!
So, based on your experiments, is there not much memory difference between the two loss functions under normal training conditions (with the cache)? I had high hopes that point supervision would be very effective at reducing memory; your paper says it "uses only one-third of the memory during training", so I tried to swap it in right away.
I'm guessing that the reason point supervision resolved the OOM in your initial version is the bipartite matching.
Thanks for your great work.
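On the bipartite-matching point: the likely memory hot spot there is the pairwise cost matrix, which compares every predicted mask against every ground-truth mask before the Hungarian assignment. A hedged NumPy sketch of a pairwise BCE cost (an illustration of the technique, not the repo's actual implementation); with P = H*W pixels the broadcast intermediate is huge, while with P = sampled points it stays small:

```python
import numpy as np

def pairwise_bce_cost(pred_logits, gt_masks, eps=1e-8):
    """Pairwise BCE matching cost of shape (Nq, Ng).
    pred_logits: (Nq, P) mask logits; gt_masks: (Ng, P) binary masks;
    P = H*W for full-mask matching or K for point-sampled matching.
    Broadcasting materializes an (Nq, Ng, P) intermediate, which is
    the memory hot spot that point sampling keeps small."""
    p = 1.0 / (1.0 + np.exp(-pred_logits))          # (Nq, P)
    logp = np.log(p + eps)
    log1mp = np.log(1.0 - p + eps)
    # (Nq, 1, P) against (1, Ng, P) -> (Nq, Ng, P), averaged over P
    cost = -(logp[:, None, :] * gt_masks[None, :, :]
             + log1mp[:, None, :] * (1.0 - gt_masks[None, :, :]))
    return cost.mean(axis=-1)
```

For example, with Nq=100 queries, Ng=10 objects, and a 480x854 frame (all illustrative numbers), the full-mask intermediate holds about 4.1e8 floats, versus about 1.3e7 with K=12544 sampled points, a ~32x reduction.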
Since Cutie adopts point supervision during training to reduce memory requirements, I replaced my loss function with yours, but it didn't reduce GPU memory usage. I also replaced your loss function with the original XMem one and modified it (as shown below), and again found the memory to be almost identical. Do you have any idea why?