isyangshu / MambaMIL

[MICCAI 2024] Official Code for "MambaMIL: Enhancing Long Sequence Modeling with Sequence Reordering in Computational Pathology"

About Training GPU Memory #13

Closed YuqiZhang-Buaa closed 4 months ago

YuqiZhang-Buaa commented 4 months ago

Dear author, Question 1: I have observed that more GPU memory is used with your TransMIL code than with the original code (https://github.com/szc19990412/TransMIL). Why is this?

Question 2: In my opinion, on datasets where the bag sizes vary a lot, the GPU memory usage should fluctuate during training. In fact this is not the case: the GPU memory value is very stable. Can you tell me why?

Looking forward to your reply.

isyangshu commented 4 months ago

For the first question, the TransMIL() in models/TransMIL.py is similar to the TransMIL() in https://github.com/szc19990412/TransMIL/blob/main/models/TransMIL.py.

For the second question, GPU memory increases as longer sequences are processed and then remains stable at the maximum observed load. This is an engineering matter (the GPU caching allocator holds on to memory), and most open-source MIL methods behave the same way.
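A minimal sketch (not part of this repository, assuming a PyTorch training setup) that illustrates the caching behavior described above: `torch.cuda.memory_reserved()` ratchets up to the size needed for the longest bag seen so far and then stays flat, which is why the value reported by the driver looks stable even though bag lengths vary.

```python
# Minimal sketch (hypothetical model and bag lengths): shows why reported GPU memory
# stays flat even though bag (sequence) lengths vary between iterations.
# PyTorch's caching allocator keeps previously allocated blocks reserved,
# so the reserved figure only grows when a longer bag needs more memory.
import torch

device = torch.device("cuda")
model = torch.nn.Linear(1024, 512).to(device)  # stand-in for a MIL model

bag_lengths = [2000, 500, 8000, 1000, 8000]  # hypothetical, highly variable bag sizes
for n in bag_lengths:
    feats = torch.randn(n, 1024, device=device)  # one bag of n instance features
    loss = model(feats).sum()
    loss.backward()
    model.zero_grad(set_to_none=True)
    torch.cuda.synchronize()
    alloc = torch.cuda.memory_allocated() / 2**20   # tensors currently in use
    reserv = torch.cuda.memory_reserved() / 2**20   # cached by the allocator (roughly what nvidia-smi reflects)
    print(f"bag={n:5d}  allocated={alloc:7.1f} MiB  reserved={reserv:7.1f} MiB")

# Reserved memory climbs to the peak (longest bag) and then stays stable;
# torch.cuda.empty_cache() would release the cached blocks back to the driver.
```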