isyangshu / MambaMIL

[MICCAI 2024] Official Code for "MambaMIL: Enhancing Long Sequence Modeling with Sequence Reordering in Computational Pathology"

Some questions about the paper (the number of blocks for the proposed method, the overfitting issue) #6

Closed: poult-lab closed this issue 2 months ago

poult-lab commented 2 months ago

Hello authors, this is very timely follow-up work based on Mamba, and I appreciate the authors' contribution. But I have some questions regarding the paper:

1. In the paper, the authors compare Mamba and Vim. I am curious about the hyperparameters (e.g., the number of blocks for the proposed method); as far as I know, Vim uses 24.

2. The authors state in the paper that TransMIL has a serious overfitting problem. As far as I know, overfitting means the training result is much better than the testing result, but I have not seen a comparison between the training and testing results (the kind of train/validation gap sketched below).

Please correct me if I am wrong. Finally, thank you for your wonderful work.
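To make that definition concrete, here is a tiny illustrative sketch of the train/validation gap; all numbers are hypothetical, not from the paper:

```python
# Hypothetical sketch: overfitting shows up as validation loss rising
# (or validation metrics falling) while training loss keeps decreasing.
def overfitting_gap(train_losses, val_losses):
    """Per-epoch (val - train) loss gap; a widening gap suggests the
    model fits the training set but generalizes increasingly worse."""
    return [round(v - t, 2) for t, v in zip(train_losses, val_losses)]

train = [0.90, 0.60, 0.40, 0.25, 0.15]  # illustrative values only
val   = [0.95, 0.70, 0.65, 0.72, 0.85]  # validation loss turns upward
print(overfitting_gap(train, val))      # [0.05, 0.1, 0.25, 0.47, 0.7]
```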

wyhsleep commented 2 months ago

Thank you for your thoughtful comments and questions regarding our paper. For question 1, the methods we compare in our paper are the ordinary Mamba block and the Bi-Mamba block, and the number of blocks is two. We recently conducted a related survey; if you would like to learn more about the Mamba block, Bi-Mamba block, and other variants, please refer to https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models and https://arxiv.org/abs/2404.18861. For question 2, we provide a figure comparing the metric values on the validation sets across training epochs. TransMIL displays clear signs of overfitting on the validation set, characterized by a significant increase in validation loss alongside decreases in both the ACC and AUC metrics. In contrast, MambaMIL exhibits stable performance across the evaluation period, showcasing its strong ability to alleviate overfitting.
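For readers wondering what a two-block configuration looks like in code, below is a minimal sketch assuming the `mamba_ssm` package; the class name `TwoBlockMambaMIL`, the feature dimensions, and the attention-pooling head are illustrative, not the repository's exact implementation:

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # pip install mamba-ssm (requires CUDA)

class TwoBlockMambaMIL(nn.Module):
    """Illustrative MIL head: two Mamba blocks stacked over a bag of
    patch features, followed by attention pooling for a bag prediction."""
    def __init__(self, feat_dim=1024, d_model=512, n_classes=2, depth=2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        # depth=2: two Mamba blocks connected in sequence
        self.blocks = nn.ModuleList(
            Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
            for _ in range(depth)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(depth))
        self.attn = nn.Linear(d_model, 1)   # simple attention pooling
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                   # x: (1, n_patches, feat_dim)
        h = self.proj(x)
        for norm, block in zip(self.norms, self.blocks):
            h = h + block(norm(h))          # pre-norm residual around each block
        w = torch.softmax(self.attn(h), dim=1)
        bag = (w * h).sum(dim=1)            # attention-weighted patch average
        return self.head(bag)               # bag-level logits
```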

poult-lab commented 2 months ago

Hello, thank you for your quick reply. For question 1, by "the number of blocks" I mean the depth: in the Bi-Mamba paper the authors use 24 blocks, meaning 24 blocks are connected in sequence, and in the Mamba paper the number seems to be 64.

poult-lab commented 2 months ago

> Thank you for your thoughtful comments and questions regarding our paper. For question 1, the methods we compare in our paper are the ordinary Mamba block and the Bi-Mamba block, and the number of blocks is two. We recently conducted a related survey; if you would like to learn more about the Mamba block, Bi-Mamba block, and other variants, please refer to https://github.com/Ruixxxx/Awesome-Vision-Mamba-Models and https://arxiv.org/abs/2404.18861. For question 2, we provide a figure comparing the metric values on the validation sets across training epochs. TransMIL displays clear signs of overfitting on the validation set, characterized by a significant increase in validation loss alongside decreases in both the ACC and AUC metrics. In contrast, MambaMIL exhibits stable performance across the evaluation period, showcasing its strong ability to alleviate overfitting.

This survey is very good.

poult-lab commented 2 months ago

[Screenshot 2024-05-01 20-09-05: excerpt from the Bi-Mamba paper showing the number of blocks]

You can find the statement about "the number of blocks" here, from the Bi-Mamba paper.

isyangshu commented 2 months ago

Well, you can just regard Mamba, Bi-Mamba, and our proposed SRMamba as different blocks in our Tab. 3. So we don't directly compare Mamba and Vim; we just compare the ability of the Mamba block and the Vim block in our task. For our survival prediction task, we choose two layers in our implementation.

By the way, we use the same hyperparameters (D, E, N) as Mamba.
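For reference, a sketch of how (D, E, N) map onto the `mamba_ssm` constructor; the E and N values below are the Mamba paper defaults (E=2, N=16), while the D value here is only illustrative, not confirmed from the repository:

```python
from mamba_ssm import Mamba

# (D, E, N) in Mamba's notation:
#   D = model width (d_model), E = block expansion factor,
#   N = state dimension of the selective SSM.
block = Mamba(
    d_model=512,  # D: illustrative value, not confirmed from the repo
    d_state=16,   # N: Mamba paper default
    expand=2,     # E: inner width is E * D = 1024
    d_conv=4,     # local conv width (library default)
)
```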

poult-lab commented 2 months ago

Okay, I think I understand. Thank you for your quick response again; I will close this issue.