Middle class token proposed by Vim. The log and checkpoint of local_vim_tiny_middle_cls_token are uploaded. Thanks to @FanqingM's issue.

LocalVim-S.

LocalVMamba-S.

LocalVim.

LocalVMamba (py). Since we rewrote the code related to Mamba operations, we need to retrain the models; the checkpoints and logs of the remaining models will be uploaded later. We are preparing the detection and segmentation code now.

LocalVim (py). The checkpoint and training log of LocalVim-T are uploaded.

### Architecture of LocalVim

Model | Dataset | Resolution | ACC@1 | #Params | FLOPs | ckpts/logs |
---|---|---|---|---|---|---|
Vim-Ti (mid_cls_token) | ImageNet-1K | 224x224 | 76.1 | 7M | 1.5G | - |
LocalVim-T (mid_cls_token) | ImageNet-1K | 224x224 | 77.8 | 8M | 1.5G | ckpt/log |
Vim-Ti | ImageNet-1K | 224x224 | 73.1 | 7M | 1.5G | - |
Vim-S | ImageNet-1K | 224x224 | 80.3 | 26M | 5.1G | - |
LocalVim-T | ImageNet-1K | 224x224 | 76.2 | 8M | 1.5G | ckpt/log |
LocalVim-S | ImageNet-1K | 224x224 | 81.1 | 28M | 4.8G | ckpt/log |
VMamba-T | ImageNet-1K | 224x224 | 82.2 | 22M | 5.6G | - |
VMamba-S | ImageNet-1K | 224x224 | 83.5 | 44M | 11.2G | - |
LocalVMamba-T | ImageNet-1K | 224x224 | 82.7 | 26M | 5.7G | retraining... |
LocalVMamba-S | ImageNet-1K | 224x224 | 83.7 | 50M | 11.4G | ckpt/log |
See detection folder.
git clone https://github.com/hunto/LocalMamba.git
We tested our code on torch==1.13.1 and torch==2.0.2.
Install Mamba kernels:
cd causual-conv1d && pip install .
cd ..
cd mamba-1p1p1 && pip install .
Other dependencies:
timm==0.9.12
fvcore==0.1.5.post20221221
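
To confirm that an environment matches the versions listed above, a small check along the following lines may help. It is a minimal sketch, not part of the repository: the function names `installed_version` and `check_environment` are hypothetical, and stripping a local build suffix such as `+cu117` from the torch version is an assumption about how the installed wheel reports itself.

```python
from importlib import metadata

# Versions quoted from this README. timm and fvcore are pinned exactly;
# torch is listed as tested on two versions.
PINNED = {"timm": "0.9.12", "fvcore": "0.1.5.post20221221"}
TESTED_TORCH = ("1.13.1", "2.0.2")


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(lookup=installed_version):
    """Return a list of mismatch messages; an empty list means everything matches."""
    problems = []
    for name, wanted in PINNED.items():
        found = lookup(name)
        if found != wanted:
            problems.append(f"{name}: expected {wanted}, found {found}")
    torch_version = lookup("torch")
    # Assumption: wheels may report a local suffix such as "+cu117"; compare the base part.
    base = torch_version.split("+")[0] if torch_version else None
    if base not in TESTED_TORCH:
        problems.append(f"torch: tested on {TESTED_TORCH}, found {torch_version}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the README versions"]:
        print(line)
```

The `lookup` argument exists only so the check can be exercised without the packages installed; by default it queries the active environment.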
We use the ImageNet-1K dataset for training and validation. We recommend placing the dataset files in the ./data folder, so that the directory structure looks like:
classification
├── lib
├── tools
├── configs
├── data
│ ├── imagenet
│ │ ├── meta
│ │ ├── train
│ │ ├── val
│ ├── cifar
│ │ ├── cifar-10-batches-py
│ │ ├── cifar-100-python
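
Before launching a run, it can be useful to verify that the dataset folders from the tree above are in place. The following is a minimal sketch under stated assumptions: `missing_dirs` is a hypothetical helper, not a repository tool, and treating the CIFAR folders as optional for ImageNet runs is an assumption.

```python
from pathlib import Path

# Sub-directories taken from the directory tree above, relative to the
# classification folder.
IMAGENET_DIRS = ("data/imagenet/meta", "data/imagenet/train", "data/imagenet/val")
# Assumption: the CIFAR folders are only needed for CIFAR experiments.
CIFAR_DIRS = ("data/cifar/cifar-10-batches-py", "data/cifar/cifar-100-python")


def missing_dirs(root, expected=IMAGENET_DIRS):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("classification")
    if missing:
        print("missing dataset folders:")
        for rel in missing:
            print("  " + rel)
    else:
        print("ImageNet folders found")
```

Pass `expected=CIFAR_DIRS` to check the CIFAR layout instead.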
Evaluate a checkpoint:
sh tools/dist_run.sh tools/test.py ${NUM_GPUS} configs/strategies/local_vmamba/config.yaml timm_local_vim_tiny --drop-path-rate 0.1 --experiment lightvit_tiny_test --resume ${ckpt_file_path}
Train LocalVim-T on 8 GPUs:
sh tools/dist_train.sh 8 configs/strategies/local_mamba/config.yaml timm_local_vim_tiny -b 128 --drop-path-rate 0.1 --experiment local_vim_tiny
Other training options:
--amp: enable torch Automatic Mixed Precision (AMP) training. It can speed up training of large models. We enable it for LocalVMamba models.
--clip-grad-norm: enable gradient clipping.
--clip-grad-max-norm 1: gradient clipping value.
--model-ema: enable model exponential moving average (EMA). It can improve accuracy on large models.
--model-ema-decay 0.9999: decay rate of the model EMA.

Direction search with local_vim_tiny_search:
sh tools/dist_train.sh 8 configs/strategies/local_mamba/config.yaml timm_local_vim_tiny_search -b 128 --drop-path-rate 0.1 --experiment local_vim_tiny --epochs 100
After training, run tools/vis_search_prob.py to get the searched directions.
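
For readers unfamiliar with the --clip-grad-norm / --clip-grad-max-norm and --model-ema / --model-ema-decay training options, the sketch below shows what these techniques conventionally compute, using plain Python lists in place of tensors. It is an illustration only, not the repository's implementation: the function names are hypothetical, and clipping by the global L2 norm is assumed to be what the flag maps to.

```python
import math


def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Scale all gradients so that their joint L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Gradients already within the limit are left untouched (scale == 1).
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


def ema_update(ema_weights, weights, decay=0.9999):
    """One EMA step: ema <- decay * ema + (1 - decay) * weights."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


if __name__ == "__main__":
    # A gradient of norm 5 is rescaled to a norm of roughly 1.
    clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
    print("norm before:", norm_before, "norm after:", math.hypot(*clipped))

    # With decay 0.9999 the averaged weights move very slowly toward the
    # current weights, which is why EMA smooths out late-training noise.
    ema = [0.0]
    for _ in range(1000):
        ema = ema_update(ema, [1.0])
    print("EMA after 1000 steps toward 1.0:", ema[0])
```

A decay close to 1 means each step contributes only a fraction (1 - decay) of the current weights, so the averaged model reflects a long window of recent training steps.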
This project is released under the Apache 2.0 license.
This project is based on Mamba (paper, code), Vim (paper, code), and VMamba (paper, code). Thanks to the authors for their excellent work.
If our paper helps your research, please consider citing us:
@article{huang2024localmamba,
title={LocalMamba: Visual State Space Model with Windowed Selective Scan},
author={Huang, Tao and Pei, Xiaohuan and You, Shan and Wang, Fei and Qian, Chen and Xu, Chang},
journal={arXiv preprint arXiv:2403.09338},
year={2024}
}