hustvl / ViTMatte

[Information Fusion (Vol.103, Mar. '24)] Boosting Image Matting with Pretrained Plain Vision Transformers
MIT License

use ViTMatte-B to inference Distinctions-646, A100(80G) out of memory #10

Open tenzinOvO opened 1 year ago

JingfengYao commented 1 year ago

We use the grid sample strategy to reduce the inference computation burden. Please take a look at the last section of our paper for details. The code can be found here.
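The thread does not describe the strategy beyond its name. As a rough illustration only, assuming that grid sampling means splitting the token grid into interleaved sub-grids and attending within each sub-grid separately, the sketch below shows the split, the merge, and the resulting drop in token pairs. The helper names `grid_split`, `grid_merge`, and `attention_pairs`, and the stride of 2, are hypothetical and are not taken from the ViTMatte or MatteAnything code.

```python
def grid_split(tokens, stride=2):
    """Split a 2D token grid (list of rows) into stride*stride interleaved sub-grids.

    Assumption: this is one plausible reading of "grid sample", not the authors' code.
    """
    return {
        (dy, dx): [row[dx::stride] for row in tokens[dy::stride]]
        for dy in range(stride)
        for dx in range(stride)
    }


def grid_merge(groups, height, width, stride=2):
    """Write every sub-grid token back to its original position."""
    out = [[None] * width for _ in range(height)]
    for (dy, dx), sub in groups.items():
        for i, row in enumerate(sub):
            for j, tok in enumerate(row):
                out[dy + i * stride][dx + j * stride] = tok
    return out


def attention_pairs(groups):
    """Count query-key pairs when attention runs inside each sub-grid only."""
    return sum(sum(len(row) for row in sub) ** 2 for sub in groups.values())


# Toy 4x6 grid; each "token" is just its (row, column) coordinate.
height, width = 4, 6
tokens = [[(y, x) for x in range(width)] for y in range(height)]

groups = grid_split(tokens, stride=2)
# In a real model, each sub-grid would pass through the attention layer here.
restored = grid_merge(groups, height, width, stride=2)

full_pairs = (height * width) ** 2        # global attention over all tokens
sampled_pairs = attention_pairs(groups)   # attention within each sub-grid
print(full_pairs, sampled_pairs)
```

Under this assumption, a stride of s cuts the number of query-key pairs by roughly a factor of s squared, at the cost of tokens in different sub-grids no longer attending to each other in that layer.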

tenzinOvO commented 1 year ago

Thanks for your response. So if I want to reproduce the results on Distinctions-646 reported in the paper, I need to replace the vit.py in ViTMatte with the vit.py in MatteAnything?

JingfengYao commented 1 year ago

Yes. Or you can replace only the forward function in the Block. BTW, when you reproduce the results on Distinctions-646, the results will be influenced by the different trimaps you use.

tenzinOvO commented 1 year ago

Thanks for the reminder. I was curious whether the pseudo trimaps in MatteAnything (Tables 1-4) were obtained through a real user study or through a code-based simulation of user interactions derived from the ground truth.

JingfengYao commented 1 year ago

It's a real user study.

shiwanlin commented 1 year ago

The matting outcome is markedly better than MatteFormer's, and the big model is also visibly better than the small model. However, the memory requirement for inference is also markedly bigger. With the forward function in vit.py from this repo, it is 16x more for the big model and 8x more for the small model. After changing the forward function to the one in MatteAnything, the big model still demands more memory than MatteFormer: MatteFormer can run inference on the same (hi-res) image on the same machine with about 80 GiB of GPU memory, while ViTMatte can't start the job (see the error message below). Any other suggestions for further reducing the memory footprint?

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 53.78 GiB (GPU 0; 79.10 GiB total capacity; 68.61 GiB already allocated; 8.59 GiB free; 69.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
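The error message above suggests tuning max_split_size_mb, but that hint only applies when reserved memory is much larger than allocated memory. A small sketch that pulls the figures out of the message and checks this; the function name `parse_oom` is hypothetical, and the numbers are copied from the message as posted.

```python
import re

MESSAGE = (
    "Tried to allocate 53.78 GiB (GPU 0; 79.10 GiB total capacity; "
    "68.61 GiB already allocated; 8.59 GiB free; "
    "69.09 GiB reserved in total by PyTorch)"
)


def parse_oom(message):
    """Extract the GiB figures from a PyTorch CUDA out-of-memory message."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "free": r"([\d.]+) GiB free",
        "reserved": r"([\d.]+) GiB reserved",
    }
    return {key: float(re.search(p, message).group(1)) for key, p in patterns.items()}


stats = parse_oom(MESSAGE)

# Memory PyTorch has cached but is not using; fragmentation can only waste this much.
slack = stats["reserved"] - stats["allocated"]

# How far the request exceeds everything that could possibly be made available.
shortfall = stats["requested"] - (stats["free"] + slack)

print(f"slack: {slack:.2f} GiB, shortfall: {shortfall:.2f} GiB")
```

Here the cached-but-unused slack is well under 1 GiB while the single request is over 50 GiB, so the failure looks like a genuinely oversized allocation rather than fragmentation, and allocator settings alone are unlikely to help.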

JingfengYao commented 1 year ago

ViTMatte's high memory requirement is mainly due to the attention mechanism in the ViT backbone. From my perspective, I would try replacing the original attention in ViT with memory-efficient attention or flash attention to further reduce the computation burden. (NOTE: Using a different attention implementation at inference may cause performance degradation because of the inconsistency between training and inference.)
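To see why global attention dominates memory on high-resolution inputs, here is a back-of-the-envelope estimate of the attention score matrix that a standard implementation materializes in one global-attention layer. The patch size of 16, the 12 heads, fp32 storage, and batch size 1 are assumptions for illustration, and `attention_matrix_gib` is a hypothetical helper; windowed blocks and other activations are ignored.

```python
def attention_matrix_gib(height, width, patch=16, heads=12, bytes_per_elem=4):
    """Size in GiB of the heads x N x N attention score matrix for one image.

    Assumed defaults (not confirmed by the thread): 16x16 patches,
    12 attention heads, fp32 scores, batch size 1.
    """
    tokens = (height // patch) * (width // patch)
    return heads * tokens * tokens * bytes_per_elem / 2**30


size_2k = attention_matrix_gib(2048, 2048)
size_4k = attention_matrix_gib(4096, 4096)
print(size_2k, size_4k)
```

Doubling the image side quadruples the token count and therefore multiplies this matrix by 16. Memory-efficient and flash attention avoid storing the full N x N matrix at once, which is why they are a natural thing to try here.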

felix-ky commented 1 year ago

Can you share the Distinctions-646 dataset? For some reason, the author is not reachable at the moment.

almorozovv commented 12 months ago

@felix-ky https://github.com/yuhaoliu7456/CVPR2020-HAttMatting