Open tenzinOvO opened 1 year ago
Yes. Or you can replace only the forward function in the Block. BTW, when you reproduce the results on Distinctions-646, the results will depend on which trimaps you use.
Thanks for the reminder. I was curious whether the pseudo trimap in MatteAnything (Tables 1-4) was obtained through a real user study or through a code-based simulation of user interactions derived from the ground truth.
It's a real user study.
The matting outcome is markedly better than MatteFormer's. The big model is also visibly better than the small model. However, the memory requirement for inference is also markedly larger. With the forward function in vit.py from this repo, it's 16x more for the big model and 8x more for the small model. After changing the forward function to the one in MatteAnything, the big model still demands more memory than MatteFormer: MatteFormer can run inference on the same (hi-res) image on the same machine with about 80 GiB of GPU memory, whereas ViTMatte can't start the job (see the error message below). Any other suggestions for further reducing the memory footprint?
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 53.78 GiB (GPU 0; 79.10 GiB total capacity; 68.61 GiB already allocated; 8.59 GiB free; 69.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ViTMatte's high memory requirement is mainly due to the attention mechanism in the ViT backbone. I would try replacing the original attention in ViT with memory-efficient attention or flash attention to further reduce the computation burden. (NOTE: Using a different attention implementation at inference may cause performance degradation because of the inconsistency between training and inference.)
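To get a feel for why attention dominates, here is a rough back-of-envelope sketch (not taken from the ViTMatte code) that estimates the size of one fully materialized attention score tensor for a block that attends globally over all tokens. The patch size of 16, the 12 heads, and fp32 scores are assumptions for illustration, and `attention_scores_gib` is a hypothetical helper name. Real usage also includes activations, weights, and any windowed blocks, so treat the numbers as order-of-magnitude only.

```python
import math


def attention_scores_gib(height, width, patch_size=16, num_heads=12,
                         bytes_per_elem=4, batch=1):
    """Estimate the GiB needed for one (tokens x tokens) attention score tensor.

    All defaults are illustrative assumptions, not values read from ViTMatte.
    """
    # Number of tokens after patch embedding (ceil covers padded borders).
    tokens = math.ceil(height / patch_size) * math.ceil(width / patch_size)
    # One score per (query, key) pair, per head, per batch item.
    total_bytes = batch * num_heads * tokens * tokens * bytes_per_elem
    return total_bytes / 2**30


print(attention_scores_gib(2048, 2048))  # 12.0
print(attention_scores_gib(4096, 4096))  # 192.0
```

Doubling each image side quadruples the token count and multiplies this tensor by 16, which is why hi-res inputs hit the limit so quickly. Memory-efficient or flash attention avoids materializing this tensor in full.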
Can you share the Distinctions-646 dataset? For some reason, its author cannot be reached right now.
We use the grid sample strategy to reduce the inference computation burden. Please take a look at the last section of our paper for details. The code can be found here.