pytorch-labs / segment-anything-fast

A batched offline inference oriented version of segment-anything
Apache License 2.0
1.21k stars 72 forks

Error: SamAutomaticMaskGenerator has a large memory footprint #110

Open lstswb opened 11 months ago

lstswb commented 11 months ago

GPU: RTX 4090 (24 GB)
System: Ubuntu on WSL2
Model: sam_vit_h
Image size: [1024, 1024]
Parameter settings: model=sam, points_per_side=128, points_per_batch=64, pred_iou_thresh=0.86, stability_score_thresh=0.92, crop_n_layers=3, crop_n_points_downscale_factor=2, min_mask_region_area=100, process_batch_size=4

Issue: When I use SamAutomaticMaskGenerator, GPU memory usage climbs to 55 GB, and I get the following error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.63 GiB. GPU 0 has a total capacity of 23.99 GiB of which 0 bytes is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 34.86 GiB is allocated by PyTorch, and 5.63 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

However, with the original SAM code this problem does not occur and GPU memory stays under 24 GB.
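As the error message itself suggests, allocator fragmentation can sometimes be reduced by enabling expandable segments. A minimal sketch of that suggestion (the variable must be set before PyTorch initializes CUDA, i.e. before `import torch`, for it to take effect):

```python
import os

# Must be set before `import torch` (or at least before the first CUDA
# allocation) so the caching allocator picks up the setting.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
```

This does not lower peak memory demand; it only reduces waste from reserved-but-unallocated blocks, so it may or may not be enough here.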

lyf6 commented 10 months ago

@lstswb have you solved this?

lstswb commented 10 months ago

> @lstswb have you solved this?

Not yet

cpuhrsch commented 10 months ago

Does the code snippet from the example help?

In particular

https://github.com/pytorch-labs/segment-anything-fast/blob/387488bc4c7ab2ae311fb0632b34cab5cbfbab78/amg_example/amg_example.py#L36-L44

Note that you can adjust process_batch_size for a smaller memory footprint, and note the use of sam_model_fast_registry.
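For reference, the linked lines amount to roughly the following (a sketch, not run here: the checkpoint path and the dummy `image` are placeholders, and a CUDA GPU is assumed):

```python
import numpy as np
from segment_anything_fast import sam_model_fast_registry, SamAutomaticMaskGenerator

# Placeholder inputs -- substitute your own checkpoint and image.
sam_checkpoint = "sam_vit_h_4b8939.pth"
image = np.zeros((1024, 1024, 3), dtype=np.uint8)  # HWC uint8 RGB

# Note the fast registry, not the original sam_model_registry.
sam = sam_model_fast_registry["vit_h"](checkpoint=sam_checkpoint)
sam.to(device="cuda")

# Lower process_batch_size (down to 1) to trade speed for a smaller
# memory footprint.
mask_generator = SamAutomaticMaskGenerator(sam, process_batch_size=1)
masks = mask_generator.generate(image)
```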

lstswb commented 10 months ago

> Does the code snippet from the example help?
>
> In particular
>
> https://github.com/pytorch-labs/segment-anything-fast/blob/387488bc4c7ab2ae311fb0632b34cab5cbfbab78/amg_example/amg_example.py#L36-L44
>
> note that you can adjust process_batch_size for a smaller memory footprint and note the use of sam_model_fast_registry

I tried adjusting batch_size, and the GPU memory footprint was reduced, but it still far exceeds that of the original code.

cpuhrsch commented 10 months ago

Yes, the batch size is larger, but it should be faster. The original code uses batch size 1; you can try setting it to 1.

lstswb commented 10 months ago

> Yes, the batch is larger, but should be faster. The original code uses batch size 1. You can try setting it to batch size 1.

I tried setting batch_size=1, but I still get a GPU out-of-memory error.

cpuhrsch commented 10 months ago

Hm, I assume you're also using the GPU for the display manager? That will take up additional memory as well. Maybe the solution in https://github.com/pytorch-labs/segment-anything-fast/issues/97 will help.

Can you use your onboard GPU (if you have one) for the display manager and the GPU for the model only? Does it work with vit_b?

lstswb commented 10 months ago

> Hm, I assume you're also using the GPU for the display manager? That will take up additional memory as well. Maybe the solution in #97 will help.
>
> Can you use your onboard GPU (if you have one) for the display manager and the GPU for the model only? Does it work with vit_b?

vit_b works fine, and the display takes up only a small portion of GPU memory. vit_h also works fine with the original code under the same settings.

cpuhrsch commented 10 months ago

Hm, can you try setting the environment variable SEGMENT_ANYTHING_FAST_USE_FLASH_4 to 0?

lstswb commented 10 months ago

> Hm, can you try setting the environment variable SEGMENT_ANYTHING_FAST_USE_FLASH_4 to 0?

I set SEGMENT_ANYTHING_FAST_USE_FLASH_4 to 0, but the problem remains.

cpuhrsch commented 10 months ago

Hm, I'm not sure to be honest. It seems to work on other 4090s, but I think they're on Linux and not Windows.

lstswb commented 10 months ago

> Hm, I'm not sure to be honest. It seems to work on other 4090s, but I think they're on Linux and not Windows.

Well, I'll try this on a native Linux system instead of Ubuntu on WSL2.