ksugar / qupath-extension-sam

QuPath extension for Segment Anything Model (SAM)
GNU General Public License v3.0

CUDA Out of Memory Error During Inference in samapi Environment #16

Open halqadasi opened 5 months ago

halqadasi commented 5 months ago

While running inference tasks in the samapi environment, I encountered a CUDA out of memory error, which caused the application to fall back to CPU inference. This significantly impacts performance. I'm looking for advice on mitigating this error or any potential fixes.

Environment

Steps to Reproduce

  1. Restart the server to ensure no residual GPU memory usage.
  2. Activate the samapi environment: source activate samapi
  3. Run the command: uvicorn samapi.main:app --workers 2
  4. The error is encountered after selecting the vit_h model and starting the labeling process.

Expected Behavior

I expected the GPU to handle the inference tasks without running out of memory, allowing for faster processing times.

Actual Behavior

Received a warning/error indicating CUDA out of memory. The system defaulted to using the CPU for inference, significantly slowing down the process. The error message was:

/home/.../anaconda3/envs/samapi/lib/python3.10/site-packages/samapi/main.py:152: UserWarning: cuda device found but got the error CUDA out of memory. Tried to allocate 768.00 MiB (GPU 3; 10.75 GiB total capacity; 1.95 GiB already allocated; 244.25 MiB free; 2.00 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF - using CPU for inference
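
As the warning itself suggests, one possible mitigation is to set PYTORCH_CUDA_ALLOC_CONF before the server's CUDA allocator is initialized. A minimal sketch follows; the 128 MiB split size is only an illustrative value, not a samapi default, and the variable can equally be exported in the shell that launches uvicorn:

import os

# Must be set before torch initializes its CUDA caching allocator, i.e. before
# "import torch" runs in the server process (or exported in the launching shell).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

if torch.cuda.is_available():
    free, total = torch.cuda.mem_get_info()  # bytes free / total on the current device
    print(f"GPU memory: {free / 1024**2:.0f} MiB free of {total / 1024**2:.0f} MiB")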

Additional Information

ksugar commented 5 months ago

Hi @halqadasi, thank you for reporting the issue. Could you run the following script in the Script Editor in QuPath to check the size of the image to be sent to the server? If you are using a high-resolution display, the image size may become larger than expected. If you encounter this problem, please try lowering the screen resolution to see if it fixes the issue.

import org.elephant.sam.Utils
import qupath.lib.awt.common.AwtTools
import qupath.lib.regions.RegionRequest

// Reproduce the region that the extension sends to the SAM server:
// the currently displayed viewer region, rendered at the current downsample.
def viewer = getCurrentViewer()
def renderedServer = Utils.createRenderedServer(viewer)
def region = AwtTools.getImageRegion(viewer.getDisplayedRegionShape(), viewer.getZPosition(),
                viewer.getTPosition())
def viewerRegion = RegionRequest.createInstance(renderedServer.getPath(), viewer.getDownsampleFactor(),
                region)
// Clip to the image bounds and read the pixels that would be transferred.
viewerRegion = viewerRegion.intersect2D(0, 0, renderedServer.getWidth(), renderedServer.getHeight())
def img = renderedServer.readRegion(viewerRegion)
println "Image size processed on the server: (" + img.getWidth() + ", " + img.getHeight() + ")"
halqadasi commented 5 months ago

The size is 1133 × 731, and I got this error in the terminal:

    raise DecompressionBombError(msg)
PIL.Image.DecompressionBombError: Image size (256160025 pixels) exceeds limit of 178956970 pixels, could be decompression bomb DOS attack.
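
For reference, the limit quoted here comes from Pillow's decompression-bomb guard: Image.MAX_IMAGE_PIXELS defaults to 89478485, a DecompressionBombWarning is issued above that, and the hard DecompressionBombError fires at twice that value, i.e. the 178956970 pixels in the message. A minimal sketch of how the check can be inspected or, if the input is trusted, relaxed in the process that opens the image:

from PIL import Image

print(Image.MAX_IMAGE_PIXELS)  # 89478485 by default; the error threshold is twice this

# Only if the incoming images are trusted: raise or disable the guard.
# Image.MAX_IMAGE_PIXELS = None  # disables the decompression-bomb check entirely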
ksugar commented 5 months ago

@halqadasi, the reported image size of 256160025 pixels looks much larger than expected, given the 1133 × 731 viewer region. I will investigate it further. In the meantime, could you check whether the smaller models (vit_l, vit_b, vit_t) work without a CUDA OOM error?

ksugar commented 5 months ago

@halqadasi, it seems that the OOM issue was caused by older versions of the dependencies. I have updated the torch dependency to the latest version in samapi v0.4.1. Please try updating the samapi server and see if the issue is resolved.
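
A quick way to confirm the upgrade from within the samapi environment, before restarting uvicorn, is to check that the bundled torch build still sees CUDA (standard torch calls, not a samapi command):

import torch

print(torch.__version__)          # torch version pulled in by the updated samapi
print(torch.version.cuda)         # CUDA toolkit version this torch build targets
print(torch.cuda.is_available())  # True means inference should run on the GPU again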