What is the VRAM requirements ?!

Does anyone know the minimum amount of VRAM to run the example gradio demo.py ?

I have 12 GB of VRAM and getting this error after clicking on "Submit" button in the demo page:

  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/gradio/queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/gradio/blocks.py", line 2018, in process_api
    result = await self.call_function(
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/gradio/blocks.py", line 1567, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/gradio/utils.py", line 846, in wrapper
    response = f(*args, **kwargs)
  File "/home/kubilay/Projects/OmniParser/gradio_demo.py", line 71, in process
    dino_labled_img, label_coordinates, parsed_content_list = get_som_labeled_img(image_save_path, yolo_model, BOX_TRESHOLD = box_threshold, output_coord_in_ratio=True, ocr_bbox=ocr_bbox,draw_bbox_config=draw_bbox_config, caption_model_processor=caption_model_processor, ocr_text=text,iou_threshold=iou_threshold)
  File "/home/kubilay/Projects/OmniParser/utils.py", line 322, in get_som_labeled_img
    parsed_content_icon = get_parsed_content_icon(filtered_boxes, ocr_bbox, image_source, caption_model_processor, prompt=prompt)
  File "/home/kubilay/Projects/OmniParser/utils.py", line 98, in get_parsed_content_icon
    generated_ids = model.generate(input_ids=inputs["input_ids"],pixel_values=inputs["pixel_values"],max_new_tokens=1024,num_beams=3, do_sample=False)
  File "/home/kubilay/.cache/huggingface/modules/transformers_modules/microsoft/Florence-2-base-ft/9803f52844ec1ae5df004e6089262e9a23e527fd/modeling_florence2.py", line 2793, in generate
    image_features = self._encode_image(pixel_values)
  File "/home/kubilay/.cache/huggingface/modules/transformers_modules/microsoft/Florence-2-base-ft/9803f52844ec1ae5df004e6089262e9a23e527fd/modeling_florence2.py", line 2603, in _encode_image
    x = self.vision_tower.forward_features_unpool(pixel_values)
  File "/home/kubilay/.cache/huggingface/modules/transformers_modules/microsoft/Florence-2-base-ft/9803f52844ec1ae5df004e6089262e9a23e527fd/modeling_florence2.py", line 647, in forward_features_unpool
    x, input_size = block(x, input_size)
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/kubilay/.cache/huggingface/modules/transformers_modules/microsoft/Florence-2-base-ft/9803f52844ec1ae5df004e6089262e9a23e527fd/modeling_florence2.py", line 206, in forward
    inputs = module(*inputs)
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/kubilay/.cache/huggingface/modules/transformers_modules/microsoft/Florence-2-base-ft/9803f52844ec1ae5df004e6089262e9a23e527fd/modeling_florence2.py", line 206, in forward
    inputs = module(*inputs)
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/kubilay/.cache/huggingface/modules/transformers_modules/microsoft/Florence-2-base-ft/9803f52844ec1ae5df004e6089262e9a23e527fd/modeling_florence2.py", line 493, in forward
    x, size = self.window_attn(x, size)
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/kubilay/.cache/huggingface/modules/transformers_modules/microsoft/Florence-2-base-ft/9803f52844ec1ae5df004e6089262e9a23e527fd/modeling_florence2.py", line 222, in forward
    x, size = self.fn(self.norm(x), *args, **kwargs)
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/kubilay/.cache/huggingface/modules/transformers_modules/microsoft/Florence-2-base-ft/9803f52844ec1ae5df004e6089262e9a23e527fd/modeling_florence2.py", line 434, in forward
    x = F.pad(x, (0, 0, pad_l, pad_r, pad_t, pad_b))
  File "/home/kubilay/.pyenv/versions/3.10.12/lib/python3.10/site-packages/torch/nn/functional.py", line 4552, in pad
    return torch._C._nn.pad(input, pad, mode, value)
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 24.00 MiB. GPU 0 has a total capacity of 11.70 GiB of which 50.25 MiB is free. Including non-PyTorch memory, this process has 10.43 GiB memory in use. Of the allocated memory 10.05 GiB is allocated by PyTorch, and 137.26 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

I even looked at the paper. It's not mentioned. Are we supposed to assume everyone has 4090s nowadays?

I've just realized that it's picking up my pyenv python version instead of the one in conda env. That may be related to that. Checking now.

Nope. That was not it. Currently getting:

Traceback (most recent call last):
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/gradio/queueing.py", line 624, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/gradio/blocks.py", line 2018, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/gradio/blocks.py", line 1567, in call_function
    prediction = await anyio.to_thread.run_sync(  # type: ignore
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/gradio/utils.py", line 846, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/Projects/OmniParser/gradio_demo.py", line 71, in process
    dino_labled_img, label_coordinates, parsed_content_list = get_som_labeled_img(image_save_path, yolo_model, BOX_TRESHOLD = box_threshold, output_coord_in_ratio=True, ocr_bbox=ocr_bbox,draw_bbox_config=draw_bbox_config, caption_model_processor=caption_model_processor, ocr_text=text,iou_threshold=iou_threshold)
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/Projects/OmniParser/utils.py", line 322, in get_som_labeled_img
    parsed_content_icon = get_parsed_content_icon(filtered_boxes, ocr_bbox, image_source, caption_model_processor, prompt=prompt)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/Projects/OmniParser/utils.py", line 98, in get_parsed_content_icon
    generated_ids = model.generate(input_ids=inputs["input_ids"],pixel_values=inputs["pixel_values"],max_new_tokens=1024,num_beams=3, do_sample=False)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/.cache/huggingface/modules/transformers_modules/microsoft/Florence-2-base-ft/9803f52844ec1ae5df004e6089262e9a23e527fd/modeling_florence2.py", line 2793, in generate
    image_features = self._encode_image(pixel_values)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/.cache/huggingface/modules/transformers_modules/microsoft/Florence-2-base-ft/9803f52844ec1ae5df004e6089262e9a23e527fd/modeling_florence2.py", line 2603, in _encode_image
    x = self.vision_tower.forward_features_unpool(pixel_values)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/.cache/huggingface/modules/transformers_modules/microsoft/Florence-2-base-ft/9803f52844ec1ae5df004e6089262e9a23e527fd/modeling_florence2.py", line 647, in forward_features_unpool
    x, input_size = block(x, input_size)
                    ^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/.cache/huggingface/modules/transformers_modules/microsoft/Florence-2-base-ft/9803f52844ec1ae5df004e6089262e9a23e527fd/modeling_florence2.py", line 206, in forward
    inputs = module(*inputs)
             ^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/.cache/huggingface/modules/transformers_modules/microsoft/Florence-2-base-ft/9803f52844ec1ae5df004e6089262e9a23e527fd/modeling_florence2.py", line 206, in forward
    inputs = module(*inputs)
             ^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/.cache/huggingface/modules/transformers_modules/microsoft/Florence-2-base-ft/9803f52844ec1ae5df004e6089262e9a23e527fd/modeling_florence2.py", line 497, in forward
    x, size = self.ffn(x, size)
              ^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/.cache/huggingface/modules/transformers_modules/microsoft/Florence-2-base-ft/9803f52844ec1ae5df004e6089262e9a23e527fd/modeling_florence2.py", line 222, in forward
    x, size = self.fn(self.norm(x), *args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/.cache/huggingface/modules/transformers_modules/microsoft/Florence-2-base-ft/9803f52844ec1ae5df004e6089262e9a23e527fd/modeling_florence2.py", line 252, in forward
    return self.net(x), size
           ^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/torch/nn/modules/container.py", line 250, in forward
    input = module(input)
            ^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/kubilay/miniconda3/envs/omni/lib/python3.12/site-packages/torch/nn/modules/activation.py", line 734, in forward
    return F.gelu(input, approximate=self.approximate)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 90.00 MiB. GPU 0 has a total capacity of 11.70 GiB of which 41.94 MiB is free. Including non-PyTorch memory, this process has 10.81 GiB memory in use. Of the allocated memory 10.39 GiB is allocated by PyTorch, and 180.95 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

microsoft / OmniParser

What is the VRAM requirements ?! #31