patrick-tssn / Streaming-Grounded-SAM-2

Grounded Tracking for Streaming Videos

Apache License 2.0

26 stars 2 forks source link

grounding-dino sam2 segment-anything streaming-video tracking

readme

Streaming Grounded SAM 2

Grounded SAM 2 for streaming video tracking using natural language queries.

Demo

Framework

This system is comprised of three components:

LLM: This module is responsible for parsing the input query or inferring the intended object.
GroundingDINO: This component handles object referencing.
SAM-2: This part specializes in object tracking.

Getting Started

Installation

Prepare environments

conda create -n sam2 python=3.10 -y
conda activate sam2
pip install -e .

Download SAM 2 checkpoints

cd checkpoints
./download_ckpts.sh

Download Grounding DINO checkpoints

cd gdino_checkpoints
./download_ckpts.sh

or huggingface version (recommend)

cd gdino_checkpoints
huggingface-cli download IDEA-Research/grounding-dino-tiny --local-dir grounding-dino-tiny

Download LLM

4.1 GPT4-o (recommend)

cd llm
touch .env

past your API_KEY or API_BASE (Azure only)

API_KEY="xxx"
API_BASE = "xxx"

4.2 Qwen2

cd llm_checkpoints
huggingface-cli download Qwen/Qwen2-7B-Instruct-AWQ --local-dir Qwen2-7B-Instruct-AWQ

install the corresponding packages

run demo

Step-1: Check available camera

python cam_detect.py

If a camera is detected, modify it in demo.py.

Step-2: run demo

currently available model: Qwen2-7B-Instruct-AWQ, gpt-4o-2024-05-13

python demo.py --model gpt-4o-2024-05-13

Acknowledge: