siyuanliii / masa

Official implementation of the CVPR24 highlight paper: Matching Anything by Segmenting Anything
https://matchinganything.github.io
Apache License 2.0
970 stars 62 forks

About Time Using #20

Open CloudsRL opened 3 months ago

CloudsRL commented 3 months ago

Hello, thanks for MASA. I'm looking for something to put into my SLAM system (mainly for dynamic object tracking), but I've run into some trouble.

When I run demo1, it takes about 260 s to process a 4 s video. Is that expected?

nvidia-smi shows that Python is using 2.6 GB of GPU memory, so I think CUDA is working, but I'd still like to know whether I'm doing things right.

I mean, it's a little slow, although it works very well. Is there any way to make it run faster?

lkeab commented 3 months ago

How many frames are there in your video? Is the program stuck in the visualization part for a long time?

CloudsRL commented 3 months ago

No, it's not my own video, just the demo1 clip minions_rush_out.mp4. As I remember, it is 4 seconds long, about 91 frames. And no, it doesn't get stuck in visualization, that part is fine. What I want to know is whether this running time is expected. I think it's a little slow, far from real-time, even though the NVIDIA card was working (CUDA).

siyuanliii commented 3 months ago

In our test, we can finish the minions_rush_out.mp4 video within 30s on a 3090Ti GPU for the GroundingDINO-MASA with SwinB backbone.
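
For a rough sense of scale, the timings quoted in this thread can be converted to per-frame figures. The sketch below assumes the clip has about 91 frames over 4 s, as recalled above (not measured), and treats the 30 s figure as an upper bound; the function name `throughput` is only illustrative.

```python
def throughput(total_seconds, num_frames):
    """Return (seconds per frame, frames per second) for one processing run."""
    return total_seconds / num_frames, num_frames / total_seconds


NUM_FRAMES = 91  # approximate frame count recalled in the thread, an assumption

runs = [
    ("reported run (260 s)", 260.0),
    ("authors' 3090Ti run (at most 30 s)", 30.0),
    ("real-time playback (4 s)", 4.0),
]

for label, seconds in runs:
    sec_per_frame, fps = throughput(seconds, NUM_FRAMES)
    print(f"{label}: {sec_per_frame:.2f} s/frame, {fps:.2f} FPS")
```

Under these assumptions the reported run works out to roughly 2.9 s per frame, against at most about 0.33 s per frame in the authors' test, so the gap is close to a factor of nine.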

siyuanliii commented 3 months ago

"i think it's a little slow, far away from real-time". Most of the running time is spent in the detection part. Running GroundingDINO with the SwinB backbone can be very slow on lower-end GPUs. For real-time performance, one can use YOLO-series detectors.
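
One way to check where the time goes on a given machine is to accumulate wall-clock time per stage. This is a minimal sketch, not code from the MASA repository: `detect` and `track` are hypothetical stand-ins that only sleep, and would need to be replaced by the real detector and tracker calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

stage_seconds = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the wall-clock time spent inside the with-block to `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start


# Hypothetical stand-ins for the real pipeline stages.
def detect(frame):
    time.sleep(0.02)  # pretend detection is the expensive step
    return [(0, 0, 10, 10)]


def track(frame, boxes):
    time.sleep(0.002)  # pretend association is cheap
    return list(enumerate(boxes))


for frame in range(5):
    with timed("detection"):
        boxes = detect(frame)
    with timed("tracking"):
        tracks = track(frame, boxes)

total = sum(stage_seconds.values())
for stage, seconds in stage_seconds.items():
    print(f"{stage}: {seconds:.3f} s ({100 * seconds / total:.0f}%)")
```

When timing real PyTorch code on a GPU, CUDA kernels run asynchronously, so calling `torch.cuda.synchronize()` before reading the clock at the end of each stage gives more trustworthy numbers.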

CloudsRL commented 3 months ago

Thanks, I will give it a try. Another question: I want to use MOT in a SLAM program, so I need a real-time MOT method, preferably one with a ReID component (in SLAM both the camera and the dynamic objects move, so IoU-based association does not work). Could you recommend some?
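
The point about IoU-based association can be illustrated with a toy example. The boxes and embeddings below are made up for illustration (they are not MASA outputs): the same object appears in two consecutive frames, but a camera pan shifts its box by 120 px, so the box overlap disappears while an appearance embedding stays nearly unchanged.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


# Hypothetical data: one object, two frames, camera pans between them.
box_prev = (100, 100, 180, 220)
box_curr = (220, 100, 300, 220)  # same size, shifted 120 px to the right
emb_prev = [0.90, 0.10, 0.40]    # invented appearance features
emb_curr = [0.85, 0.15, 0.45]

print("IoU:", iou(box_prev, box_curr))
print("appearance similarity:", round(cosine(emb_prev, emb_curr), 3))
```

Here the boxes no longer overlap, so an IoU-only matcher would treat the detection as a new object, whereas an appearance-based matcher could still link the two.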