ifzhang / ByteTrack

[ECCV 2022] ByteTrack: Multi-Object Tracking by Associating Every Detection Box
MIT License

Clarification on fps #87

Open lpkoh opened 2 years ago

lpkoh commented 2 years ago

Thank you so much for this. This repo is amazing.

Could you clarify the fps numbers being reported?

As I understand, when I run ./demo_track currently, the process is something like:

  1. Import the YOLOX model from the yolox package
  2. Read the video palace.mp4 (1280 x 720) and get a frame
  3. Resize the frame to the desired input size (e.g., 1440 x 800 for YOLOX-X)
  4. Run object detection with the imported YOLOX model
  5. Pass the detections to the tracker for tracking

And the number of frames that can pass through steps 1 to 5 in one second is the fps displayed on the output video. Is this correct, or is the fps for the tracking stage alone, for example? In other words, do we count resizing and object-detection time in the fps?
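One way to make the distinction concrete is to time each stage separately. The sketch below uses a hypothetical helper (`StageTimer` is not part of ByteTrack or YOLOX), and the stage names `"read"`, `"preprocess"`, `"detect"` and `"track"` are illustrative labels for the per-frame steps above. It reports FPS over any subset of stages, so end-to-end FPS and tracking-only FPS come from the same measurements.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Hypothetical helper: accumulates wall-clock seconds per pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)  # stage name -> total seconds
        self.frames = 0                   # frames processed so far

    @contextmanager
    def stage(self, name):
        # Time the body of a `with timer.stage(name):` block.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start

    def add(self, name, seconds):
        # Record a duration that was measured elsewhere.
        self.totals[name] += seconds

    def fps(self, stages=None):
        # Frames per second counting only `stages` (all recorded stages if None).
        names = list(self.totals) if stages is None else stages
        total = sum(self.totals.get(n, 0.0) for n in names)
        return self.frames / total if total > 0 else float("inf")


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(5):
        # time.sleep calls stand in for the real per-frame work.
        with timer.stage("read"):
            time.sleep(0.002)
        with timer.stage("preprocess"):
            time.sleep(0.003)
        with timer.stage("detect"):
            time.sleep(0.020)
        with timer.stage("track"):
            time.sleep(0.001)
        timer.frames += 1
    print(f"end-to-end FPS:    {timer.fps():.1f}")
    print(f"tracking-only FPS: {timer.fps(['track']):.1f}")
```

Tracking-only FPS is necessarily at least as high as end-to-end FPS, because it divides the same frame count by only part of the total time. That is why it matters which stages a reported number covers.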

Also, in the base demo, the YOLOX model is not converted to TensorRT, and neither is the tracker, right? Does this mean we can increase fps relative to what the demo's output video shows by:

  1. Improving preprocessing speed
  2. Using a TensorRT-optimized YOLOX detection model
  3. Using a TensorRT-optimized tracker
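Whether each of these options pays off depends on how much of the per-frame time its stage accounts for. The following is a rough Amdahl's-law-style estimate; the `projected_fps` helper and the millisecond values are hypothetical and are not measurements of ByteTrack.

```python
def projected_fps(stage_ms, speedups):
    """Estimate end-to-end FPS after speeding up some stages.

    stage_ms: per-frame time of each stage in milliseconds.
    speedups: stage name -> speedup factor (3.0 means 3x faster);
              stages not listed keep their original time.
    """
    per_frame_ms = sum(ms / speedups.get(name, 1.0) for name, ms in stage_ms.items())
    return 1000.0 / per_frame_ms


# Hypothetical per-frame breakdown, for illustration only.
stages = {"read": 5.0, "preprocess": 10.0, "detect": 60.0, "track": 5.0}

print(projected_fps(stages, {}))               # 80 ms per frame -> 12.5
print(projected_fps(stages, {"detect": 3.0}))  # 40 ms per frame -> 25.0
print(projected_fps(stages, {"track": 5.0}))   # 76 ms per frame -> about 13.2
```

In this made-up breakdown, a 3x faster detector doubles end-to-end FPS, while a 5x faster tracker barely changes it. The real answer depends on profiling your own pipeline.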

Your clarification would be super useful.

ifzhang commented 2 years ago

We have released a TensorRT + C++ implementation of ByteTrack, and it runs much faster.

lpkoh commented 2 years ago

Hi, yes, thank you for that; I will give it a try.

What I was really trying to find out is what you mean by "fps" in the palace video output. Does "fps" cover the entire process, from video loading through preprocessing, detection, and tracking, or does it refer only to the time taken for tracking?

vcozzolino commented 2 years ago

I actually have the same question. It's unclear whether the FPS value refers to the whole pipeline or to just one step of it (which would make little sense, in my opinion). I'm running ByteTrack as part of a much bigger project and the code is heavily modified, but on my V100 I get at most about 3-4 FPS for steps 2 to 5 (measured from frame acquisition to the generation of the final result).
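For reference, an end-to-end rate converts to a per-frame time budget as its reciprocal, so the 3-4 FPS figure above corresponds to roughly 250-333 ms per frame for steps 2 to 5:

```latex
t_{\text{frame}} = \frac{1}{\text{FPS}}, \qquad
\frac{1}{4\ \text{s}^{-1}} = 250\ \text{ms}, \qquad
\frac{1}{3\ \text{s}^{-1}} \approx 333\ \text{ms}
```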

LamnouarMohamed commented 2 years ago

If I understand your question, the FPS covers only the tracking part; it does not include detection (i.e., it does not include YOLOX).