leggedrobotics / darknet_ros

YOLO ROS: Real-Time Object Detection for ROS
BSD 3-Clause "New" or "Revised" License

Same bounding boxes published multiple times #150

Open cosama opened 5 years ago

cosama commented 5 years ago

I have observed that relatively often the same BoundingBoxes message is published multiple times.

I input images at roughly 15Hz and get bounding boxes out at about 20Hz, so it should be easy to check this behavior just with rostopic hz /camera/rgb/image_raw and rostopic hz /darknet_ros/bounding_boxes.

I don't really understand why, but I think it has to do with the threading in YoloObjectDetector.cpp, specifically in void YoloObjectDetector::yolo(); void *YoloObjectDetector::publishInThread() itself seems fine.

The information in the bounding boxes is identical, except for the header:

header: 
  seq: 57
  stamp: 
    secs: 1536697169
    nsecs: 397959597
  frame_id: "detection"
image_header: 
  seq: 116
  stamp: 
    secs: 1536697169
    nsecs: 131043656
  frame_id: "camera"
bounding_boxes: 
  - 
    Class: "person"
    probability: 0.763025939465
    xmin: 124
    ymin: 102
    xmax: 168
    ymax: 249
  - 
    Class: "person"
    probability: 0.517869532108
    xmin: 500
    ymin: 109
    xmax: 545
    ymax: 245
header: 
  seq: 58
  stamp: 
    secs: 1536697169
    nsecs: 438457349
  frame_id: "detection"
image_header: 
  seq: 116
  stamp: 
    secs: 1536697169
    nsecs: 131043656
  frame_id: "camera"
bounding_boxes: 
  - 
    Class: "person"
    probability: 0.763025939465
    xmin: 124
    ymin: 102
    xmax: 168
    ymax: 249
  - 
    Class: "person"
    probability: 0.517869532108
    xmin: 500
    ymin: 109
    xmax: 545
    ymax: 245
diogo-aos commented 5 years ago

I have the same issue. I have entire rosbags with the same BoundingBoxes message published 6 times for every single frame. The only difference is the header; the image_header and bounding_boxes are identical.

martinploom commented 5 years ago

I noticed the same behavior. Did you guys find where to "fix" it?

diogo-aos commented 5 years ago

No. I studied the code. Even if no image is being published, bounding box messages still get published. From what I could gather, the node waits for images before it starts processing, but once it receives an image the detection thread keeps working on the freshest image in memory, and the publish thread keeps publishing bounding box messages based on the freshest detection available. From what I could find, after the first image there is no check of whether the freshest image and detection have already been processed.
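
To make the missing check concrete, here is a minimal, self-contained C++ sketch; Header, shouldPublish and the sequence values are illustrative stand-ins, not darknet_ros symbols. The idea is simply to remember the seq of the last image that was published for and skip any frame whose seq has not changed.

// Illustrative sketch (not darknet_ros code): skip publishing when the image
// sequence number has not changed since the last publish.
#include <cstdint>
#include <iostream>

struct Header { uint32_t seq; };  // stand-in for std_msgs::Header

bool shouldPublish(const Header& imageHeader) {
  static uint32_t lastPublishedSeq = UINT32_MAX;          // nothing published yet
  if (imageHeader.seq == lastPublishedSeq) return false;  // same frame as before: skip
  lastPublishedSeq = imageHeader.seq;
  return true;
}

int main() {
  Header frames[] = {{116}, {116}, {116}, {117}};  // seq 116 arrives three times
  for (const auto& h : frames)
    std::cout << "seq " << h.seq << (shouldPublish(h) ? ": publish" : ": skip") << "\n";
}

Running this prints "publish" only for the first occurrence of each seq, which is the behaviour missing from the publish path described above.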

elpidiovaldez commented 5 years ago

There are a few posts concerning this behaviour. I have looked at the code but I do not understand it well. Nevertheless, I have managed to implement a 'fix' that appears to work.

I have a 1080 Ti GPU and found that identical bounding boxes were being published around 5 times for each video frame (camera running at about 10 fps). GPU usage was about 87%. After implementing the fix, bounding boxes are published at the camera frame rate, from distinct frames, and GPU usage dropped to 26% or less; GPU memory usage is unchanged.

I am sure that a much more elegant solution could be produced by somebody who actually understood the code. I have never published code on Github and am not sure what I need to do to release the fix for inspection by a wider audience. If anybody is interested I can send them the code.

cosama commented 5 years ago

@elpidiovaldez. Great to hear you managed to find a way of fixing this. Could you submit a pull request with your changes?

elpidiovaldez commented 5 years ago

@cosama the problem is I don't understand how to do a pull request. I have never used github for anything except downloads. I just read the docs again and it seems like I have to create a branch first. I could not see how to do that. Frankly it is all double dutch unless you have some experience of collaborative software. In time I will work it out, but I don't have much spare time at the moment.

cosama commented 5 years ago

@elpidiovaldez Thanks for the reply. You would have to create a fork, clone the repo that is now on your own GitHub page, add the fix and then commit/push it back to your own fork. Then you could create a pull request. I know that can be confusing at first.

If it is easier you can also just upload/describe the important lines you changed and we can see how to move it into this repo.

elpidiovaldez commented 5 years ago

@cosama right now it is much easier for me just to give changed lines. All changes are in YoloObjectDetector.cpp.

I use a #define to control whether my code is included, so it is easy to compile with and without the modifications and to compare the GPU usage and results. Add the #define near the start of YoloObjectDetector.cpp:

// Check for xServer
#include <X11/Xlib.h>

#define UNIQUE_FRAMES

Now insert a piece of code at the start of three functions. In each case the code simply causes the thread to return without doing anything if the video frame has already been processed:

void *YoloObjectDetector::detectInThread()
{
#ifdef UNIQUE_FRAMES
  // Bail out if this buffer slot still holds a frame we have already detected on.
  static int prevSeq;
  if (prevSeq == headerBuff_[(buffIndex_ + 2) % 3].seq) {
    return 0;
  }
  prevSeq = headerBuff_[(buffIndex_ + 2) % 3].seq;
#endif
  // ... original function body continues unchanged ...

void *YoloObjectDetector::displayInThread(void *ptr)
{
#ifdef UNIQUE_FRAMES
  // Bail out if this frame has already been displayed.
  static int prevSeq;
  if (prevSeq == headerBuff_[(buffIndex_ + 1) % 3].seq) {
    return 0;
  }
  prevSeq = headerBuff_[(buffIndex_ + 1) % 3].seq;
#endif
  // ... original function body continues unchanged ...

void *YoloObjectDetector::publishInThread()
{
#ifdef UNIQUE_FRAMES
  // Bail out if detections for this frame have already been published.
  static int prevSeq;
  if (prevSeq == headerBuff_[(buffIndex_ + 1) % 3].seq) {
    return 0;
  }
  prevSeq = headerBuff_[(buffIndex_ + 1) % 3].seq;
#endif
  // ... original function body continues unchanged ...

That is enough to stop the same video frame being processed multiple times, but the frame rate estimate is now far too high. The following fixes that. In the function body of YoloObjectDetector::yolo() you need to insert the conditionally compiled code in the final while loop as follows (I am showing the whole loop to give context):

  while (!demoDone_) {
    buffIndex_ = (buffIndex_ + 1) % 3;
    fetch_thread = std::thread(&YoloObjectDetector::fetchInThread, this);
    detect_thread = std::thread(&YoloObjectDetector::detectInThread, this);
    if (!demoPrefix_) {
#ifdef UNIQUE_FRAMES
      static int prevSeq;
      if (prevSeq!=headerBuff_[buffIndex_].seq) {
          fps_ = 1./(what_time_is_it_now() - demoTime_);
          demoTime_ = what_time_is_it_now();
          prevSeq = headerBuff_[buffIndex_].seq;
      }
#else
      fps_ = 1./(what_time_is_it_now() - demoTime_);
      demoTime_ = what_time_is_it_now();
#endif
      if (viewImage_) {
        displayInThread(0);
      }
      publishInThread();
    } else {
      char name[256];
      sprintf(name, "%s_%08d", demoPrefix_, count);
      save_image(buff_[(buffIndex_ + 1) % 3], name);
    }
    fetch_thread.join();
    detect_thread.join();
    ++count;
    if (!isNodeRunning()) {
      demoDone_ = true;
    }
  }

I worked out this fix to enable my own project to succeed. It attacks the symptom, which is the same video data being processed repeatedly. I must emphasize that I have not fully understood the way the threading is being used, and it is likely that the author of the original code (or someone who understands exactly what is going on) could do something far more elegant that attacks the root cause of the problem.

kjbilton commented 5 years ago

Hey,

I was able to figure it out. You can find it at this branch of my fork:

https://github.com/kjbilton/darknet_ros/tree/fix/pubrate

In the original code, in the while loop of the yolo method, detectInThread is launched on its own thread, but detect_thread.join isn't called until after publishInThread. This means publishInThread can run before the new detections from detectInThread are ready, so detections made in a previous iteration get published, which results in the behaviour seen above.

One way to fix this (as done in the branch I linked above) is to call fetch_thread.join and detect_thread.join before calling publishInThread. Doing this seems to fix things for me.

Note that the branch I linked above is a branch off of another fix I made, which I'm waiting to get pulled in, so you may also want to take a look at that.
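
For readers who just want the gist of that join-before-publish reordering without checking out the branches, here is a minimal, self-contained sketch; the names detectInThread and publishInThread mirror the real functions, but everything else (the int buffer, the frame loop) is illustrative, not the darknet_ros implementation.

// Illustrative sketch (not the darknet_ros implementation): join the detection
// thread before publishing, so the published result belongs to this iteration.
#include <iostream>
#include <thread>

int latestDetection = 0;  // stands in for the shared detection buffer

void detectInThread(int frame) { latestDetection = frame * 10; }  // fake "detection"
void publishInThread(int frame) {
  std::cout << "frame " << frame << " -> publish detection " << latestDetection << "\n";
}

int main() {
  for (int frame = 1; frame <= 3; ++frame) {
    std::thread detect_thread(detectInThread, frame);
    detect_thread.join();    // wait for this frame's detection to finish...
    publishInThread(frame);  // ...then publish; never a stale result from a previous loop
  }
}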

elpidiovaldez commented 5 years ago

@kjbilton Hi, I like your insight into what is going on in the threading, and your code modification is much more elegant than the hack I came up with. Nonetheless I just did a performance comparison between the two implementations using my live camera feed (at about 10 fps) and found this:

Yours: GPU Usage=73%, Displayed Yolo FPS=37, Camera FPS=9.89, rostopic hz boundingboxes=37.5

Mine: GPU Usage=27%, Displayed Yolo FPS=10, Camera FPS=10.0, rostopic hz boundingboxes=9.5

I ran the system with your code and looked at the sequence numbers in the bounding boxes:

rostopic echo /darknet_ros/bounding_boxes | grep seq

I got:

  seq: 3053
  seq: 858
  seq: 3054
  seq: 858
  seq: 3055
  seq: 858
  seq: 3056
  seq: 858
  seq: 3057
  seq: 859
  seq: 3058
  seq: 859
  seq: 3059
  seq: 859
  seq: 3060
  seq: 859
  seq: 3061
  seq: 860
  seq: 3062
  seq: 860
  seq: 3063
  seq: 860
  seq: 3064
  seq: 860
  seq: 3065
  seq: 861
  seq: 3066
  seq: 861
  seq: 3067
  seq: 861
  seq: 3068
  seq: 861

The sequence numbers in the 3000s are from the message header and all are unique. The sequence numbers in the 800s are from the image_header and show 4 repetitions from the same frame.

My conclusion is that your fix is not preventing video frames from being processed multiple times. I may be missing something. Please could you try the rostopic line to see if you get the same result as me.

elpidiovaldez commented 5 years ago

@fiorano10 many thanks for doing the pull request. Did you try either of the suggested fixes ? How did they work out for you ?

kjbilton commented 5 years ago

Hey @elpidiovaldez,

Thanks for trying that out. While I think what I said above still holds, I was mistaken in thinking it addressed the issue. But, I've had some more insight into the problem which agrees with your solution. Here's what's going on:

As you point out, the image is being processed multiple times. This means that camImageCopy_ is being fetched multiple times and stored in multiple indices in buff_. This can happen when the loop in yolo() containing fetchInThread and detectInThread iterates faster than images are coming in. In my case, I have multiple cameras with an effective input rate of >100Hz, so my previous suggestion had me thinking I fixed it. :)

I think one elegant way around this is to push newly-acquired images in the image callback onto a queue data structure and dequeue them after an image has been processed. If the queue is empty, you can skip the relevant parts of the code (similar to what you've done). I suggest a queue since it would require replacing only a single variable (camImageCopy_) without introducing additional flags to keep track of. But also note that if the input rate is much greater than the throughput of the detector (as in my current case), the queue can keep growing, so a size limit would be needed (e.g., a max size of 1).
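
A minimal, self-contained sketch of that queue idea; ImageQueue, Image, push and tryPop are hypothetical names, not darknet_ros API. The image callback would call push, the yolo() loop would call tryPop once per iteration and skip detection/publishing when it returns nothing, and the size limit keeps the queue bounded when images arrive faster than the detector runs.

// Illustrative sketch of a size-limited image queue (not darknet_ros code).
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>

struct Image { uint32_t seq; };  // stand-in for the copied camera image

class ImageQueue {
 public:
  explicit ImageQueue(std::size_t maxSize) : maxSize_(maxSize) {}

  // Image callback side: enqueue the new frame, dropping the oldest one if full.
  void push(const Image& img) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (queue_.size() >= maxSize_) queue_.pop_front();
    queue_.push_back(img);
  }

  // Detection loop side: take the next unprocessed frame, or nothing if none arrived.
  std::optional<Image> tryPop() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (queue_.empty()) return std::nullopt;  // no new frame: caller skips detect/publish
    Image img = queue_.front();
    queue_.pop_front();
    return img;
  }

 private:
  std::size_t maxSize_;
  std::deque<Image> queue_;
  std::mutex mutex_;
};

With maxSize set to 1, this effectively keeps only the newest frame, which also matches the "always use the most recent image" behaviour discussed a few comments below.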

fiorano10 commented 5 years ago

@elpidiovaldez @kjbilton Did you guys notice high CPU usage as well? After applying the fix the FPS is down to the camera frame rate but my CPU usage for each node is > 100% while the GPU usage is ~50%. Any ideas on how to fix this?

elpidiovaldez commented 5 years ago

@kjbilton @fiorano10 Hi. I thought that @kjbilton might not have seen multiple frames due to having a GPU that is slow relative to the camera frame rate. I have the reverse case: a slow camera and a storming GPU. He seems to have real insight into the code now.

I like the proposed scheme except for one thing. It appears to ensure that all frames are processed by queuing them. I believe some real-time systems might prefer to skip old frames, and always use the most recent image. I am guessing that a small circular buffer is used for just this reason, so that old frames are overwritten.

I am really sorry but I cannot really contribute at the moment. In 5 hours I am off to the back of beyond for the Summer holidays, where I hardly even have internet access.

v-raja commented 5 years ago

I have a simple fix here. Please let me know if this fixes your issue @elpidiovaldez and a pull request can be made.

If you're still having problems with CPU usage, you could try my fork, which uses the latest darknet version, to see if that helps.

elpidiovaldez commented 4 years ago

@vivkr I finally got round to trying your code. Sorry to take so long - I was on holiday all summer without ANY internet connection.

I downloaded your fork and can confirm that it seems to be about as efficient in GPU usage as my fix (around 27% according to nvidia-smi). The frame rate is published correctly as 10 fps (which is the true rate that frames arrive from my camera).

Since your fix seems to be much simpler than mine (just 1 line as far as I can see) we should definitely go with that.

However, I did experience one problem with your fork: the video displayed by darknet did not show the marked-up bounding boxes. This seems to be because an extra parameter, 'ms', has recently been added to the show_image function in the darknet code.

Shame-fight commented 4 years ago

> I have a simple fix here. Please let me know if this fixes your issue @elpidiovaldez and a pull request can be made.
>
> If you're still having problems with CPU usage, you could try my fork, which uses the latest darknet version, to see if that helps.

Thank you for your work on this. Your changes are very good, but darknet_ros was updated about two weeks ago, adding the counting function, and the YOLO code now differs from yours, so the fix may not be as simple as adding a single line, or perhaps I did not understand it clearly. Could you update your code to the latest version?

SoraDevin commented 4 years ago

Any update on this issue? On the latest version I've tried inserting an if statement that checks the header sequence before publishing, but to no effect.

AravindaDP commented 3 years ago

I think a better approach would be to use a thread-signaling mechanism (a semaphore or something similar) rather than returning early in the loop as proposed in https://github.com/leggedrobotics/darknet_ros/pull/169, since that only resolves the wasted GPU usage and the multiple processing of the same image; it would still leave the CPU thread running without yielding, causing high CPU usage.
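
As a rough illustration of that suggestion, here is a self-contained sketch built on std::condition_variable; FrameSignal, onNewImage and waitForNewFrame are hypothetical names, not darknet_ros code. The detection loop blocks until the image callback announces a frame newer than the last one it processed, so it neither reprocesses a frame nor busy-spins on the CPU.

// Illustrative sketch of waking the detector only when a new frame arrives.
#include <condition_variable>
#include <cstdint>
#include <mutex>

struct FrameSignal {
  std::mutex mutex;
  std::condition_variable cv;
  uint32_t latestSeq = 0;  // seq of the newest image received
};

// Image callback side: record the new frame and wake the detector.
void onNewImage(FrameSignal& sig, uint32_t seq) {
  {
    std::lock_guard<std::mutex> lock(sig.mutex);
    sig.latestSeq = seq;
  }
  sig.cv.notify_one();
}

// Detection loop side: sleep until a frame newer than the last processed one arrives.
uint32_t waitForNewFrame(FrameSignal& sig, uint32_t lastProcessedSeq) {
  std::unique_lock<std::mutex> lock(sig.mutex);
  sig.cv.wait(lock, [&] { return sig.latestSeq != lastProcessedSeq; });
  return sig.latestSeq;
}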

ghost commented 3 years ago

@vivkr's solution applied to the latest code. As pointed out, bounding boxes no longer appear on the image visually, but I am seeing them in the bounding_boxes ROS topic.

line 488 in YoloObjectDetector.cpp

  // Fix0: keep track of what std_msgs::Header id this is (consecutively increasing)
  int prevSeq_ = 0;
  while (!demoDone_) {
    buffIndex_ = (buffIndex_ + 1) % 3;
    fetch_thread = std::thread(&YoloObjectDetector::fetchInThread, this);

    // Fix1: check this isn't an image already seen
    if (prevSeq_ != headerBuff_[buffIndex_].seq)
    {
        // Fix2: only detect if this is an image we haven't see before
        detect_thread = std::thread(&YoloObjectDetector::detectInThread, this);

        if (!demoPrefix_) {
          fps_ = 1. / (what_time_is_it_now() - demoTime_);
          demoTime_ = what_time_is_it_now();
          if (viewImage_) {
            displayInThread(0);
          } else {
            generate_image(buff_[(buffIndex_ + 1) % 3], ipl_);
          }
          publishInThread();
        } else {
          char name[256];
          sprintf(name, "%s_%08d", demoPrefix_, count);
          save_image(buff_[(buffIndex_ + 1) % 3], name);
        }
        // Fix3: increment the new sequence number to avoid detecting more than once
        prevSeq_ = headerBuff_[buffIndex_].seq;
        fetch_thread.join();
        detect_thread.join();
        ++count;
    }
    else
    {
      // Fix4: no detection made, so let thread execution complete so that it can be destroyed safely
      fetch_thread.join();
    }
    if (!isNodeRunning()) {
      demoDone_ = true;
    }
  }

@mbjelonic do the maintainers see a better way to avoid running detection more than once on a single image? Blocking repeated detections on the same image reduces GPU utilization significantly.

Example: with the above fix, my NVIDIA Quadro RTX 3000 went from running this at 100+ FPS down to the frame rate of my camera (configured to just 1 FPS!). I can now use my GPU for other things!