ultralytics / yolov5

YOLOv5 πŸš€ in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0
50.38k stars 16.26k forks

Count inside a ROI and a Detection box #12035

Closed VedadarshanR closed 1 year ago

VedadarshanR commented 1 year ago

Search before asking

Question

Hello, I need to achieve the following: i) have an ROI, with all detections inside the ROI; ii) increment the count by 1 for each class 1 detection found inside a class 0 detection; iii) count the remaining detections, i.e. those not covered by the condition above, or unique detections of class 0 or class 1.

I am trying to calculate the person count in CCTV footage. My custom model detects at least a head or a person for every human, but sometimes detects only the head or only the person. I wrote the code below for counting but am not getting the expected output, and I have been trying for 2 days. class 0 = person, class 1 = head

Thanks in advance for the help.

final_count = 0
list_0 = []  # List to store class 0 bounding box coordinates
list_1 = []  # List to store class 1 bounding box coordinates

for *xyxy, cls in reversed(det):
    c = int(cls)  # integer class
    centroid_x = (xyxy[0] + xyxy[2]) / 2
    centroid_y = (xyxy[1] + xyxy[3]) / 2

    if c == 0 and x_min_roi <= centroid_x <= x_max_roi and y_min_roi <= centroid_y <= y_max_roi:
        list_0.append(xyxy)
    elif c == 1 and x_min_roi <= centroid_x <= x_max_roi and y_min_roi <= centroid_y <= y_max_roi:
        list_1.append(xyxy)

for xyxy_1 in list_0:
    centroid_x_1 = (xyxy_1[0] + xyxy_1[2]) / 2
    centroid_y_1 = (xyxy_1[1] + xyxy_1[3]) / 2

    for xyxy_0 in list_1:
        if xyxy_0[0] <= centroid_x_1 <= xyxy_0[2] and xyxy_0[1] <= centroid_y_1 <= xyxy_0[3]:
            final_count += 1
            list_0.remove(xyxy_1)
            list_1.remove(xyxy_0)
            break

# Increment count for object 0 detections inside ROI
final_count += len(list_0)

# Increment count for object 1 detections inside ROI
final_count += len(list_1)

Additional

No response

github-actions[bot] commented 1 year ago

πŸ‘‹ Hello @VedadarshanR, thank you for your interest in YOLOv5 πŸš€! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a πŸ› Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.8.0 with all requirements.txt installed including PyTorch>=1.8. To get started:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv5 CI

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 πŸš€

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 πŸš€!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

glenn-jocher commented 1 year ago

@VedadarshanR hi there,

To achieve your mentioned conditions in YOLOv5, you can modify your code as follows:

final_count = 0
list_0 = []  # List to store class 0 bounding box coordinates
list_1 = []  # List to store class 1 bounding box coordinates

for *xyxy, cls in reversed(det):
    c = int(cls)  # integer class
    centroid_x = (xyxy[0] + xyxy[2]) / 2
    centroid_y = (xyxy[1] + xyxy[3]) / 2

    if c == 0 and x_min_roi <= centroid_x <= x_max_roi and y_min_roi <= centroid_y <= y_max_roi:
        # Check if any class 1 detection box contains class 0 centroid
        if any(xyxy_1[0] <= centroid_x <= xyxy_1[2] and xyxy_1[1] <= centroid_y <= xyxy_1[3] for xyxy_1 in list_1):
            final_count += 1
        else:
            list_0.append(xyxy)
    elif c == 1 and x_min_roi <= centroid_x <= x_max_roi and y_min_roi <= centroid_y <= y_max_roi:
        list_1.append(xyxy)

# Increment count for any remaining object 0 detections inside ROI
final_count += len(list_0)

# Increment count for any remaining object 1 detections inside ROI
final_count += len(list_1)

This modified code first checks if any class 1 detection box contains the centroid of a class 0 detection. If it does, it increments the count by 1. Otherwise, it adds the class 0 detection to the list for counting later. Then, it increments the count for any remaining detections of class 0 and class 1 inside the ROI.

I hope this helps! If you have any further questions, please feel free to ask.

VedadarshanR commented 1 year ago

@glenn-jocher Hi, I don't know what error your code has, but the code below works fine. I am trying to find the person count for multiple CCTV cameras and provide the sum. Is there any out-of-the-box feature in YOLOv5 that I need to know about?

For now I am using this and it works well, but it would be good if I could define an ROI for each camera separately.

python detect.py --view-img --source video.streams

final_count = 0
list_0 = []  # List to store class 0 bounding box coordinates
list_1 = []  # List to store class 1 bounding box coordinates
remove_0 = []

for *xyxy, cls in reversed(det):
    c = int(cls)  # integer class
    centroid_x = (xyxy[0] + xyxy[2]) / 2
    centroid_y = (xyxy[1] + xyxy[3]) / 2

    if c == 0 and x_min_roi <= centroid_x <= x_max_roi and y_min_roi <= centroid_y <= y_max_roi:
        list_0.append(xyxy)
    elif c == 1 and x_min_roi <= centroid_x <= x_max_roi and y_min_roi <= centroid_y <= y_max_roi:
        list_1.append(xyxy)

for xyxy_1 in list_1:
    centroid_x_1 = (xyxy_1[0] + xyxy_1[2]) / 2
    centroid_y_1 = (xyxy_1[1] + xyxy_1[3]) / 2

    for xyxy_0 in list_0:
        if xyxy_0[0] <= centroid_x_1 <= xyxy_0[2] + xyxy_0[0] and xyxy_0[1] <= centroid_y_1 <= xyxy_0[3] + xyxy_0[1]:
            final_count += 1
            remove_0.append(xyxy_0)
            break
# Increment count for object 0 detections inside ROI
b = len(list_0) - len(remove_0)
final_count += b

# Increment count for object 1 detections inside ROI
a = len(list_1) - len(remove_0)
final_count += a

And I would also like to know where the detections from each camera are read and output, as I plan to assign an ID to each camera and output the count for each camera ID along with the sum.

glenn-jocher commented 1 year ago

@VedadarshanR hi,

Thank you for sharing your code. It looks like you are trying to count the number of people in multiple CCTV camera feeds using YOLOv5. It's great to see that your current implementation is working fine.

To define a region of interest (ROI) for each camera separately, you can modify the code by adding specific ROI coordinates (x_min_roi, x_max_roi, y_min_roi, y_max_roi) for each camera. This way, you can filter the detections based on their centroid coordinates falling within the defined ROI for each camera.
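
As a minimal sketch of that idea (the camera indices and ROI coordinates below are made-up placeholders, not values from any real setup), one way to keep a separate ROI per camera is a dictionary keyed by camera index:

```python
# Hypothetical per-camera ROIs as (x_min, y_min, x_max, y_max) in pixels.
# The indices and coordinates here are illustrative only.
camera_rois = {
    0: (100, 50, 1180, 670),
    1: (0, 0, 640, 480),
}

def in_roi(centroid_x, centroid_y, camera_index, rois=camera_rois):
    """Return True if the centroid falls inside the ROI of the given camera."""
    x_min, y_min, x_max, y_max = rois[camera_index]
    return x_min <= centroid_x <= x_max and y_min <= centroid_y <= y_max
```

You could then replace the hard-coded x_min_roi/x_max_roi/y_min_roi/y_max_roi comparisons in your loop with a call to a helper like this, passing the index of the camera that produced the frame.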

Regarding your question about the location where the detection from each camera is read and outputted, you can find that in the detect.py file. By default, the code reads the input source from the --source argument, which can be a video file, a directory of images, or a camera index. The detections are then processed, and you can modify or access them according to your requirements. To output the count for each camera ID and the sum, you can introduce variables or data structures to store and accumulate the counts for each camera.
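
For the per-camera counts, a plain dictionary updated inside the per-frame loop is one option. This is only an illustrative sketch; the `record` helper is an invented name, not part of detect.py:

```python
from collections import defaultdict

# Running person count per camera; keys are camera indices.
counts = defaultdict(int)

def record(camera_index, frame_count):
    """Add one frame's count for one camera to the running totals."""
    counts[camera_index] += frame_count

# Illustrative updates, as they might happen in the per-frame loop:
record(0, 3)
record(1, 2)
record(0, 1)

total = sum(counts.values())  # overall count across all cameras
```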

I hope this helps! If you have any further questions, please let me know.

VedadarshanR commented 1 year ago

Hi, can you give me an example of the implementation, like where we can parse the camera index and how to access it? I would also need to access the person count from each camera using the index. Not only that, but inputting a specific ROI for each camera is also necessary.
I am a beginner; thanks for your help.

glenn-jocher commented 1 year ago

@VedadarshanR hi,

To implement different camera feeds with YOLOv5, you can start by modifying the detect.py script. Here's an example of how you can parse the camera index and access the detections for each camera:

  1. Parse the camera index: You can use command-line arguments to provide the camera index while running the script. For example, you can use the argparse module to parse the camera index as follows:

    import argparse
    
    parser = argparse.ArgumentParser()
    parser.add_argument("--camera", type=int, default=0, help="Camera index")
    args = parser.parse_args()
    
    camera_index = args.camera
  2. Access the detections for each camera: Once you have the camera index, you can adjust the source argument of the detect.py script accordingly. For example, if you're using OpenCV to capture the video feed, you can modify the source argument as follows:

    import cv2
    
    # Open video capture for specified camera index
    cap = cv2.VideoCapture(camera_index)

    You can then process the frames and run object detection on each frame using YOLOv5.

  3. Retrieve person count from each camera: To retrieve the person count for each camera, you can maintain a separate count variable for each camera and increment it whenever a person is detected in the corresponding camera feed. For example:

    person_count_camera_1 = 0
    person_count_camera_2 = 0
    
    # Within the loop for processing each frame
    if person_detected:
       # Increment the count based on the camera index
       if camera_index == 1:
           person_count_camera_1 += 1
       elif camera_index == 2:
           person_count_camera_2 += 1
  4. Specifying ROI for each camera: To specify a region of interest (ROI) for each camera, you can define the ROI coordinates (x_min_roi, x_max_roi, y_min_roi, y_max_roi) for each camera. You can then filter the detections based on whether their centroid coordinates fall within the defined ROI for each camera, similar to the code we discussed in the previous response.
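
The ROI filtering described in step 4 can be sketched as a small helper. The detection format here, plain [x1, y1, x2, y2] boxes, is an assumption for illustration:

```python
def filter_by_roi(detections, roi):
    """Keep only detections whose centroid lies inside the ROI.

    detections: iterable of [x1, y1, x2, y2] boxes (assumed format)
    roi: (x_min, y_min, x_max, y_max)
    """
    x_min, y_min, x_max, y_max = roi
    kept = []
    for x1, y1, x2, y2 in detections:
        cx = (x1 + x2) / 2  # centroid x
        cy = (y1 + y2) / 2  # centroid y
        if x_min <= cx <= x_max and y_min <= cy <= y_max:
            kept.append([x1, y1, x2, y2])
    return kept
```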

I hope this helps you get started with implementing camera feeds and accessing person counts for each camera using YOLOv5. If you have

VedadarshanR commented 1 year ago

Hi, are you saying we can add these snippets to detect.py and parse these statements to access the count?

detect.py --source "" --camera 1

But I want to run multiple streams simultaneously. Will this method be viable if I define the ROI and index in the .streams file, and when parsing and reading the .streams file using the dataloaders script's LoadStreams method? We can modify the code there and make it return the index and ROI according to the thread. Can you guide me in this?

glenn-jocher commented 1 year ago

@VedadarshanR hi,

Yes, you can add the code snippets to the detect.py script and parse the camera index using the --camera argument when running the script. For example:

python detect.py --source "" --camera 1

To run multiple streams simultaneously, you can define the ROI and camera index in the .streams file. When parsing and reading the .streams file using the LoadStreams method in the dataloaders script, you can modify the code to return the index and ROI according to the thread.

To achieve this, you can modify the LoadStreams method in the dataloaders script to read the .streams file and extract the camera index and ROI for each stream. You can then return these values along with the stream to use them in the detect.py script.
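
As an illustrative sketch only (a standard .streams file contains one source per line; the comma-separated ROI fields below are an invented extension, not a format YOLOv5 supports out of the box), a custom parser for such a file could look like:

```python
def parse_streams_file(text):
    """Parse lines of the form: <source_url>,<x_min>,<y_min>,<x_max>,<y_max>

    Returns a list of (index, source, roi) tuples. The ROI fields are a
    hypothetical extension of the one-source-per-line .streams format;
    lines without them get roi=None.
    """
    entries = []
    for index, line in enumerate(text.strip().splitlines()):
        parts = [p.strip() for p in line.split(",")]
        source = parts[0]
        roi = tuple(int(v) for v in parts[1:5]) if len(parts) >= 5 else None
        entries.append((index, source, roi))
    return entries
```

A modified LoadStreams could carry the index and ROI from entries like these alongside each stream, so the per-camera logic in detect.py knows which ROI to apply.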

I hope this clarifies your query. If you have any further questions, feel free to ask.

VedadarshanR commented 1 year ago

@glenn-jocher Hi,

Can you please provide me an example for that? And is there any function for tracking?

Thanks in advance

glenn-jocher commented 1 year ago

@VedadarshanR hi,

Thank you for reaching out. To provide an example of parsing camera index and implementing ROI, you can modify the detect.py script as follows:

  1. Parse the camera index using command-line arguments: You can use the argparse module to parse the camera index as an argument. For example:

    import argparse
    
    parser = argparse.ArgumentParser()
    parser.add_argument("--camera", type=int, default=0, help="Camera index")
    args = parser.parse_args()
    
    camera_index = args.camera
  2. Implement ROI for each camera: To define an ROI for each camera, you can specify the ROI coordinates (x_min_roi, x_max_roi, y_min_roi, y_max_roi) based on your desired region. You can then filter the detections based on whether their centroid coordinates fall within the defined ROI for each camera.

Regarding tracking, YOLOv5 itself does not include built-in tracking functionality. However, you can integrate external object tracking libraries such as Deep SORT or SORT with YOLOv5 to track objects across frames.
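
To illustrate the idea behind such trackers (this is a toy greedy IoU matcher for two consecutive frames, not Deep SORT or SORT themselves, which add motion models and appearance features), box association can be sketched as:

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_frames(prev_boxes, new_boxes, threshold=0.3):
    """Greedily match each new box to the unmatched previous box with highest IoU.

    Returns {new_index: prev_index} for pairs exceeding the threshold.
    """
    matches, used = {}, set()
    for j, nb in enumerate(new_boxes):
        best_i, best_iou = None, threshold
        for i, pb in enumerate(prev_boxes):
            if i in used:
                continue
            score = iou(pb, nb)
            if score > best_iou:
                best_i, best_iou = i, score
        if best_i is not None:
            matches[j] = best_i
            used.add(best_i)
    return matches
```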

I hope this helps! If you have any further questions, please don't hesitate to ask.

github-actions[bot] commented 1 year ago

πŸ‘‹ Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO πŸš€ and Vision AI ⭐