ultralytics / yolov3

YOLOv3 in PyTorch > ONNX > CoreML > TFLite
https://docs.ultralytics.com
GNU Affero General Public License v3.0

Inquiry on the decorator, file types, and their use in the detect.py script, as well as the outputs created from said script #2074

Closed PhilipAmadasun closed 1 year ago

PhilipAmadasun commented 1 year ago

Search before asking

Question

I built ultralytics/yolov3 from source. I have some questions about the code. I would be really grateful for any help.

[i] Where in the directory is the "smart_inference_mode" decorator located?

[ii] I assume the "smart_inference_mode" decorator provides the yolov3.yaml config file to the run() function?

[iii] Is there a yolov3.pt file and if not, how do I make my own .pt files?

[iv] How do I change the live camera feed I am using? I think it has something to do with webcam = source.isnumeric() or source.endswith('.streams') or (is_url and not is_file) in the run() function. If I understand how source is set up, I should be able to change which live camera feed is used.

[v] Where are the labels and bounding box coordinates stored/output in the code? I want to have access to them. I assume they are somewhere in the "smart_inference_mode" decorator?

Additional

No response

github-actions[bot] commented 1 year ago

👋 Hello @PhilipAmadasun, thank you for your interest in YOLOv3 🚀! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide a minimum reproducible example to help us debug it.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset image examples and training logs, and verify you are following our Tips for Best Training Results.

Requirements

Python>=3.7.0 with all requirements.txt dependencies installed, including PyTorch>=1.7. To get started:

git clone https://github.com/ultralytics/yolov3  # clone
cd yolov3
pip install -r requirements.txt  # install

Environments

YOLOv3 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

YOLOv3 CI

If this badge is green, all YOLOv3 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv3 training, validation, inference, export and benchmarks on macOS, Windows, and Ubuntu every 24 hours and on every commit.

Introducing YOLOv8 🚀

We're excited to announce the launch of our latest state-of-the-art (SOTA) object detection model for 2023 - YOLOv8 🚀!

Designed to be fast, accurate, and easy to use, YOLOv8 is an ideal choice for a wide range of object detection, image segmentation and image classification tasks. With YOLOv8, you'll be able to quickly and accurately detect objects in real-time, streamline your workflows, and achieve new levels of accuracy in your projects.

Check out our YOLOv8 Docs for details and get started with:

pip install ultralytics

glenn-jocher commented 1 year ago

@PhilipAmadasun hi there!

Great questions!

[i] The "smart_interface_mode" decorator is located in the "detect.py" script, at the beginning of the "main()" function.

[ii] Not quite. The decorator itself only wraps run() so that inference executes under torch.inference_mode() (or torch.no_grad() on older PyTorch versions); the model configuration comes from the weights and options parsed in parse_opt(), not from the decorator.

[iii] Pretrained checkpoints such as yolov3.pt are not stored in the repository itself, but they are downloaded automatically from the GitHub releases the first time you reference them. You can also train your own model on your own dataset; please take a look at the train.py script to see how.

[iv] To change the live camera source, modify the source argument passed to run() in detect.py (or the --source flag on the command line): a camera index such as 0 selects a webcam, while a file path or stream URL selects a video or stream.

[v] The labels and bounding box coordinates are stored in the pred variable inside the run() function of detect.py; each element of pred is a tensor of detections in (x1, y1, x2, y2, confidence, class) format. You can modify that loop to save the labels and coordinates to a file or print them to the console, as sketched below.
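
A rough sketch of that loop (variable names follow recent versions of detect.py and may differ slightly in yours):

# Inside run() in detect.py: pred holds one detection tensor per image;
# each row is (x1, y1, x2, y2, confidence, class).
for i, det in enumerate(pred):
    for *xyxy, conf, cls in det:
        label = names[int(cls)]  # class name string
        print(label, float(conf), [int(v) for v in xyxy])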

I hope this helps! Let us know if you have any further questions or concerns.

PhilipAmadasun commented 1 year ago

@glenn-jocher Thanks for the help!

There are some follow-up questions swimming in my head; I will try to pose the most pertinent ones.

[i] Since I built this from source: (1) can I import YOLOv3 into my own Python scripts, and (2) if I instead pip install ultralytics, am I restricted to YOLOv8, or can I still use pre-trained YOLOv3 weights?

[ii] I may not have to modify any source code after all, depending on your replies to question [i]. This is because it seems the boxes class type contains the information I need (bounding box coordinates and labels). What is the layout of the boxes class type? Please use this example here:

glenn-jocher commented 1 year ago

@PhilipAmadasun hello! I'm glad I could help with your previous questions.

To answer your follow-up questions:

[i]

  1. If you have built YOLOv3 from source, you can certainly import it into your own Python scripts. However, you will need to ensure that the correct paths are set up so that your script can find the necessary YOLOv3 files and modules. This will depend on where you have built the YOLOv3 repository on your machine.

  2. If you choose to install Ultralytics YOLOv3 via pip, you will still be able to use pre-trained YOLOv3 weights; you will not be restricted to YOLOv8. For example:
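
  A minimal sketch using the pip package (the image path here is just a placeholder):

  from ultralytics import YOLO

  model = YOLO('yolov3-tiny.pt')  # checkpoint auto-downloads on first use
  results = model('bus.jpg')      # run inference on an image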

[ii] The boxes object returned by the model() call contains the bounding box coordinates, confidence scores, and class labels for each detected object. More specifically, result.boxes.data is an array with the following shape:

[num_det, 6]

where num_det is the number of detected objects in the image, and the six columns are (x1, y1, x2, y2, confidence, class). This is how you could extract the bounding box coordinates and labels of the detected objects from your example code snippet:

# Assuming "boxes" is a numpy array with shape [3, 6] (3 objects detected)
for i in range(len(boxes)):
    print("Object {0}: Label {1}, Bounding Box: {2}".format(i+1, int(boxes[i][5]), boxes[i][0:4]))

This code will print the bounding box coordinates and corresponding class label for each detected object in the image.

I hope this answers your follow-up questions. If there is anything else you need help with, please don't hesitate to ask.

PhilipAmadasun commented 1 year ago

@glenn-jocher Thank you again! I'll try the things you mentioned and give feedback. On pip install ultralytics: is the command pip install ultralyticsyolov3? What is the actual pip install command to get the Ultralytics package that uses YOLOv3 and YOLOv4?

glenn-jocher commented 1 year ago

Hi @PhilipAmadasun!

I'm glad I could help. Regarding your question, the command to install the Ultralytics package is simply pip install ultralytics; there is no need to specify YOLOv3 or YOLOv4 in the command. Once installed, you can import YOLOv3 or YOLOv4 by specifying the relevant configuration file in your Python script, like so:

from yolov3.cfg import *

or

from yolov4.cfg import *

Hope this helps clarify things! Please let me know if you have any further questions or run into any issues.

PhilipAmadasun commented 1 year ago

@glenn-jocher I get an error for from yolov3.cfg import *. The Ubuntu terminal claims there is no module named yolov3.

Here is a snippet of code:


#!/usr/bin/python3
import rospy
from rospy.numpy_msg import numpy_msg
import numpy as np
from ultralytics import YOLO
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
#from yolov3.cfg import *

rospy.init_node("test_node")
class camera_detect:
    def __init__(self):
        self.bridge = CvBridge()
        self.model = YOLO('yolov3-tiny.pt')

    def callback1(self, msg):
        self.cv_image = self.bridge.imgmsg_to_cv2(msg, desired_encoding='passthrough')  # converts the image message to a type that OpenCV can understand
        # print(self.cv_image)
        result = self.model(self.cv_image, verbose=False)[0]
        boxes = result.boxes.data.cpu().numpy()
        # print(np.size(boxes)) -- this prints out 6, which means there are 6 detected objects, right?
        for i in range(np.size(boxes)):
            print("Object {0}: Label {1}, Bounding Box: {2}".format(i + 1, int(boxes[i][5]), boxes[i][0:4]))

As you can see, I commented out the yolov3.cfg import. The for loop spits out Object 1: Label 56, Bounding Box: [ 81.644 167.63 261.75 244.96], which means (afaik) it's only printing out one object's data. Why would that be the case if there are 6 objects? Also, the label simply gives a number out of the 80 classes. How do I obtain the actual name of the object (i.e. dog, fridge)?

Please pardon another question. In [ 81.644 167.63 261.75 244.96 ], which elements represent the box center, and which represent the width and height?

glenn-jocher commented 1 year ago

@PhilipAmadasun hello! I'm sorry to hear that you are having issues with the from yolov3.cfg import * command in your Python script. The error is likely due to a missing or incorrectly set path to the Ultralytics repository on your machine.

Regarding your code snippet, np.size(boxes) returns 6 because you have specified verbose=False in the model() function, which means that only one inference result is returned even if there are multiple detected objects.

To obtain the actual names of the detected objects, you can use the result.names attribute, which contains the class names of the dataset the model was trained on (in this case, the COCO dataset). You can look up a detected object's class name using its class index, which is provided in the boxes array. Here's an example of how to do this:

class_names = result.names
for i in range(len(boxes)):
    class_index = int(boxes[i][5])
    class_name = class_names[class_index]
    print("Object {0}: {1}, Bounding Box: {2}".format(i+1, class_name, boxes[i][0:4]))

To answer your question on the meaning of [ 81.644 167.63 261.75 244.96 ]: these values are in (x1, y1, x2, y2) format, so the first two elements are the top-left corner of the bounding box and the last two are the bottom-right corner, not the width and height. If you want box centers plus width and height instead, the Results object also exposes them via result.boxes.xywh.
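
For instance, you can derive the center and size directly from those corner coordinates (using your example values):

# Convert an (x1, y1, x2, y2) box to center coordinates plus width/height.
x1, y1, x2, y2 = 81.644, 167.63, 261.75, 244.96
cx, cy = (x1 + x2) / 2, (y1 + y2) / 2  # box center: (171.697, 206.295)
w, h = x2 - x1, y2 - y1                # width and height: (180.106, 77.33)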

I hope this helps! If you have any further questions, please let me know.

PhilipAmadasun commented 1 year ago

@glenn-jocher

[i] On making verbose=True I still get the same problem.

[ii] The for loop only prints out the first object out of six. I also have a problem showing the annotated images after object detection. The two problems may be related, though.

Here is the current snippet of code.

if self.distance is None:
    self.cv_image = self.bridge.imgmsg_to_cv2(image, desired_encoding='passthrough')
    # print(self.cv_image)
    self.result = self.model(self.cv_image, verbose=True)[0]
    # Visualize the results on the frame
    annotated_frame = self.result.plot()

    # Display the annotated frame
    cv2.imshow("YOLOv5 Inference", annotated_frame)
    self.class_names = self.result.names
    self.boxes = self.result.boxes.data.cpu().numpy()
    print("size of results:", np.size(self.boxes))
    try:
        for i in range(np.size(self.boxes)):
            print("NUMBER: ", i)
            class_index = int(self.boxes[i][5])
            class_name = self.class_names[class_index]
            print("CLASS_NAME: ", str(class_name))
            if str(class_name) == "fire hydrant":
                self.info = self.boxes[i][0:4]
                # rospy.loginfo(time_stamp)
            # print("Object {0}: {1}, Bounding Box: {2}".format(i + 1, class_name, self.boxes[i][0:4]))
    except IndexError:
        pass
print("callback1 releases lock")

Relevant output in terminal:

size of results: 6
NUMBER:  0
CLASS_NAME:  fire hydrant
NUMBER:  1

EDIT:

I partially figured out what I was doing wrong. Only one of the objects was actually detected, and np.size returns the total number of elements in the array, i.e. the six columns of that single detection's row, not the number of detections. It is still a mystery why only one object, the fire hydrant, was detected: I had placed a traffic cone and a dumpster in the field of view of the camera. Perhaps it's because I'm using a camera in a simulated world. I'll try the real-world camera and provide feedback. A tiny check that illustrates the np.size distinction:
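
import numpy as np

boxes = np.zeros((1, 6))  # one detection with six columns
print(np.size(boxes))     # 6 -> total element count, not the detection count
print(boxes.shape[0])     # 1 -> the actual number of detections
print(len(boxes))         # 1 -> same thing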

I still don't know how to get the annotated images to show up, though.

github-actions[bot] commented 1 year ago

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

glenn-jocher commented 10 months ago

@PhilipAmadasun, it seems that the behavior you're observing - only a single object (a fire hydrant) being detected - may be due to a variety of factors, including the model configuration, confidence threshold, input data, and the simulated environment.

As for displaying annotated images, the likely cause is that cv2.imshow() is being called without a following cv2.waitKey(); OpenCV only refreshes its windows while waitKey() is running, so without it the frame is never drawn.
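
A minimal sketch of the usual pattern (the window title is arbitrary):

cv2.imshow("YOLOv3 Inference", annotated_frame)
cv2.waitKey(1)  # pump the GUI event loop for ~1 ms so the frame actually draws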

If you're still experiencing challenges after trying a real-world camera, please feel free to share the outcome and any related code or terminal output. This will help in troubleshooting the issue more effectively.