Closed tyiannak closed 3 years ago
@apoman38 @theopsall please comment your ideas
More detailed description after today's call with @lobracost
@lobracost let me know if I forgot something in this draft planning description
Introduced a new folder named object_detection. It will contain everything related to object detection. Created a class whose instances wrap our SSD models. Wrote methods for detection and plotting. The plots are compatible with OpenCV.
No progress today. I tried to implement online object detection but ran into some problems. Most of them are solved, but it seems that the neural net can't handle some black frames that occur on shot changes (especially at the beginning or the end of the videos).
Do you mean it crashes, or that it does not find objects during shot transitions? The latter would not be much of an issue...
It crashes, but I'll try to find the reason and fix it.
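While the cause of the crash is being tracked down, one possible stopgap is to skip near-black frames before they reach the detector. This is only a sketch: the names is_black_frame and detect_safely, the threshold value, and the detect_fn callable are all hypothetical, and this is not the fix that was eventually applied.

```python
import numpy as np


def is_black_frame(frame, mean_threshold=10.0):
    """Return True for an empty frame or one whose mean intensity is very low.

    mean_threshold is an assumed value on the 0-255 scale, not a tuned one.
    """
    if frame is None or frame.size == 0:
        return True
    return float(np.mean(frame)) < mean_threshold


def detect_safely(frame, detect_fn):
    """Call detect_fn(frame) unless the frame looks black.

    detect_fn stands in for whatever detection method the model class exposes.
    Black frames yield an empty detection list instead of a crash.
    """
    if is_black_frame(frame):
        return []
    return detect_fn(frame)
```

With a guard like this, shot transitions simply produce no detections, which matches the "not much of an issue" case mentioned above.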
Done today:
Online object detection added to video processing.
The problem was that I was using NVIDIA's SSD model from the torch hub and not from the git repo. The torch hub model wasn't up to date and threw an error when nothing was detected. I solved it by modifying one of the model files (a single if statement was needed) on its first download. Every time the model is downloaded for the first time, our code will modify this specific file.
For the time being I don't save the outcome of the object detection to the feature vector. Do you want to just save the categories and the bboxes to the feature vector, or do you have something else in mind?
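The "patch the file on first download" step described above could look roughly like the sketch below. It is generic on purpose: the function name, the marker string, and the old/new snippets are hypothetical, since the thread does not say which file or which if statement was changed.

```python
from pathlib import Path

# Hypothetical marker used to recognise a file that was already patched.
PATCH_MARKER = "# patched: empty-detection guard"


def patch_file_once(path, old, new):
    """Replace the first occurrence of `old` with `new` in the file at `path`.

    Returns True if the file was modified, and False if it was already
    patched or the target text was not found, so repeated calls are safe.
    """
    path = Path(path)
    source = path.read_text()
    if PATCH_MARKER in source or old not in source:
        return False
    patched = source.replace(old, new + "  " + PATCH_MARKER, 1)
    path.write_text(patched)
    return True
```

The marker makes the patch idempotent, so the code can call it on every start-up and only the first call after a fresh download changes anything.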
That's great
Do you mean the final feature vector we have in visual_analysis? I would say no, not the bboxes directly. Let's add (in this task or in a new one, whatever you prefer) a new function, say get_object_features_from_objects(), that takes a list of detected objects (bboxes + labels) and returns a set of features. These features will be added to the final vector by calling that function. For a very rough initial version, let's add (a) the number of objects in a set of categories and (b) their average normalized area. This "set of categories" can be hard-coded at the beginning, e.g. vehicle or car or motorbike and so on. Then we can incrementally add more "groups" as defined in the next task, but for the time being let's add just these two dummy features.
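A minimal sketch of the proposed get_object_features_from_objects() follows. The input layout is an assumption: each object is a dict with a "label" and a "bbox" given as (x1, y1, x2, y2) in pixels, and the default category set is just an example of a hard-coded vehicle group.

```python
def get_object_features_from_objects(objects, frame_width, frame_height,
                                     categories=("car", "motorcycle", "bus", "truck")):
    """Return (count, average normalized area) for objects in `categories`.

    Normalized area is bbox area divided by frame area, so it lies in [0, 1]
    for boxes that fit inside the frame.
    """
    frame_area = float(frame_width * frame_height)
    areas = []
    for obj in objects:
        if obj["label"] not in categories:
            continue
        x1, y1, x2, y2 = obj["bbox"]
        # Clamp negative widths/heights to zero for malformed boxes.
        areas.append(max(0, x2 - x1) * max(0, y2 - y1) / frame_area)
    count = len(areas)
    avg_area = sum(areas) / count if count else 0.0
    return count, avg_area
```

For example, a 50x50 car and a 100x50 bus in a 100x100 frame give a count of 2 and an average normalized area of 0.375; objects outside the category set are ignored.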
I made a function that returns 3 object features:
- frequency of every label per frame
- the average confidence of every object detected
- the average area occupied by the labels per frame
@tyiannak since our code can now extract information about 80 objects (including persons), should we keep or remove the haar cascade face detection?
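One way to read the three features listed above is sketched below. The interpretation is an assumption: frequency is the number of detections of a label divided by the number of frames, confidence is averaged over detections, and area is the label's total normalized area divided by the number of frames. The function name and the (label, confidence, area) tuple layout are hypothetical.

```python
from collections import defaultdict


def summarize_detections(frames):
    """Aggregate per-label statistics over a list of frames.

    `frames` is a list with one entry per frame; each entry is a list of
    (label, confidence, normalized_area) tuples.
    Returns three dicts keyed by label: frequency, avg_confidence, avg_area.
    """
    n_frames = len(frames)
    counts = defaultdict(int)
    conf_sums = defaultdict(float)
    area_sums = defaultdict(float)
    for detections in frames:
        for label, confidence, area in detections:
            counts[label] += 1
            conf_sums[label] += confidence
            area_sums[label] += area
    # With no frames there are no labels, so these comprehensions stay empty.
    frequency = {label: counts[label] / n_frames for label in counts}
    avg_confidence = {label: conf_sums[label] / counts[label] for label in counts}
    avg_area = {label: area_sums[label] / n_frames for label in counts}
    return frequency, avg_confidence, avg_area
```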
Does it have both persons and faces as separate types of objects? If it also has faces, then we should remove the haar-based face features. On the other hand, whether they come from the new object detector or the haar face detector, we will need some separate "statistics" for faces as final features in the future, as faces are probably the most important factor in differentiating types of shots.
Unfortunately it only detects persons, not faces. I agree that faces are an important factor. At some point we may have to find another classifier, since haar only recognises frontal faces, which is a problem.
I grouped the categories based on the COCO dataset documentation. The code can now extract and save features for these groups. The groups are:
'person': person
'vehicle': bicycle car motorcycle airplane bus train truck boat
'outdoor': traffic light fire hydrant stop sign parking meter bench
'animal': bird cat dog horse sheep cow elephant bear zebra giraffe
'accessory': backpack umbrella handbag tie suitcase
'sports': frisbee skis snowboard sports ball kite baseball bat baseball glove skateboard surfboard tennis racket
'kitchen': bottle wine glass cup fork knife spoon bowl
'food': banana apple sandwich orange broccoli carrot hot dog pizza donut cake
'furniture': chair couch potted plant bed dining table toilet
'electronic': tv laptop mouse remote keyboard cell phone
'appliance': microwave oven toaster sink refrigerator
'indoor': book clock vase scissors teddy bear hair drier toothbrush
Seems ok @lobracost Are the values of the dict above the complete list of objects detected initially?
Yes this is the complete list. I mapped 80 categories to 12. If, at any time, you need to take a look at the categories, just open the file category_names.txt, which is under the directory analyze_visual.
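The 80-to-12 mapping listed above can be written as a dict plus an inverse lookup, roughly as below. The labels and groups are copied from the list in this thread; the names CATEGORY_GROUPS, LABEL_TO_GROUP and count_groups are hypothetical and may not match what is in the repo.

```python
from collections import Counter

# Group name -> object labels, as listed in the thread.
CATEGORY_GROUPS = {
    "person": ["person"],
    "vehicle": ["bicycle", "car", "motorcycle", "airplane", "bus", "train",
                "truck", "boat"],
    "outdoor": ["traffic light", "fire hydrant", "stop sign", "parking meter",
                "bench"],
    "animal": ["bird", "cat", "dog", "horse", "sheep", "cow", "elephant",
               "bear", "zebra", "giraffe"],
    "accessory": ["backpack", "umbrella", "handbag", "tie", "suitcase"],
    "sports": ["frisbee", "skis", "snowboard", "sports ball", "kite",
               "baseball bat", "baseball glove", "skateboard", "surfboard",
               "tennis racket"],
    "kitchen": ["bottle", "wine glass", "cup", "fork", "knife", "spoon",
                "bowl"],
    "food": ["banana", "apple", "sandwich", "orange", "broccoli", "carrot",
             "hot dog", "pizza", "donut", "cake"],
    "furniture": ["chair", "couch", "potted plant", "bed", "dining table",
                  "toilet"],
    "electronic": ["tv", "laptop", "mouse", "remote", "keyboard",
                   "cell phone"],
    "appliance": ["microwave", "oven", "toaster", "sink", "refrigerator"],
    "indoor": ["book", "clock", "vase", "scissors", "teddy bear",
               "hair drier", "toothbrush"],
}

# Inverse lookup: object label -> group name.
LABEL_TO_GROUP = {
    label: group
    for group, labels in CATEGORY_GROUPS.items()
    for label in labels
}


def count_groups(labels):
    """Count detections per group; labels not in the mapping are skipped."""
    return Counter(LABEL_TO_GROUP[l] for l in labels if l in LABEL_TO_GROUP)
```

For instance, count_groups(["car", "bus", "person"]) counts two detections under "vehicle" and one under "person".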
@lobracost will u send this task for PR or are there any more changes to be done here?
I still have to fix some things in the confidence smoothing. The PR will probably be ready tomorrow.
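The thread does not say which smoothing method is used. Purely as an illustration, a centered moving average over per-frame confidences could look like this; the function name and the window size are assumptions.

```python
def smooth_confidences(confidences, window=5):
    """Smooth a sequence of per-frame confidences with a centered moving average.

    Near the start and end of the sequence the window shrinks to the
    available values, so the output has the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(confidences)):
        lo = max(0, i - half)
        hi = min(len(confidences), i + half + 1)
        segment = confidences[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed
```

A moving average damps single-frame spikes or dropouts in the detector's confidence, at the cost of blurring sharp changes at real shot boundaries.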
right i had forgotten about smoothing again :-)
Description: