object detection/tracking pipeline

rajveerb commented 1 year ago

The goal is to figure out:

The most commonly used recent models for object detection/tracking, such as the YOLO family.
The most commonly used preprocessing operations for training these models.
The datasets that will be used.

What is not our goal?

We are not trying to find the best object detection model out there.
We are not trying to create the best object detection model out there, in terms of either accuracy or speed.
We do not want the best dataset, only the most commonly used datasets for the task.

Lastly, change the existing pipeline code to create an object detection/tracking ML training pipeline.

rajveerb commented 1 year ago

@kmpark70

Let me know if there's anything that is not included based on the meeting discussion today and if there's anything that you don't understand.

Most importantly, try to answer the questions above in a comment below so that I can assign the right task to Harshith before our next meeting.

kmpark70 commented 1 year ago

Question:

The type of object detection or tracking being performed seems to influence things like video preprocessing and the choice of model. For instance, in anomaly detection or surveillance, the VGG-16 model was predominantly used, and the data was often taken from YouTube-BB and ImageNet VID. Object detection for autonomous driving, however, may call for a different approach: YOLO is dominant in this area, and many different datasets are used, such as nuScenes and KITTI. So my question is: what kind of object detection or tracking are we looking for?

rajveerb commented 1 year ago

Let's focus on object detection in an autonomous driving setting because it is an important problem. Focusing on detection with YOLO is also a good choice because of its popularity.

So the model class is YOLO, and for the type of object detection, focus on autonomous driving and find datasets for that.
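Because the model class is YOLO, one preprocessing operation that YOLO training pipelines commonly apply is letterboxing: resize the image to fit a square input while keeping its aspect ratio, pad the remainder, and shift the bounding boxes to match. The sketch below is a NumPy-only illustration, not code taken from any YOLO release; the 640-pixel input size, the grey padding value of 114, and nearest-neighbour resizing (a real pipeline would more likely call `cv2.resize`) are assumptions.

```python
import numpy as np


def letterbox(image: np.ndarray, new_size: int = 640, pad_value: int = 114):
    """Resize keeping the aspect ratio, then pad to a new_size x new_size canvas.

    Returns the padded image, the scale factor, and the (left, top) padding,
    which are needed to move bounding boxes into the new coordinates.
    """
    h, w = image.shape[:2]
    scale = min(new_size / h, new_size / w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour resize by picking source rows/columns (assumption: no interpolation).
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows][:, cols]
    # Centre the resized image on a canvas filled with the padding value.
    top = (new_size - new_h) // 2
    left = (new_size - new_w) // 2
    canvas = np.full((new_size, new_size) + image.shape[2:], pad_value, dtype=image.dtype)
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, scale, (left, top)


def adjust_boxes(boxes: np.ndarray, scale: float, pad: tuple) -> np.ndarray:
    """Map [x1, y1, x2, y2] pixel boxes from the original image into the letterboxed one."""
    left, top = pad
    out = boxes.astype(float) * scale
    out[:, [0, 2]] += left
    out[:, [1, 3]] += top
    return out


if __name__ == "__main__":
    # A KITTI-sized frame (1242 x 375) with one box covering the whole image.
    frame = np.zeros((375, 1242, 3), dtype=np.uint8)
    padded, s, pad = letterbox(frame)
    print(padded.shape, pad)  # (640, 640, 3) (0, 223)
    print(adjust_boxes(np.array([[0, 0, 1242, 375]]), s, pad))
```

Returning the scale and padding alongside the image keeps the box transform separate, so the same values can later be used to map predictions back to the original frame.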

kexinrong commented 1 year ago

Waymo dataset: https://waymo.com/open/

kmpark70 commented 1 year ago

These are among the papers I've read so far that use video data for autonomous driving.

Title: Video Preprocessing using neural networks

Processing steps:
1) Data preprocessing: remove noise and unwanted artifacts using spatial and temporal filtering, then apply normalization and resizing.
2) Feature extraction: features are extracted from the video frames using CNNs and RNNs. These features represent the video frames and capture their spatiotemporal dynamics.
3) Network architecture design: using 3D CNNs, spatiotemporal CNNs, and recurrent neural networks.
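Step 1 of that summary (spatial and temporal filtering, normalization, resizing) can be sketched for a clip stored as a (T, H, W, C) NumPy array. The summary does not say which filters the paper uses, so the moving-average temporal filter, the 3x3 box blur, per-channel standardization, and nearest-neighbour resizing below are assumptions chosen for simplicity, not the paper's method.

```python
import numpy as np


def temporal_filter(frames: np.ndarray, window: int = 3) -> np.ndarray:
    """Moving average over time (odd window); edge frames are repeated at the clip ends."""
    half = window // 2
    pad = [(half, half)] + [(0, 0)] * (frames.ndim - 1)
    padded = np.pad(frames.astype(float), pad, mode="edge")
    # Average `window` time-shifted copies of the clip.
    return np.mean([padded[i:i + len(frames)] for i in range(window)], axis=0)


def spatial_filter(frames: np.ndarray, k: int = 3) -> np.ndarray:
    """k x k box blur applied to every frame and channel independently."""
    half = k // 2
    _, h, w = frames.shape[:3]
    padded = np.pad(frames.astype(float), [(0, 0), (half, half), (half, half), (0, 0)], mode="edge")
    acc = np.zeros(frames.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            acc += padded[:, dy:dy + h, dx:dx + w]
    return acc / (k * k)


def resize(frames: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of every frame to out_h x out_w."""
    _, h, w = frames.shape[:3]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return frames[:, rows][:, :, cols]


def normalize(frames: np.ndarray) -> np.ndarray:
    """Scale to [0, 1], then standardize each channel to zero mean and unit variance."""
    x = frames.astype(float) / 255.0
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    std = x.std(axis=(0, 1, 2), keepdims=True) + 1e-8
    return (x - mean) / std


def preprocess_clip(frames: np.ndarray, out_h: int = 224, out_w: int = 224) -> np.ndarray:
    """Filtering, then resizing, then normalization, for a (T, H, W, C) uint8 clip."""
    x = spatial_filter(temporal_filter(frames))
    x = resize(x, out_h, out_w)
    return normalize(x)
```

Resizing after filtering means the blur acts on the full-resolution frames, which reduces aliasing from the nearest-neighbour downsampling.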

Data Used: 1) COCO dataset 2) Kinetics dataset (for video classification)

Title: Preprocessing Methods of Lane Detection and Tracking for Autonomous Driving

Preprocessing steps:
1) Extract images from the video.
2) Image smoothing: remove noise and other unwanted components of the image.
3) Region of interest (ROI) selection, and conversion of the color image to greyscale or a different color format.
4) Inverse perspective mapping (IPM): remap each pixel to a different position to obtain a bird's-eye view.
5) Segmentation: prepare the images for the detection stage.
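Steps 2 to 4 (smoothing, greyscale conversion with ROI selection, and inverse perspective mapping) can be sketched with NumPy. This is an illustration under stated assumptions, not the surveyed paper's implementation: the BT.601 grey weights, the box blur standing in for whatever smoothing filter is used, the trapezoidal ROI and its proportions, and the four point pairs that define the IPM homography are all assumptions. With OpenCV, `cv2.getPerspectiveTransform` and `cv2.warpPerspective` cover the IPM part.

```python
import numpy as np


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Colour-to-grey conversion with ITU-R BT.601 luma weights (one common choice)."""
    return rgb[..., :3].astype(float) @ np.array([0.299, 0.587, 0.114])


def box_smooth(gray: np.ndarray, k: int = 5) -> np.ndarray:
    """Image smoothing with a k x k box filter (stand-in for a Gaussian or median filter)."""
    half = k // 2
    p = np.pad(gray.astype(float), half, mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + h, dx:dx + w]
    return out / (k * k)


def roi_mask(gray: np.ndarray, top_frac: float = 0.6) -> np.ndarray:
    """Zero everything outside a trapezoid in the lower part of the frame."""
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    top = int(h * top_frac)
    # The trapezoid widens linearly from the middle third of the width at row `top`
    # to the full width at the bottom row.
    t = np.clip((ys - top) / max(h - 1 - top, 1), 0.0, 1.0)
    left = (w / 3) * (1 - t)
    right = w - left
    inside = (ys >= top) & (xs >= left) & (xs <= right)
    return np.where(inside, gray, 0)


def homography(src, dst) -> np.ndarray:
    """Direct linear transform: 3x3 H with dst ~ H @ src, from four (x, y) point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # H is the null vector of the 8 x 9 system: the last right-singular vector.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def inverse_perspective_map(img: np.ndarray, src, dst, out_shape) -> np.ndarray:
    """Warp img so the quadrilateral `src` lands on `dst` (bird's-eye view), nearest neighbour."""
    H_inv = np.linalg.inv(homography(src, dst))
    oh, ow = out_shape
    ys, xs = np.mgrid[0:oh, 0:ow]
    # For each output pixel, find the source pixel it comes from.
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    sx, sy, sw = H_inv @ pts
    sx = np.round(sx / sw).astype(int)
    sy = np.round(sy / sw).astype(int)
    h, w = img.shape[:2]
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros((oh * ow,) + img.shape[2:], dtype=img.dtype)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape((oh, ow) + img.shape[2:])
```

In practice the four source points would be picked on a flat stretch of road (for example, the corners of a straight lane segment) and mapped to a rectangle, so that lane lines that converge in the camera view become parallel in the bird's-eye view.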

Color and edges are the two main features considered for lane detection segmentation.
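A minimal sketch of how the two features could be combined into a lane-candidate mask: the edge feature is the Sobel gradient magnitude of the grey image, and the color feature keeps white and yellow pixels. The thresholds, the yellow range, and the choice to OR the masks together are illustrative assumptions, not values from the paper, and would need tuning per dataset.

```python
import numpy as np


def sobel_magnitude(gray: np.ndarray) -> np.ndarray:
    """Edge feature: gradient magnitude from the 3x3 Sobel kernels."""
    p = np.pad(gray.astype(float), 1, mode="edge")
    h, w = gray.shape

    def s(dy: int, dx: int) -> np.ndarray:
        # View of the padded image shifted by (dy - 1, dx - 1) relative to each pixel.
        return p[dy:dy + h, dx:dx + w]

    gx = (s(0, 2) + 2 * s(1, 2) + s(2, 2)) - (s(0, 0) + 2 * s(1, 0) + s(2, 0))
    gy = (s(2, 0) + 2 * s(2, 1) + s(2, 2)) - (s(0, 0) + 2 * s(0, 1) + s(0, 2))
    return np.hypot(gx, gy)


def lane_candidate_mask(rgb: np.ndarray, edge_thresh: float = 100.0, white_thresh: int = 200) -> np.ndarray:
    """Boolean mask of pixels that are strong edges OR lane-coloured (white/yellow)."""
    gray = rgb[..., :3].astype(float) @ np.array([0.299, 0.587, 0.114])
    edges = sobel_magnitude(gray) > edge_thresh
    r, g, b = (rgb[..., i].astype(int) for i in range(3))
    white = (r > white_thresh) & (g > white_thresh) & (b > white_thresh)
    yellow = (r > 180) & (g > 150) & (b < 100)  # rough yellow-paint range (assumed)
    return edges | white | yellow
```

OR-ing keeps both the interior of painted stripes (color) and their boundaries (edges); AND-ing instead would be stricter but would drop stripe interiors.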

Data Used: The paper discusses the preprocessing steps for lane detection and tracking systems in detail, without explicitly specifying the data used for testing.

Other papers I read that are related to these sources:

Obstacle classification and detection for vision based navigation for autonomous driving
Joint Multiclass Object Detection and Semantic Segmentation for autonomous driving

rajveerb commented 1 year ago

@kmpark70

Can you link the papers you discussed in detail?

Also, is the above info enough for you to implement a pipeline end-to-end?

kmpark70 commented 1 year ago

Preprocessing Methods of Lane Detection and Tracking for Autonomous Driving: https://arxiv.org/pdf/2104.04755.pdf
Video Preprocessing using neural networks: http://nauchniyimpuls.ru/index.php/ni/article/view/8194/5210
Joint Multiclass Object Detection and Semantic Segmentation for autonomous driving: https://ieeexplore.ieee.org/abstract/document/10098794

I need to start on video preprocessing for autonomous driving this week. I believe datasets like COCO, KITTI, or ImageNet VID should be sufficient for testing. I plan to consult the reference materials as needed while coding and pick up the necessary background along the way. What do you think?
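If KITTI is used with a YOLO model, its 2D box labels have to be converted to the YOLO text format. A KITTI object label line starts with the class name, truncation, occlusion, and observation angle, followed by the 2D box as left, top, right, bottom in pixels; YOLO expects one `class x_center y_center width height` line per object, normalized by the image size. The sketch below assumes that layout; the three-class mapping and the choice to drop every other class (including `DontCare`) are hypothetical.

```python
# Hypothetical class subset; KITTI also has Van, Truck, Person_sitting, Tram, Misc and DontCare.
KITTI_TO_YOLO_ID = {"Car": 0, "Pedestrian": 1, "Cyclist": 2}


def kitti_line_to_yolo(line: str, img_w: int, img_h: int, class_map=KITTI_TO_YOLO_ID):
    """Convert one KITTI label line to a YOLO label line, or None if the class is skipped."""
    fields = line.split()
    if not fields or fields[0] not in class_map:
        return None
    # Fields 4-7 hold the 2D box: left, top, right, bottom (pixels).
    left, top, right, bottom = map(float, fields[4:8])
    x_c = (left + right) / 2 / img_w
    y_c = (top + bottom) / 2 / img_h
    w = (right - left) / img_w
    h = (bottom - top) / img_h
    return f"{class_map[fields[0]]} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


def convert_label_file(text: str, img_w: int, img_h: int) -> str:
    """Convert a whole KITTI label file (one object per line) to YOLO format."""
    converted = (kitti_line_to_yolo(line, img_w, img_h) for line in text.splitlines())
    return "\n".join(c for c in converted if c is not None)


print(kitti_line_to_yolo("Car 0 0 0 100 50 300 150 0 0 0 0 0 0 0", 400, 200))
# 0 0.500000 0.500000 0.500000 0.500000
```

The image width and height are passed in rather than hard-coded because KITTI frames do not all have exactly the same resolution.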

rajveerb commented 1 year ago

@kmpark70

For today's meeting, you need to talk concretely about a pipeline.

Can you describe a pipeline in a comment on this issue?

What is the task? What is the dataset? What are the preprocessing operations? What is the model being trained?

kmpark70 commented 1 year ago

I summarized a paper in Word. Can you open and read it? CS4699 10:18.docx

kmpark70 commented 1 year ago

Link to the paper: https://arxiv.org/pdf/2209.13508.pdf

Next task: analyze the preprocessing steps in detail and look at other papers (still focusing on the Waymo dataset).

kmpark70 commented 1 year ago

I summarized a paper in Word. 11:1 Task for CS4699.docx

Paper Link: PointAugmenting- Cross-Modal Augmentation for 3D Object Detection.pdf

Github Link: https://github.com/VISION-SJTU/PointAugmenting

rajveerb / lotus, issue #7