Closed: nlitz88 closed this 9 months ago
Update: From this IEEE paper written by Prof Raj, it looks like one of his students has already manually annotated work zones in nuScenes. I have sent out an email asking Prof Raj if this dataset is still available. The student's name is Weijing Shi. nuScenes is a multi-camera driving dataset with additional sensor modalities for fusion (lidar, radar, etc.).
Okay, I did a tiny bit of digging around with BEVFusion and nuScenes, here are some key points:
If we're going to use something like BEVFusion (or any 3D object detection network) to try to "easily" obtain a 2D costmap from images, then (maybe this is obvious) we're going to need a 3D detection dataset.
So, because I was a bit discouraged by the above, I started looking a bit more closely at nuScenes. It turns out the nuScenes dataset does have classes for *traffic cones, construction vehicles, construction workers, and temporary barriers.*
Also, while I haven't looked as closely at these other self-driving datasets yet, they could be useful (depending on how many instances of these kinds of objects they contain). For the sake of time, though, we have to keep in mind that whichever dataset we pick, whatever model we choose is going to need that dataset adapted in some way--so there's added complexity there.
Also, from the above comment, I think a next step is to get a feel for the kind of "traffic cones" (and other construction-related objects) that are present in the nuScenes dataset. Maybe they group a bunch of different kinds of traffic barriers/markers under the class, for all we know. It'd be nice to get a feel for what these construction zone examples look like. If we find that only traffic cones and the above classes are feasible, then worst case scenario, maybe we could limit our scope to only worrying about those objects (rather than all construction objects).
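To get a quick feel for what actually falls under those classes, a short sketch against the nuScenes devkit schema could work. The category names below are my assumptions based on the nuScenes v1.0 taxonomy (worth double-checking against `nusc.list_categories()`), and the counting helper is just an illustration:

```python
# Sketch: tally construction-related annotations in nuScenes.
# ASSUMPTION: these category names match the nuScenes v1.0 taxonomy;
# verify against nusc.list_categories() before relying on them.
CONSTRUCTION_CATEGORIES = {
    "movable_object.trafficcone",
    "movable_object.barrier",
    "vehicle.construction",
    "human.pedestrian.construction_worker",
}


def count_construction_annotations(annotations):
    """Tally annotations whose category is construction-related.

    `annotations` is an iterable of dicts with a 'category_name' key,
    shaped like the records in nusc.sample_annotation from the devkit.
    """
    counts = {}
    for ann in annotations:
        cat = ann["category_name"]
        if cat in CONSTRUCTION_CATEGORIES:
            counts[cat] = counts.get(cat, 0) + 1
    return counts


# With the devkit and the data on disk (not run here), usage would
# look roughly like:
#   from nuscenes.nuscenes import NuScenes
#   nusc = NuScenes(version="v1.0-mini", dataroot="/data/nuscenes")
#   print(count_construction_annotations(nusc.sample_annotation))
```

Even rough per-class counts like this would tell us quickly whether nuScenes has enough construction-object instances to be worth committing to.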
I don't want to push it too hard since I'm not sure it was designed for outdoor/road use, but something like nvBlox on stereo-depth images could actually give a decent result--mainly because it would let me build a massive 2D object detection dataset that nvBlox would use for segmentation under the hood with its UNet (meaning we could include more types of construction obstacles, in theory).
Okay, after a little bit of digging around in nuScenes just on their explorer utility, I have already found a number of scenes that have all kinds of traffic cones and construction barriers. Here is scene 61 from nuScenes v1.0 as an example:
While I still can't say for certain whether nuScenes has labels for other types of traffic barriers, I would argue that, because the core objective of our project is workzone boundary detection, the emphasis should be on identifying the objects that define a workzone--rather than on the underlying perception task of detecting objects generally, or construction objects specifically.
In my opinion (the position I'm taking), our project's core focus is not construction object perception--it is generating construction zone boundaries GIVEN perceived objects. The ability to detect, localize, and classify objects is at its core a perception task. Our main contribution is downstream of that: given perceived objects, we extract/infer higher-level information from them!
Therefore, if the pretrained BEVFusion model is trained on nuScenes and is limited to detecting only some "construction zone barriers" (for example)--oh well! If you want our downstream task of inferring a workzone boundary to be more robust, then you'd tell your perception engineers to go collect and annotate more data, expand your 3D object detection dataset, and retrain BEVFusion. Our position for this project is that we are not responsible for enhancing perception--we're just using a baseline, SOTA approach to prove the efficacy of our downstream identification method.
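To make the "boundary given perceived objects" framing concrete, here's a minimal sketch of what that downstream step could look like: take the BEV (x, y) centers of detected construction objects and wrap them in a convex hull as a first-cut workzone boundary. The hull is just one illustrative choice (it can't represent concave zones), and the detection format here is hypothetical, not BEVFusion's actual output schema:

```python
def convex_hull(points):
    """Andrew's monotone chain: return hull vertices in CCW order.

    `points` is a list of (x, y) tuples, e.g. BEV centers of detected
    construction objects (cones, barriers, ...).
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each half (it repeats the other half's start).
    return lower[:-1] + upper[:-1]


def workzone_boundary(detections):
    """Hypothetical downstream step: `detections` are dicts with a BEV
    'center' (x, y) and a 'category' string; keep the construction-
    related ones and return a boundary polygon around them.
    """
    keep = {"traffic_cone", "barrier", "construction_vehicle"}
    centers = [d["center"] for d in detections if d["category"] in keep]
    return convex_hull(centers)
```

The point of the sketch is the division of labor: the perception model (BEVFusion or whatever replaces it) owns `detections`, and our contribution is everything from there to the boundary polygon.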
Lol, sorry if that sounded politically charged--that's just our argument if questioned again :)
Find one construction dataset that we can start with. Traffic cones, construction lights, barriers, construction vehicles (like excavators, work lights, generators, bulldozers, pavers, etc.--anything like that). Anything you might find in a construction zone. We can definitely combine multiple datasets if you can't find a single one containing all relevant construction objects.
Update: Maybe to be more specific, I think there are two different kinds of datasets that we might want to look for:
Also, in looking for these datasets, it's good to check out all different kinds. We can use ones that look kinda scrappy/thrown together from Roboflow--but we may have better luck with some of the better-known, somewhat "vetted" datasets that are cited/used in other people's research.