nlitz88 / workzone

Workzone Boundary Detection
MIT License

Review available construction datasets #2

Closed: nlitz88 closed this issue 9 months ago

nlitz88 commented 9 months ago

Find one construction dataset that we can start with. Traffic cones, construction lights, barriers, construction vehicles (like excavators, work lights, generators, bulldozers, pavers, etc.--anything like that). Anything you might find in a construction zone. We can definitely combine multiple datasets if you can't find a single one containing all relevant construction objects.

Update: Maybe to be more specific, I think there are two different kinds of datasets that we might want to look for:

  1. Datasets that just contain images of construction objects or objects you'd find in a work zone on the road (mostly what I was talking about above).
  2. General driving datasets (like KITTI) that are just pictures from a car driving around, but ones that include construction zones. We could even use dashcam videos from YouTube that capture a car driving through/past some sort of construction zone. While we will also want to set up a simulation environment in Carla to test our pipeline, running it on driving datasets or dashcam footage is probably a better, more real test of our pipeline.

Also, in looking for these datasets, it's good to check out all different kinds. We can use ones that look kinda scrappy/thrown together from roboflow--but we may have better luck with some of the better known, somewhat "vetted" datasets that are cited/used in other people's research.

CMUBOB97 commented 9 months ago

Update: From this IEEE paper written by Prof Raj, it looks like one of his students has already manually annotated work zones in nuScenes. I have sent out an email asking Prof Raj if this dataset is still available. The student's name is Weijing Shi. nuScenes is a monocular image dataset with extra sensor fusion (lidar, etc.).

nlitz88 commented 9 months ago

Okay, I did a tiny bit of digging around with BEVFusion and nuScenes, here are some key points:

Also, while I haven't looked as closely at these other self-driving datasets yet, they could be useful (depending on how many instances of these kinds of objects there are). However, for the sake of time, we have to keep in mind that whatever dataset we want to use, whatever model we choose is going to need that dataset adapted in some way--so there's added complexity there.

nlitz88 commented 9 months ago

Also, from the above comment, I think a next step is to get a feel for the kind of "traffic cones" (and other construction-related objects) that are present in the nuScenes dataset. For all we know, they might group a bunch of different kinds of traffic barriers/markers under that one class. It'd be nice to get a feel for what these construction zone examples look like. If we find that only traffic cones and the above classes are feasible, then in the worst case we could limit our scope to just those objects (rather than all construction objects).
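To get that feel programmatically, here's a minimal sketch of the category counting we'd do over nuScenes annotations. The records below are toy stand-ins (with the real nuscenes-devkit, they'd come from the `sample_annotation` table); the two category names are, if I remember the taxonomy right, nuScenes' actual class names for cones and barriers:

```python
from collections import Counter

# Toy stand-in for nuScenes sample_annotation records. With the real
# nuscenes-devkit, these would be loaded from the dataset itself.
annotations = [
    {"category_name": "movable_object.trafficcone"},
    {"category_name": "movable_object.barrier"},
    {"category_name": "vehicle.car"},
    {"category_name": "movable_object.trafficcone"},
]

# Assumed nuScenes class names for workzone-defining objects.
WORKZONE_CATEGORIES = ("movable_object.trafficcone", "movable_object.barrier")

def count_workzone_annotations(records):
    """Count annotations whose category suggests a construction/work zone."""
    counts = Counter(r["category_name"] for r in records)
    return {cat: counts.get(cat, 0) for cat in WORKZONE_CATEGORIES}

print(count_workzone_annotations(annotations))
# {'movable_object.trafficcone': 2, 'movable_object.barrier': 1}
```

Running something like this per scene would tell us quickly which scenes are worth eyeballing in the explorer.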

I don't want to push it too hard since I'm not sure it was designed for outdoor/road use, but something like nvBlox on stereo-depth images could actually give a decent result--simply because it would let me create a massive 2D object detection dataset that it would use for segmentation under the hood with UNET (meaning we could include more types of construction obstacles, in theory).

nlitz88 commented 9 months ago

Okay, after a little bit of digging around in nuScenes just on their explorer utility, I have already found a number of scenes that have all kinds of traffic cones and construction barriers. Here is scene 61 from nuScenes v1.0 as an example:

[Image: nuScenes v1.0, scene 61]

While I still can't say for certain whether nuScenes has labels for other types of traffic barriers, I would argue that, because the core objective of our project is workzone boundary detection, the emphasis should be on identifying objects that define a workzone--rather than on the perception task of detecting objects (construction-related or otherwise).

In my opinion (the position I'm taking), our project's core focus is not construction object perception--it is generating construction zone boundaries GIVEN perceived objects. The ability to detect, localize, and classify objects is at its core a perception task. Our main contribution is downstream of that--given perceived objects, we are extracting/inferring higher level information from them!
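To make the "given perceived objects" step concrete, here's a minimal sketch of one possible boundary extraction, assuming we already have bird's-eye-view (x, y) positions for the detected cones/barriers. The convex hull is just the simplest placeholder--a real workzone boundary would probably need something lane-aware or a concave (alpha) shape:

```python
def convex_hull(points):
    """Andrew's monotone chain convex hull over (x, y) tuples.

    Returns hull vertices in counter-clockwise order. Serves here as a
    crude stand-in for a workzone boundary around perceived objects.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive if o->a->b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each half (it repeats the other half's start).
    return lower[:-1] + upper[:-1]

# Hypothetical BEV positions of detected cones (meters, ego frame):
cones = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
print(convex_hull(cones))  # interior cone (2, 1) is dropped
```

The point being: this downstream step is dataset-agnostic--it only cares that *something* upstream hands it localized workzone objects.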

Therefore, if the pretrained BEVFusion model is trained on nuScenes and is limited to detecting only some "construction zone barriers" (for example)--oh well! If you want our downstream task of inferring a workzone boundary to be more robust, then you'd tell your perception engineers to go collect and annotate more data, expand your 3D object detection dataset, and retrain BEVFusion. Our position for this project is that we are not responsible for enhancing perception--we're just using the baseline, SOTA approach to prove the efficacy of our downstream identification method.

Lol, sorry if that sounded politically charged--that's just our argument if questioned again :)