Closed: nlitz88 closed this 9 months ago
Update: From this IEEE paper written by Prof Raj, it looks like one of his students has already manually annotated work zones in nuScenes. I have sent out an email asking Prof Raj if this dataset is still available. The student's name is Weijing Shi. nuScenes is a multi-camera driving dataset with additional sensor modalities for fusion (lidar, radar, etc.).
Okay, I did a tiny bit of digging around with BEVFusion and nuScenes, here are some key points:
If we're going to use something like BEVFusion (or any 3D object detection network) to try to "easily" obtain a 2D costmap from images, then (maybe this is obvious) we're going to need a 3D detection dataset.
So, because I was a bit discouraged by the above, I started looking a bit more closely at nuScenes. It turns out the nuScenes dataset does have classes for *traffic cones, construction vehicles, construction workers, and temporary barriers.*
Also, while I haven't looked as closely at these other self-driving datasets yet, they could be useful (depending on how many instances of these kinds of objects they contain). For the sake of time, though, we have to keep in mind that whichever dataset we pick, whatever model we choose is going to need that dataset adapted in some way--so there's added complexity there.
Also, from the above comment, I think a next step is to get a feel for the kind of "traffic cones" (and other construction-related objects) that are present in the nuScenes dataset. Maybe they group a bunch of different kinds of traffic barriers/markers under the class, for all we know. It'd be nice to get a feel for what these construction zone examples look like. If we find that only traffic cones and the above classes are feasible, then worst case scenario, maybe we could limit our scope to only worrying about those objects (rather than all construction objects).
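To get a quick feel for what actually falls under those classes, a short sketch against the nuScenes devkit schema could work. The category names below are my assumptions based on the nuScenes v1.0 taxonomy (worth double-checking against `nusc.list_categories()`), and the counting helper is just an illustration:

```python
# Sketch: tally construction-related annotations in nuScenes.
# ASSUMPTION: these category names match the nuScenes v1.0 taxonomy;
# verify against nusc.list_categories() before relying on them.
CONSTRUCTION_CATEGORIES = {
    "movable_object.trafficcone",
    "movable_object.barrier",
    "vehicle.construction",
    "human.pedestrian.construction_worker",
}


def count_construction_annotations(annotations):
    """Tally annotations whose category is construction-related.

    `annotations` is an iterable of dicts with a 'category_name' key,
    shaped like the records in nusc.sample_annotation from the devkit.
    """
    counts = {}
    for ann in annotations:
        cat = ann["category_name"]
        if cat in CONSTRUCTION_CATEGORIES:
            counts[cat] = counts.get(cat, 0) + 1
    return counts


# With the devkit and the data on disk (not run here), usage would
# look roughly like:
#   from nuscenes.nuscenes import NuScenes
#   nusc = NuScenes(version="v1.0-mini", dataroot="/data/nuscenes")
#   print(count_construction_annotations(nusc.sample_annotation))
```

Even rough per-class counts like this would tell us quickly whether nuScenes has enough construction-object instances to be worth committing to.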
I don't want to push it too hard since I'm not sure it was designed for outdoor/road use, but something like nvBlox on stereo-depth images could actually give a decent result--mainly because it would let me build a massive 2D object detection dataset that nvBlox would use for segmentation under the hood with its UNet (meaning we could include more types of construction obstacles, in theory).
Okay, after a little bit of digging around in nuScenes just on their explorer utility, I have already found a number of scenes that have all kinds of traffic cones and construction barriers. Here is scene 61 from nuScenes v1.0 as an example:
While I still can't say for certain whether nuScenes has labels for other types of traffic barriers, I would argue that, because the core objective of our project is workzone boundary detection, the emphasis should be on identifying the objects that define a workzone--rather than on the underlying perception task of detecting objects generally, or construction objects specifically.
In my opinion (the position I'm taking), our project's core focus is not construction object perception--it is generating construction zone boundaries GIVEN perceived objects. The ability to detect, localize, and classify objects is at its core a perception task. Our main contribution is downstream of that: given perceived objects, we extract/infer higher-level information from them!
Therefore, if the pretrained BEVFusion model is trained on nuScenes and is limited to detecting only some "construction zone barriers" (for example)--oh well! If you want our downstream task of inferring a workzone boundary to be more robust, then you'd tell your perception engineers to go collect and annotate more data, expand your 3D object detection dataset, and retrain BEVFusion. Our position for this project is that we are not responsible for enhancing perception--we're just using a baseline, SOTA approach to prove the efficacy of our downstream identification method.
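To make the "boundary given perceived objects" framing concrete, here's a minimal sketch of what that downstream step could look like: take the BEV (x, y) centers of detected construction objects and wrap them in a convex hull as a first-cut workzone boundary. The hull is just one illustrative choice (it can't represent concave zones), and the detection format here is hypothetical, not BEVFusion's actual output schema:

```python
def convex_hull(points):
    """Andrew's monotone chain: return hull vertices in CCW order.

    `points` is a list of (x, y) tuples, e.g. BEV centers of detected
    construction objects (cones, barriers, ...).
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each half (it repeats the other half's start).
    return lower[:-1] + upper[:-1]


def workzone_boundary(detections):
    """Hypothetical downstream step: `detections` are dicts with a BEV
    'center' (x, y) and a 'category' string; keep the construction-
    related ones and return a boundary polygon around them.
    """
    keep = {"traffic_cone", "barrier", "construction_vehicle"}
    centers = [d["center"] for d in detections if d["category"] in keep]
    return convex_hull(centers)
```

The point of the sketch is the division of labor: the perception model (BEVFusion or whatever replaces it) owns `detections`, and our contribution is everything from there to the boundary polygon.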
Lol, sorry if that sounded politically charged--that's just our argument if questioned again :)
Find one construction dataset that we can start with. Traffic cones, construction lights, barriers, construction vehicles (like excavators, work lights, generators, bulldozers, pavers, etc.--anything like that). Anything you might find in a construction zone. We can definitely combine multiple datasets if you can't find a single one containing all relevant construction objects.
Update: Maybe to be more specific, I think there are two different kinds of datasets that we might want to look for:
Also, in looking for these datasets, it's good to check out all different kinds. We can use ones that look kinda scrappy/thrown together from Roboflow--but we may have better luck with some of the better-known, somewhat "vetted" datasets that are cited/used in other people's research.