nlitz88 / workzone

Workzone Boundary Detection

Survey local costmap generation approaches #1

Closed · nlitz88 closed this issue 7 months ago

nlitz88 commented 9 months ago

I'll add more to this later, but the short of it is: there are many ways to take sensor inputs, detect objects, figure out where they are in 3D space, and plot those on a 2D occupancy grid (which lots of people call a "costmap"; a "local" costmap just means an occupancy grid centered on our vehicle, with all of the objects positioned relative to it). I.e., this is a very open-ended task in AV, and there is definitely no single, de facto approach.
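To make that concrete, here's a minimal numpy sketch of what a vehicle-centered local costmap amounts to. All the dimensions, resolution, and the helper function here are made up for illustration--nothing project-specific:

```python
import numpy as np

# Hypothetical local costmap: 40 m x 40 m around the vehicle at 0.2 m/cell.
RESOLUTION = 0.2             # meters per cell
SIZE = int(40 / RESOLUTION)  # 200 x 200 cells
costmap = np.zeros((SIZE, SIZE), dtype=np.uint8)  # 0 = free, 100 = occupied

def mark_detection(costmap, x, y, cost=100):
    """Mark a detection at (x, y) meters in the vehicle frame,
    with the vehicle sitting at the center of the grid."""
    col = int(x / RESOLUTION) + SIZE // 2
    row = int(y / RESOLUTION) + SIZE // 2
    if 0 <= row < SIZE and 0 <= col < SIZE:
        costmap[row, col] = cost

# e.g., a traffic cone detected 5 m ahead and 1 m to the side:
mark_detection(costmap, 5.0, 1.0)
```

Everything below is really just different ways of filling in a grid like that from sensor data.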

However, thinking about the scope of our project, creating this local costmap around the car isn't really the task we should be stressing over. Rather, our project is more focused on: "given a local costmap of all the objects detected around our car--how do we detect a construction zone and draw a boundary around it?" That is, we shouldn't spend all our energy figuring out how to construct the costmap; instead, we should focus it on that second part: identifying and drawing a boundary around construction zones.

Having said that, I'm thinking it would be good for us to go out and do a small "literature review" on some of the "out of the box" approaches to obtaining a local costmap around our vehicle (which will be situated in CARLA, if I'm remembering correctly what he told us in class). This could mean going out and looking for research papers, open source projects, YouTube tutorials, etc.

nlitz88 commented 9 months ago

To get us started on this, there are a couple of links I wanted to drop. I'm not an expert with any of these and haven't personally used them yet, but they're approaches that I've come across that I think might be worth exploring. Leave a comment with any more approaches that you think look promising!

ROS2 NAV2 Costmap
While it's not a hard requirement, I think it would be very helpful to integrate with ROS2 NAV2's costmap capabilities. Basically, NAV2 contains a bunch of code that, given a point cloud (from something like a 3D LiDAR or stereo camera) or a depth map, will create a 3D voxelized reconstruction of the data you pass in. You can then work with that voxel map in 3D, or just work with the projection of that voxel grid in 2D (which is essentially the 2D costmap we probably want). Having this part (in theory) already complete for us is huge if we can make it work--because that kind of code can be really hard to get right. NAV2 is a well-maintained, "publicly scrutinized" (Koopman) implementation, and works out of the box with many of NVIDIA's Isaac ROS nodes. Here are some helpful links for getting to know NAV2's costmap capabilities:

I still have to read a lot of that documentation myself, but in my experience, I think we will save ourselves a lot of headache if we try to use these existing ROS2 packages.
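For a sense of what consuming NAV2's output could look like on our end: NAV2 publishes its costmaps as nav_msgs/OccupancyGrid messages, so our workzone logic could be a plain rclpy node subscribed to that. Rough, untested sketch--the topic name assumes NAV2's default local_costmap namespace, and I haven't verified our exact setup:

```python
import numpy as np
import rclpy
from rclpy.node import Node
from nav_msgs.msg import OccupancyGrid  # NAV2 publishes its costmaps as this type

class CostmapListener(Node):
    def __init__(self):
        super().__init__("costmap_listener")
        # Topic name assumes NAV2's default local_costmap namespace.
        self.create_subscription(
            OccupancyGrid, "/local_costmap/costmap", self.on_costmap, 10)

    def on_costmap(self, msg: OccupancyGrid):
        # Reshape the flat int8 data into a 2D grid our workzone-boundary
        # logic can operate on. -1 = unknown, 0..100 = cost.
        grid = np.asarray(msg.data, dtype=np.int8).reshape(
            msg.info.height, msg.info.width)
        self.get_logger().info(
            f"costmap {msg.info.width}x{msg.info.height} @ "
            f"{msg.info.resolution:.2f} m/cell")

def main():
    rclpy.init()
    rclpy.spin(CostmapListener())
    rclpy.shutdown()

if __name__ == "__main__":
    main()
```

The nice part of this split is that everything upstream of that topic (sensors, fusion, voxelization) stays somebody else's well-tested code.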

NVIDIA Isaac ROS Freespace Detection
https://nvidia-isaac-ros.github.io/repositories_and_packages/isaac_ros_freespace_segmentation/index.html
This is along the lines of what we want. I'm just not sure if this particular ROS package actually lets us include/train on our own semantic classes, which we would likely need if we want construction-type objects projected onto a particular costmap layer.

NVIDIA Isaac ROS Nvblox
https://nvidia-isaac-ros.github.io/repositories_and_packages/isaac_ros_nvblox/index.html
Chi knows I wanted to try this last semester with our F1TENTH car, but I feel like this may be another Isaac ROS package worth trying out for our project. Compared to the freespace detection model above, Nvblox should allow us to segment out objects of a particular class (like a construction object, for example). This requires training a UNET semantic segmentation model separately (which I have done using their utility before--pain), but if it actually works, I (naively) think this could be a straightforward route.

Update: Maybe the actual difference between Isaac ROS Freespace Detection and Nvblox is that Nvblox handles generating the costmap from semantic segmentation + depth data, whereas the freespace segmentation node just produces the semantic segmentation + depth data and relies on the ROS2 NAV2 costmap 2D plugins to construct the costmap from it. In that case, Nvblox may be the better choice for the simple reason that it can create that costmap in a really fast, efficient way on the GPU--whereas maybe the ROS NAV2 plugins are CPU-bound? Not sure about this, just a hunch.
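To spell out what "semantic segmentation + depth -> costmap" means in that update: conceptually, you back-project the pixels of your target class through the camera model and splat them into a BEV grid. Here's a naive numpy sketch of that idea--this is NOT what Nvblox actually does internally (no TSDF, no temporal fusion, intrinsics are placeholders), just the geometry:

```python
import numpy as np

def semantic_bev(depth, seg_mask, fx, cx, res=0.2, size=200):
    """Back-project pixels of one semantic class (e.g. 'construction
    object') into a BEV occupancy layer, camera at bottom-center.

    depth:    (H, W) depth in meters (camera frame, +z forward)
    seg_mask: (H, W) bool, True where the class of interest was segmented
    fx, cx:   pinhole intrinsics (focal length and principal point, x-axis)
    """
    v, u = np.nonzero(seg_mask)
    z = depth[v, u]                 # forward distance of each masked pixel
    x = (u - cx) * z / fx           # lateral offset in the camera frame
    grid = np.zeros((size, size), dtype=np.uint8)
    rows = (z / res).astype(int)                 # rows = forward
    cols = (x / res).astype(int) + size // 2     # cols = lateral
    ok = (rows >= 0) & (rows < size) & (cols >= 0) & (cols < size)
    grid[rows[ok], cols[ok]] = 100
    return grid
```

The GPU-vs-CPU question above is basically about how fast you can do this splatting (plus the actual 3D reconstruction) at sensor rate.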

SurroundOCC (github) (arxiv)
This is a more end-to-end approach, like what Tesla supposedly uses. It would be very cool to try, but I also don't think we need a full 3D reconstruction--and this implementation uses a lot of compute to create one from multiple camera views. While I don't think these methods run in real time (unlike something like Nvblox, which only needs a single camera), it could still be worth experimenting with.
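On the "we don't need full 3D" point: even if a method hands us a 3D occupancy volume, collapsing it to the 2D costmap is basically one line--so the 3D reconstruction buys us little for this task. Sketch, assuming a hypothetical boolean (X, Y, Z) occupancy array:

```python
import numpy as np

def flatten_occupancy(occ_3d, z_min=2, z_max=20):
    """Collapse a hypothetical (X, Y, Z) boolean occupancy volume to a 2D
    costmap: a cell is occupied if anything occupies it within a height
    band (indices are placeholders; they'd exclude ground and overhangs)."""
    return occ_3d[:, :, z_min:z_max].any(axis=2).astype(np.uint8) * 100
```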

BEVFormer (github) (arxiv)
This was the original approach I had in mind, as it seemed like the most straightforward way for us to get a local costmap strictly from cameras. However, in CARLA, I'm pretty sure we will have access to all kinds of sensors, so that camera-only constraint probably isn't one we actually have. Plus, I'm pretty sure this model is slow as hell, for lack of a better description.

BEVFusion (github) (arxiv)
Raj presented this one in class. This is definitely one of the state-of-the-art approaches that seems to actually work! AND, as an added bonus, it looks like it'll run in real time (they reported 25 FPS on an Orin). I haven't read the paper or looked very thoroughly at the github for this yet--but we would need to find out whether their model can classify / assign a semantic class to each object it maps out, as we'll need that for our workzone boundary detection. It could be that their model uses an object detection or semantic segmentation model internally, and we may be able to train a model and "plug it into theirs," so to speak. Not sure about that yet, though. (See the sketch below for what I mean by needing per-object classes.)
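Concretely, if the detector gives us class-labeled BEV detections, building a construction-only costmap layer is trivial. The class names here are real nuScenes detection classes (which BEVFusion trains on), but the detection tuple format is made up--whether BEVFusion's output actually exposes per-object labels like this is exactly what we'd need to verify:

```python
import numpy as np

# nuScenes detection classes relevant to workzones.
CONSTRUCTION_CLASSES = {"traffic_cone", "barrier", "construction_vehicle"}

def construction_layer(detections, res=0.2, size=200):
    """Rasterize only construction-class detections into their own costmap
    layer. Each detection is a hypothetical (x, y, yaw, length, width,
    label) tuple in the vehicle frame; box extent/yaw is ignored here
    for simplicity (centers only), vehicle at the grid center."""
    layer = np.zeros((size, size), dtype=np.uint8)
    for x, y, _yaw, _length, _width, label in detections:
        if label not in CONSTRUCTION_CLASSES:
            continue
        row = int(y / res) + size // 2
        col = int(x / res) + size // 2
        if 0 <= row < size and 0 <= col < size:
            layer[row, col] = 100
    return layer
```

Our boundary-detection step would then run over that layer alone, independent of the general obstacle layer.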

nlitz88 commented 9 months ago

Leaving myself a note here to also take a look at Autoware packages. I already know they have nodes that do all kinds of preprocessing on sensor data and sensor fusion tasks (like matching detections from different sensors as the same instance), and I'm sure they also have nodes to turn this into a costmap.

For our purposes, these packages could also provide the kind of turnkey solution we're looking for in terms of costmap generation. It would be nice if they have a 3D object detection node, too--but we could implement one ourselves if not.

nlitz88 commented 9 months ago

https://developer.nvidia.com/blog/detecting-objects-in-point-clouds-using-ros-2-and-tao-pointpillars/

CMUBOB97 commented 9 months ago

Adding to this thread a monocular depth estimation model I found: https://github.com/XXXVincent/MonoDepth2

nlitz88 commented 8 months ago

Working on spinning up BEVFusion now, after having found some good scenes within nuScenes that feature all kinds of construction barriers and objects.

nlitz88 commented 8 months ago

Found a link to a TensorRT-based implementation provided by NVIDIA--might try this too if it's easier to set up and evaluate.

https://github.com/NVIDIA-AI-IOT/Lidar_AI_Solution/tree/master/CUDA-BEVFusion