leggedrobotics / wild_visual_navigation

Wild Visual Navigation: A system for fast traversability learning via pre-trained models and online self-supervision
https://sites.google.com/leggedrobotics.com/wild-visual-navigation
MIT License

Decoupling Feature Extraction from Rest of Pipeline #229

Closed JonasFrey96 closed 6 months ago

JonasFrey96 commented 1 year ago

Maybe also run the small MLP, whose weights are synced from our WVN learning code, allowing for low latency.

wild_visual_navigation_ros

Responsibility: training and graph handling

Input: segment masks, features (fed to the MLP)
Output: MLP weights

wild_visual_navigation_runtime

Responsibility: feature extraction and output publishing

Input: 3x camera images, MLP weights and threshold
Output: segment masks, features, traversability and confidence
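To make the weight sync concrete, here is a minimal sketch of how the runtime node could receive the MLP weights from the learning node, assuming the weights are shipped as a flattened `std_msgs/Float32MultiArray` and the small MLP is a plain torch module; the topic, message type, and layer sizes are illustrative assumptions, not the actual WVN interface.

```python
# Hypothetical weight-sync sketch between the learning node and the runtime node:
# the learning node flattens the MLP parameters into a Float32MultiArray, the
# runtime node copies them back into its local MLP before inference.
import torch
from std_msgs.msg import Float32MultiArray

class SmallMLP(torch.nn.Module):
    # Layer sizes are placeholders, not the real WVN architecture.
    def __init__(self, dim_in=90, dim_hidden=256, dim_out=1):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim_in, dim_hidden), torch.nn.ReLU(),
            torch.nn.Linear(dim_hidden, dim_out),
        )

    def forward(self, x):
        return self.net(x)

def pack_weights(model):
    # Learning node: flatten all parameters into one vector for publishing.
    with torch.no_grad():
        flat = torch.cat([p.flatten() for p in model.parameters()])
    return Float32MultiArray(data=flat.cpu().tolist())

def unpack_weights(model, msg):
    # Runtime node: copy the received flat vector back into its own MLP copy.
    flat = torch.tensor(msg.data)
    offset = 0
    with torch.no_grad():
        for p in model.parameters():
            n = p.numel()
            p.copy_(flat[offset:offset + n].view_as(p).to(p.device))
            offset += n
```

The learning node would publish `pack_weights(model)` after each training step, and the runtime node would call `unpack_weights` in its subscriber callback, so inference always uses the latest weights without blocking on training.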

JonasFrey96 commented 1 year ago

This may be the biggest performance gain we can achieve. Create a demo Python node in which we just run DINO and nothing else, and measure the throughput. This would show the maximum inference rate we can achieve, independent of all the learning parts.
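As a starting point, a standalone probe like the one below would give that number; it uses the DINO ViT-S/8 weights from torch.hub as a stand-in backbone and dummy 224x224 inputs, so treat it as a sketch of the measurement rather than the real WVN feature extractor.

```python
# Offline throughput probe: run only the DINO backbone on dummy inputs and
# report forward passes per second (no ROS, no learning, no image decoding).
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.hub.load("facebookresearch/dino:main", "dino_vits8").to(device).eval()

x = torch.rand(1, 3, 224, 224, device=device)
with torch.no_grad():
    for _ in range(10):  # warm-up iterations
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    n, t0 = 100, time.perf_counter()
    for _ in range(n):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    dt = time.perf_counter() - t0

print(f"{n / dt:.1f} forward passes / s ({1000 * dt / n:.2f} ms each)")
```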

JonasFrey96 commented 1 year ago

Did a first experiment. I wrote a feature extraction node that is capable of running at 22 Hz on my laptop. The forward pass (shown on the left) takes roughly 30 ms of the callback; the output frequency, when just publishing a single float value to measure the node's performance, is a stable 22.5 Hz. The input frequency of the images is set to 115 Hz, currently using compressed images.

(screenshot attached)
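For reference, the probe node roughly follows this pattern (a sketch with hypothetical topic names, and the torch.hub DINO again standing in for the actual extractor): decode the compressed image, run the backbone, and publish a single float so that `rostopic hz` reflects the achievable output rate.

```python
# Minimal ROS probe node: subscribe to the compressed camera stream, run only
# the feature extractor, time the forward pass, and publish one float per frame.
# Topic names and the backbone are assumptions, not the actual WVN node.
import time
import cv2
import numpy as np
import rospy
import torch
import torch.nn.functional as F
from sensor_msgs.msg import CompressedImage
from std_msgs.msg import Float32

def main():
    rospy.init_node("feature_extraction_probe")
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = torch.hub.load("facebookresearch/dino:main", "dino_vits8").to(device).eval()
    pub = rospy.Publisher("~forward_time", Float32, queue_size=1)

    def callback(msg):
        # Decode the compressed image, move it to the GPU, resize, run the backbone.
        img = cv2.imdecode(np.frombuffer(msg.data, np.uint8), cv2.IMREAD_COLOR)
        x = torch.from_numpy(img).to(device).permute(2, 0, 1).float()[None] / 255.0
        t0 = time.perf_counter()
        with torch.no_grad():
            model(F.interpolate(x, (224, 224), mode="bilinear"))
        if device == "cuda":
            torch.cuda.synchronize()
        # Publishing a single float lets `rostopic hz` report the output rate.
        pub.publish(Float32(data=time.perf_counter() - t0))

    rospy.Subscriber("/camera/image_raw/compressed", CompressedImage, callback, queue_size=1)
    rospy.spin()

if __name__ == "__main__":
    main()
```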

JonasFrey96 commented 1 year ago

Interestingly, the bottleneck is moving the image onto the GPU and resizing it. Could we reconfigure the Alphasense driver to publish a downscaled version of the image? (screenshot attached) If we integrate this change, we can easily run all 3 cameras, provided we split into a hot-path feature extraction node and a learning node.
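A quick way to sanity-check that is a micro-benchmark comparing the two orderings; the snippet below is only an illustration (it assumes CUDA is available and uses a random 1080x1440 frame), but it isolates exactly the upload-plus-resize cost that a driver-side downscale would remove.

```python
# Compare: (a) upload the full 1080x1440 frame and resize on the GPU vs.
# (b) resize on the CPU first and upload only the 224x224 image.
# Assumes a CUDA device; timings are illustrative and hardware-dependent.
import time
import cv2
import numpy as np
import torch
import torch.nn.functional as F

img = (np.random.rand(1080, 1440, 3) * 255).astype(np.uint8)

def to_gpu_tensor(arr):
    return torch.from_numpy(arr).cuda().permute(2, 0, 1).float()[None] / 255.0

def full_res_upload_then_gpu_resize():
    F.interpolate(to_gpu_tensor(img), (224, 224), mode="bilinear")

def cpu_resize_then_upload():
    to_gpu_tensor(cv2.resize(img, (224, 224)))

def timeit(fn, n=100):
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(n):
        fn()
    torch.cuda.synchronize()
    return 1000 * (time.perf_counter() - t0) / n

print(f"full-res upload + GPU resize: {timeit(full_res_upload_then_gpu_resize):.2f} ms")
print(f"CPU resize + small upload:    {timeit(cpu_resize_then_upload):.2f} ms")
```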

mmattamala commented 1 year ago

That's interesting and looks really promising. Is the code somewhere I can take a look?

A few comments:

JonasFrey96 commented 1 year ago

I will push it to a branch inheriting from devel in a moment. This currently uses the full resolution of the std_msgs/Image, which is 1080 x 1440; we first have to convert it to OpenCV, then to torch (GPU), and then rescale to (224, 224).

  1. I will now try to rescale first and then move to GPU/Torch.
  2. The bag is sped up. I wanted to test what happens to the throughput if we overload the node.
  3. We are using 10 Hz - yes, the Orin should take about the same feature extraction time.
  4. Yes, fully right, we have to load torch twice, but Maurice should just buy you a new laptop :)

Okay so now the idea would be:

Feature extraction node:

Input:

Output:

Learning node

Input:

- Proprioception to create the supervision graph, as we had
- Extracted features and segmentation mask (synchronized; or this could be a CustomMessage consisting of the MultiArray + an Int32 image)

Output:

- Visualization of the path
- MLP weights
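If we go with separate messages instead of the custom one, the learning node could synchronize the feature array and the segmentation mask with `message_filters`; the sketch below assumes hypothetical topic names and a `Float32MultiArray` + `Image` pair rather than whatever message we finally define.

```python
# Learning-node input side (sketch): approximately synchronize the per-segment
# feature array and the segmentation mask published by the feature extraction node.
# Topic names and message types are assumptions, not the final interface.
import message_filters
import rospy
from sensor_msgs.msg import Image
from std_msgs.msg import Float32MultiArray

def features_and_mask_callback(feature_msg, mask_msg):
    # feature_msg: (num_segments x feature_dim) features, flattened
    # mask_msg:    per-pixel segment index image
    rospy.loginfo("received %d feature values and a %dx%d mask",
                  len(feature_msg.data), mask_msg.height, mask_msg.width)

def main():
    rospy.init_node("wvn_learning_node")
    feat_sub = message_filters.Subscriber("/wvn/features", Float32MultiArray)
    mask_sub = message_filters.Subscriber("/wvn/segmentation_mask", Image)
    # allow_headerless is needed because Float32MultiArray has no timestamp;
    # a custom message with a header would avoid this.
    sync = message_filters.ApproximateTimeSynchronizer(
        [feat_sub, mask_sub], queue_size=10, slop=0.05, allow_headerless=True)
    sync.registerCallback(features_and_mask_callback)
    rospy.spin()

if __name__ == "__main__":
    main()
```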

JonasFrey96 commented 1 year ago

Code under development here: https://github.com/leggedrobotics/wild_visual_navigation/tree/dev/two_node_solution

TODOs

JonasFrey96 commented 6 months ago

done