Data Storage and Preprocessing Inquiry for 3D Lidar Data in PyTorch

primacaredataset commented 11 months ago

Hi,

I have a question regarding the utilization of 3D Lidar data in a PyTorch deep learning framework. I apologize if this question has been asked before; if so, please guide me to the relevant resources.

I recently collected a dataset using the Ouster OS1 through ROS. Each time step yields around 6.3 million data points. Given the 20 Hz sample rate and hours of data collection, managing this volume of data is challenging. My objective is to input these 6.3 million data points at each time step into a deep learning model for a classification task.

My primary concern is how to effectively save the data from the rosbags, including the 6.3 million points per time step, for seamless integration with a deep learning model. Could you provide recommendations on the appropriate format for data storage and any general advice on how to proceed?

Additionally, I am interested in exploring existing codes or sample GitHub repositories that have applied similar machine learning or deep learning techniques to Ouster Lidar data. Do you have any recommendations in this regard?

Furthermore, I have another question. If I want to consider only a specific subset of the data, such as focusing on beams between 20 and 50 and between 360 degrees, is there any existing code that allows me to extract these specific parts from the entire 6.3 million data points?

Thank you in advance for your assistance.

Platform (please complete the following information):

Ouster Sensor? OS-1
ROS version/distro? Melodic
Operating System? Linux

Samahu commented 11 months ago

Hi @primacaredataset, there are quite a few things to unpack.

My primary concern is how to effectively save the data from the rosbags, including the 6.3 million points per time step, for seamless integration with a deep learning model. Could you provide recommendations on the appropriate format for data storage and any general advice on how to proceed?

I think one key aspect to consider is to store Ouster data in its raw form rather than as unpacked PointCloud messages. Consider saving the topics /ouster/lidar_packets and /ouster/imu_packets: the data in these topics is significantly smaller and more efficient to store than the /ouster/points topic(s), yet you would still be able to generate the point clouds from them. Refer to the record.launch and replay.launch files for details.

Additionally, I am interested in exploring existing codes or sample GitHub repositories that have applied similar machine learning or deep learning techniques to Ouster Lidar data. Do you have any recommendations in this regard?

I haven't explored that myself, so I don't think I could provide a good recommendation, but I am pretty sure there are plenty of projects that have successfully integrated and trained against data from Ouster sensors. We do have one demo/example that shows how to apply a YOLOv5 model to perform tracking using sensor data. This is a 2D example, which probably isn't what you are looking for, but it falls in the same category. For 3D detection and tracking, mmdetection is popular and a good option.

Furthermore, I have another question. If I want to consider only a specific subset of the data, such as focusing on beams between 20 and 50 and between 360 degrees, is there any existing code that allows me to extract these specific parts from the entire 6.3 million data points?

Horizontally, you could limit your beams through the Azimuth Window in the sensor configuration. Vertically, you could do one of two things. One is to reduce the vertical resolution, which is something I have provided a prototype for; this code simply skips every 2nd beam, which reduces the number of points the sensor is generating. Alternatively, you could crop the vertical resolution, similar to how the azimuth window sensor configuration parameter works horizontally. I don't have an example for that, but the PR I linked above provides a good starting point.
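For anyone who prefers to crop on the host after the cloud has been generated, here is a minimal sketch of the same idea in plain Python. It assumes the cloud is organized as H rows (beams) by W columns (azimuth steps) stored row-major, which should be checked against the height and width fields of the actual message. `crop_cloud` and `degrees_to_col` are hypothetical helper names, not part of ouster-ros, and the angle-to-column mapping assumes evenly spaced columns and ignores any per-beam azimuth offsets.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def crop_cloud(points: Sequence[Point], height: int, width: int,
               beam_range: Tuple[int, int], col_range: Tuple[int, int],
               beam_step: int = 1) -> List[Point]:
    """Keep points whose beam (row) and column fall in half-open ranges.

    Assumes `points` is a flat, row-major list of height * width points.
    """
    if len(points) != height * width:
        raise ValueError("expected height * width points")
    b0, b1 = beam_range
    c0, c1 = col_range
    kept: List[Point] = []
    # beam_step=2 mimics "skip every 2nd beam"; beam_step=1 is a plain crop.
    for row in range(b0, min(b1, height), beam_step):
        start = row * width
        kept.extend(points[start + c0:start + min(c1, width)])
    return kept


def degrees_to_col(angle_deg: float, width: int) -> int:
    """Map an azimuth angle to a column index (evenly spaced columns assumed).

    Note that 360 wraps to column 0; pass `width` directly for a full sweep.
    """
    return int((angle_deg % 360.0) / 360.0 * width)


# Synthetic 128 x 1024 cloud: x stores the beam index, y the column index.
H, W = 128, 1024
cloud = [(float(r), float(c), 0.0) for r in range(H) for c in range(W)]

# Beams 20..49 and the first quarter turn of columns.
subset = crop_cloud(cloud, H, W, beam_range=(20, 50),
                    col_range=(degrees_to_col(0, W), degrees_to_col(90, W)))
print(len(subset))
```

Cropping on the host does not reduce what the sensor sends or what the bag stores; it only reduces what is passed on to the model.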

Hope this helps!

primacaredataset commented 11 months ago

Hi @Samahu,

Thank you for your comprehensive response.

I successfully utilized the /ouster/lidar_packets topic to save the data, significantly reducing the dataset size. I've completed the saving process and am currently focused on data processing. Now that I have the saved files, I'm looking to extract the points data from the /ouster/points topic by running the record.launch command. However, I'm uncertain about the exact steps to extract the x, y, z data from the 6.5 million numbers at each time step. Do you have any guidance on how I can extract this data and possibly save it in a .txt or .csv file?

I tried to use the explanations provided by @zuzi-m here:

https://github.com/ouster-lidar/ouster_example/issues/94

But it seems that you have changed the files and for example, point_os1.h doesn't exist anymore.

Additionally, regarding the extraction of a portion of the point cloud, such as a specific number of channels or a particular range of horizontal azimuth, I'm eager to understand how to extract part of the data from these 6.5 million data points (the data is saved in the rosbag format). For instance, could dividing 6.5 million by 365 provide the data for angles between 0 and 1, or could dividing by 128 yield the data for one channel (or something like this)? I'd like to know if there are any instructions on how the data in all these 128 channels and 365 degrees is distributed over the 6.5 million data points.

Thank you, and best regards.

Samahu commented 11 months ago

Once you have saved the raw packets in the bag file, you can extract the point cloud through replay.launch and then process the generated /ouster/points topic. You could process the topic in a separate ROS node written in either C++ or Python and save it to a csv or txt file. BTW, if you only care about the xyz data and your pipeline doesn't need the extra layers or fields, consider setting the point_type parameter to xyz instead of the default original. This would significantly cut down the size of the data you need to save. You can read more about this feature here
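As a rough illustration of the "save it to a csv" step, the sketch below decodes x, y, z from a packed point buffer and writes them as CSV using only the standard library. It assumes x, y and z are little-endian float32 values at byte offsets 0, 4 and 8 within each point; in a real node these offsets, along with point_step, should be read from the fields of the sensor_msgs/PointCloud2 message rather than hard-coded. `iter_xyz` and `write_xyz_csv` are hypothetical names, and the buffer here is synthetic.

```python
import csv
import io
import struct
from typing import Iterator, TextIO, Tuple


def iter_xyz(data: bytes, point_step: int,
             offsets: Tuple[int, int, int] = (0, 4, 8)
             ) -> Iterator[Tuple[float, float, float]]:
    """Yield (x, y, z) for each point in a packed buffer.

    Assumes little-endian float32 fields at the given byte offsets.
    """
    ox, oy, oz = offsets
    for base in range(0, len(data) - point_step + 1, point_step):
        x = struct.unpack_from("<f", data, base + ox)[0]
        y = struct.unpack_from("<f", data, base + oy)[0]
        z = struct.unpack_from("<f", data, base + oz)[0]
        yield x, y, z


def write_xyz_csv(data: bytes, point_step: int, out: TextIO) -> int:
    """Write one CSV row per point and return the number of points written."""
    writer = csv.writer(out)
    writer.writerow(["x", "y", "z"])
    count = 0
    for xyz in iter_xyz(data, point_step):
        writer.writerow(xyz)
        count += 1
    return count


# Synthetic buffer: two points, 16 bytes each (x, y, z plus 4 padding bytes).
demo_data = (struct.pack("<ffff", 1.0, 2.0, 3.0, 0.0)
             + struct.pack("<ffff", -0.5, 4.0, 8.0, 0.0))
demo_out = io.StringIO()
demo_rows = write_xyz_csv(demo_data, point_step=16, out=demo_out)
print(demo_rows)
print(demo_out.getvalue())
```

Inside a ROS node, the subscriber callback would pass the message's data and point_step to a function like this. Text formats are bulky for clouds of this size, so a binary format may be a better fit for training data.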

Samahu commented 11 months ago

For instance, dividing 6.5 million by 365 could provide the data for angles between 0 and 1, or dividing by 128 could yield the data for one channel (or something like this)?

This depends on the sensor type that you are using. Please refer to the sensor documentation.
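To make the dependence on sensor type and mode concrete, here is a small arithmetic sketch for an organized, row-major cloud. All numbers are assumptions for illustration: 128 beams, 1024 columns per frame, and a point_step of 48 bytes per point. The real values should be read from the height, width and point_step fields of the message, and the function names are hypothetical.

```python
from typing import Tuple


def describe_layout(n_beams: int, n_cols: int, point_step: int) -> Tuple[int, int]:
    """Return (number of points, number of bytes) for one organized frame."""
    n_points = n_beams * n_cols
    return n_points, n_points * point_step


def index_to_beam_col(i: int, n_cols: int) -> Tuple[int, int]:
    """Map a flat point index to (beam, column), assuming row-major order."""
    return divmod(i, n_cols)


def beam_col_to_offset(beam: int, col: int, n_cols: int, point_step: int) -> int:
    """Byte offset of a point inside the flat data buffer."""
    return (beam * n_cols + col) * point_step


# Assumed configuration; replace with values from the actual message.
n_points, n_bytes = describe_layout(n_beams=128, n_cols=1024, point_step=48)
print(n_points, n_bytes)

# The point at flat index 20485 belongs to beam 20, column 5.
print(index_to_beam_col(20 * 1024 + 5, 1024))
print(beam_col_to_offset(20, 5, 1024, 48))
```

Under these assumptions a frame holds 131,072 points and 6,291,456 bytes, which is close to the 6.3 million figure mentioned earlier in the thread. Whether that figure actually counts bytes rather than points is a guess. If the layout does hold, one channel is a contiguous run of n_cols points, and dividing by the number of columns (not by degrees) gives the per-beam split.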

primacaredataset commented 11 months ago

Hi @Samahu,

I have successfully extracted XYZ data from the /ouster/points topic and saved it in HDF5 format. However, I am facing a challenge. The Lidar data is collected at 20 Hz, but due to the point cloud size at each timestamp, saving each message takes longer than the 50-millisecond interval between messages. Consequently, I am missing a significant portion of the data, with approximately 60% being lost during the saving process.

Is there a way to manage this and ensure that I can save all the data without missing any?
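The thread does not include a reply to this question. One common pattern for this kind of dropped-message problem, offered only as a sketch, is to keep the subscriber callback cheap by handing each frame to a background thread that does the slow write. `BackgroundWriter` is a hypothetical helper, and the write function here just appends to a list where a real one would write to disk.

```python
import queue
import threading
import time
from typing import Any, Callable, List


class BackgroundWriter:
    """Decouple fast producers (callbacks) from a slow writer via a queue."""

    _STOP = object()  # sentinel that tells the worker thread to exit

    def __init__(self, write_fn: Callable[[Any], None], maxsize: int = 0):
        # maxsize=0 means unbounded: nothing is dropped, but memory use grows
        # if the writer is slower than the producer for a long time.
        self._queue: "queue.Queue[Any]" = queue.Queue(maxsize)
        self._write_fn = write_fn
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, item: Any) -> None:
        """Called from the callback; returns as soon as the item is queued."""
        self._queue.put(item)

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            self._write_fn(item)

    def close(self) -> None:
        """Flush everything that was queued, then stop the worker."""
        self._queue.put(self._STOP)
        self._thread.join()


saved: List[int] = []


def slow_write(frame: int) -> None:
    time.sleep(0.001)  # stand-in for a slow disk write
    saved.append(frame)


writer = BackgroundWriter(slow_write)
for frame_id in range(50):  # stand-in for 50 incoming messages
    writer.submit(frame_id)
writer.close()
print(len(saved))
```

Because the data is already stored in a bag, another option is to avoid real-time constraints altogether, for example by replaying the bag at a reduced rate with the -r option of rosbag play so that each message can be saved before the next one arrives.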

ouster-lidar / ouster-ros

Data Storage and Preprocessing Inquiry for 3D Lidar Data in PyTorch #276