ros-drivers / velodyne

ROS support for Velodyne 3D LIDARs
http://ros.org/wiki/velodyne

Potential Depth-image creation #26

Closed Gariben closed 10 years ago

Gariben commented 10 years ago

I would like to see if I can add the ability to create a depth image in the velodyne stack. I have a theoretical approach lined out:

(from my ros answers post)

I think if I can get the 'rotation' value from the 'raw_block_t' struct (from rawdata.h) and then the 'laser_number' values (from rawdata.cc), I can make a matrix of size ROTATION_MAX_UNITS by SCANS_PER_BLOCK and then store the depth at each pixel (raw->blocks[i].rotation, j). However, I'm unsure how to access the values I need. I have the velodyne package in my workspace and the rawdata.cc, but I don't know how to use that information.
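A minimal sketch of that idea, assuming the constants from rawdata.h (ROTATION_MAX_UNITS is 36000 hundredths of a degree, SCANS_PER_BLOCK is 32); the variable names are placeholders:

```cpp
#include <vector>

// Constants as defined in rawdata.h.
static const int ROTATION_MAX_UNITS = 36000;  // hundredths of a degree
static const int SCANS_PER_BLOCK = 32;        // lasers per block

// depth[azimuth][laser] holds the latest range for that pixel (~4.6 MB).
std::vector<std::vector<float> > depth(
    ROTATION_MAX_UNITS, std::vector<float>(SCANS_PER_BLOCK, 0.0f));

// Inside the per-block unpacking loop, roughly:
//   depth[raw->blocks[i].rotation][j] = distance;
```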

jack-oquin commented 10 years ago

Ref: http://answers.ros.org/question/153571/return-degree-of-origin-from-velodyne_points/

Gariben commented 10 years ago

That's me! I followed your instructions to pursue it further here. How do I include the information from rawdata.cc in my code? I know how to use rawdata.h as a header, but not rawdata.cc.

bricerebsamen commented 10 years ago

I think you should follow the example from RawData::unpack().

Gariben commented 10 years ago

Could you elaborate a bit?

jack-oquin commented 10 years ago

@Gariben: I was just posting the link for future reference, while it's still easy to find.

My initial suggestion is the same as @bricerebsamen's: write another RawData::unpack() method that creates either a depth image or some convenient struct instead of a point cloud. Note that unpack is called once for each packet; there are about 260 packets per revolution.
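For concreteness, the suggested overload might look something like this; DepthImage is a placeholder name, and the existing signature is paraphrased from rawdata.h:

```cpp
// Hypothetical second unpack() alongside the existing point-cloud one.
struct DepthImage;  // placeholder for whatever struct proves convenient

class RawData
{
public:
  // existing: unpack one raw packet into a point cloud
  void unpack(const velodyne_msgs::VelodynePacket &pkt, VPointCloud &pc);

  // proposed: unpack the same packet into a depth image instead
  void unpack(const velodyne_msgs::VelodynePacket &pkt, DepthImage &img);
};
```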

There is some old code that did something similar.

That may help you understand what the device provides. It was last updated in 2011, and does not handle all the Velodyne models. So, don't take it too literally.

Gariben commented 10 years ago

Is it as simple as modifying the file, and checking/debugging it with catkin?

jack-oquin commented 10 years ago

Nothing is ever that simple.

First, you need to add a new RawData::unpack() method, then write a new nodelet like the cloud nodelet that uses it to create a DepthImage. I suppose a DepthImage is expected to use an optical frame of reference instead of the normal device-oriented frame. In this case, being a cylindrical image, I don't even know what the frame of reference means or how someone would use it.

Note that if the device is moving and turning during the 100 ms or so of the scan, everything will be distorted. That is mainly why we dropped the data scan interface mentioned above.

Gariben commented 10 years ago

I see. So I'll want raw_packet_t to now return the laser number, right? For now, I would just like to focus on returning a matrix as described above.

Also, do you have any good resources on creating nodelets? I'm only familiar with making packages. Is it significantly different?

jack-oquin commented 10 years ago

Be aware that the laser_number in that code is a hardware-assigned id that is a strange permutation of the actual order of the lasers. To convert that to what I call the "ring number", you need to use and understand velodyne_driver/ring_sequence.h.

In the velodyne_pointcloud/src/conversions directory there are several files. Look at cloud_nodelet.cc, and observe how it includes convert.h and invokes convert.cc to do all the work. The same code is invoked as a regular node in cloud_node.cc. You'll need to write several similar files.
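A bare-bones skeleton of that pattern (class, namespace, and converter names here are hypothetical; the structure mirrors cloud_nodelet.cc):

```cpp
#include <ros/ros.h>
#include <nodelet/nodelet.h>
#include <pluginlib/class_list_macros.h>

namespace velodyne_depth
{
  class DepthImageNodelet : public nodelet::Nodelet
  {
  public:
    DepthImageNodelet() {}

  private:
    // Called by the nodelet manager instead of main().
    virtual void onInit()
    {
      // As in cloud_nodelet.cc, delegate the real work to a converter
      // class shared with the plain-node version:
      // conv_.reset(new Convert2Depth(getNodeHandle(), getPrivateNodeHandle()));
    }

    // boost::shared_ptr<Convert2Depth> conv_;
  };
}

// Register the nodelet with pluginlib so the manager can load it.
PLUGINLIB_EXPORT_CLASS(velodyne_depth::DepthImageNodelet, nodelet::Nodelet)
```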

I did mention that this will be a lot of work. The more I think about it, the less worthwhile it seems. What special application do you have that needs this representation?

Gariben commented 10 years ago

It's an idea my advisor and I have brewed up. I've been into human detection and gesture recognition from LIDAR data only. Following some papers, I've been looking at a Reading --> Segmentation --> Classification kind of model. Things seem to be complex in all directions for segmentation. One method suggested using a mesh and then spin images, another some occupancy grids. We're trying to develop a simpler method based on breaking images down into grids where movement is occurring. He's really in love with the idea, and frankly, it perplexes me quite a bit too. I haven't seen much other work on it.

jack-oquin commented 10 years ago

OK, well good luck.

You might look at velodyne_height_map. It breaks the point cloud into cells in the XY plane, looking for Z values that differ by more than some configurable minimum.

Also, I believe PCL has some good methods for building voxel grids from a point cloud.
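If you go the voxel route, a typical PCL voxel-grid filter looks like this (the leaf size is chosen arbitrarily here):

```cpp
#include <pcl/point_types.h>
#include <pcl/filters/voxel_grid.h>

pcl::PointCloud<pcl::PointXYZ>::Ptr input(new pcl::PointCloud<pcl::PointXYZ>);
pcl::PointCloud<pcl::PointXYZ> downsampled;

// Downsample the cloud onto a regular 3D grid.
pcl::VoxelGrid<pcl::PointXYZ> grid;
grid.setInputCloud(input);
grid.setLeafSize(0.1f, 0.1f, 0.1f);  // 10 cm voxels
grid.filter(downsampled);
```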

Gariben commented 10 years ago

I'll get working with this stuff Monday. I may come up with some questions before then. Thanks for your help so far.

Gariben commented 10 years ago

Hello again. Things seem to be going pretty well. I think I've appended the data I need.

1) I created an overlay and began editing rawdata.h and rawdata.cc.

2) I added a float array to RawData called depth_image of size [ROTATION_MAX_UNITS/100] by [SCANS_PER_BLOCK].

3) This implementation is probably not correct, but in the 'if (pointInRange(distance))' block, I added 'depth_image[raw->blocks[i].rotation/100][point.ring] = distance;' (assuming that I can just grab the ring this point belongs in, its distance, and its rotation; if the ring is not the correct one, I suppose I need to add the ring_sequence header and convert it). It seems to compile as well (see the sketch just below).
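A hedged reconstruction of steps 2 and 3 (member and index names as described above; whether point.ring is already the converted ring at that spot is the open question):

```cpp
// Step 2: member added to RawData; 360 azimuth bins by 32 lasers.
float depth_image[ROTATION_MAX_UNITS / 100][SCANS_PER_BLOCK];

// Step 3: inside RawData::unpack(), within the existing range check.
if (pointInRange(distance))
{
  // rotation is in hundredths of a degree, so /100 bins it to whole degrees.
  depth_image[raw->blocks[i].rotation / 100][point.ring] = distance;
  // ... the existing code continues to fill the point cloud ...
}
```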

So, for testing purposes, how exactly do I publish this array? As opposed to printing it to the screen with iostream or something, I think it would be best to publish it. I'd like to publish it similarly to how the point cloud itself is published, but I can't really pinpoint where 'pc' gets published. Could you offer me any insight?

jack-oquin commented 10 years ago

https://github.com/ros-drivers/velodyne/blob/master/velodyne_pointcloud/src/conversions/convert.cc#L78
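That line is where the assembled cloud gets published. A minimal sketch of doing the analogous thing with a depth image (topic name, frame id, and image dimensions are assumptions):

```cpp
#include <ros/ros.h>
#include <sensor_msgs/Image.h>
#include <sensor_msgs/image_encodings.h>

float depth_image[360][32];  // the [degree][ring] matrix discussed above

void publishDepthImage(ros::Publisher &image_pub)
{
  sensor_msgs::Image img;
  img.header.stamp = ros::Time::now();
  img.header.frame_id = "velodyne";
  img.height = 32;                  // one row per laser ring
  img.width = 360;                  // one column per degree
  img.encoding = sensor_msgs::image_encodings::TYPE_32FC1;
  img.step = img.width * sizeof(float);
  img.data.resize(img.height * img.step);

  // Copy the [degree][ring] matrix into row-major image memory.
  float *dst = reinterpret_cast<float *>(&img.data[0]);
  for (unsigned r = 0; r < img.height; ++r)
    for (unsigned c = 0; c < img.width; ++c)
      dst[r * img.width + c] = depth_image[c][r];

  image_pub.publish(img);
}

// Advertise once, e.g. in the converter's constructor:
//   image_pub = node.advertise<sensor_msgs::Image>("velodyne_depth_image", 10);
```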

Gariben commented 10 years ago

So I actually have an image outputting, but I don't know if it's exactly what I want yet. Do you know a way I can output a stream to the terminal when using roslaunch? I basically want to output which depth value corresponds to which degree and which laser.

Gariben commented 10 years ago

Basically, I want my definition in rawdata.cc to print to the screen. I tried adding --screen whenever I run cloud_nodelet.launch, but it made no difference.

trainman419 commented 10 years ago

Since nodelets are run in the nodelet manager process, I would try the --screen argument to the nodelet manager launch file, or add the output="screen" argument to the <node> tag.
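For example, in the manager's launch file (node names here are illustrative):

```xml
<!-- output="screen" routes the manager's stdout/stderr to the terminal,
     which is where all loaded nodelets actually print. -->
<node pkg="nodelet" type="nodelet" name="velodyne_nodelet_manager"
      args="manager" output="screen" />
```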

Gariben commented 10 years ago

I already mentioned that the --screen argument doesn't change anything. I believe it has to do with the specific file I'm using.

Gariben commented 10 years ago

Never mind. I found that you have to open the cloud in RViz before it outputs anything with the --screen argument.

Gariben commented 10 years ago

That being said, how should this be iterating? I recall you saying that I needed to figure out the laser rings, but as the program iterates, the output looks like:

... 86, 4 = -1.12006 86, 20 = -0.97132 86, 5 = -1.12422 86, 21 = -0.065175 86, 6 = -1.12004 ...

Very clearly more than a few things are wrong there, but I want to improve some of the qualities. Should the lasers be in order?

Gariben commented 10 years ago

Update: So I have a 320x32 grayscale image coming out, at least. It reacts to people walking, too, haha. Obviously there is something wrong with my definition. What explains the horizontal and vertical skips? Is it because the Velodyne isn't degree by degree? http://i.imgur.com/cry2pji.png

Here is my current definition of the array: http://i.imgur.com/A4qk0MD.png It also lets the cat out of the bag that I'm working on an intensity image as well. Certainly making some progress though!

jack-oquin commented 10 years ago

It is certainly not degree by degree.

The 32e returns approximately 700,000 points per second. At the normal 600 RPM setting, that is 70,000 points per revolution, or about 2200 for each laser, around 6 points per degree. That is why the data are reported in hundredths of a degree.

There are 384 points in each packet, so unpack() gets called about 1800 times per second, roughly 181 times per revolution at 600 RPM. The driver actually figures 1808 packets per second for this device, but I doubt any integer number of packets represents exactly one revolution of 360 degrees.

I recommend reading some of the 32e docs.

Gariben commented 10 years ago

Do you think a better approach would be to make an array that accounts for these extra values? Are they in any way consistent?

Gariben commented 10 years ago

So tomorrow, I'll scale the array correctly. However, I was wondering the best way to index the laser ring. I felt like I tried all of the possible combinations, but nothing made sense. Does it actually fire from 1 to 32, or in some different order? I was unable to find it on that link.

bricerebsamen commented 10 years ago

The beams are not organized from top to bottom. You can parse the config file, get the vertical angle of each beam, and build an index-to-number correspondence table if you want to process them in order.

jack-oquin commented 10 years ago

You don't have to do all that. The order is fixed.

Just use velodyne_driver/ring_sequence.h. It works for both 64e and 32e devices. The 32e numbers run from 0 to 31, so they only use the first half of the array.

For a given hardware laser_number, the corresponding ring will be:

velodyne::LASER_RING[laser_number]

Note that ring 0 is the lowest angle, which hits the ground closest to the device.
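In code, the lookup described above is just (laser_number being the hardware id from the packet):

```cpp
#include <velodyne_driver/ring_sequence.h>

// Convert the hardware laser id to a bottom-to-top ring number.
// On the 32E only entries 0..31 of the table are used.
int ring = velodyne::LASER_RING[laser_number];
// ring 0 is the lowest beam, the one hitting the ground nearest the device.
```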

Gariben commented 10 years ago

Hello again. So I've made much more progress. :) I have an image outputting, my horizontal angle is correct, and I believe I have the vertical figured out as well. We basically start with 36000, with the laser index defined as int laser_total = SCANS_PER_BLOCK - 1; velodyne::LASER_RING[laser_total - laser_number] in order to effectively 'flip' the image. However, we noticed some vertical discrepancies in the image: some pixels remain black. So I made it output the assignments to a text file, and saw this:

DATA:(1)[17681, 31]=8.10928

DATA:(1)[17699, 17]=7.68193 DATA:(2)[17699, 11]=7.684 DATA:(3)[17699, 19]=7.68993 DATA:(4)[17699, 29]=8.07497 DATA:(5)[17699, 23]=8.07663 DATA:(6)[17699, 31]=8.0856

DATA:(1)[17715, 3]=7.67766 DATA:(2)[17715, 17]=7.65994 DATA:(3)[17715, 11]=7.684 DATA:(4)[17715, 19]=7.72392 DATA:(5)[17715, 29]=8.0571 DATA:(6)[17715, 23]=8.06672 DATA:(7)[17715, 31]=8.07771 DATA:(8)[17715, 25]=8.11522

DATA:(1)[17731, 1]=6.41016 DATA:(2)[17731, 3]=7.68165 DATA:(3)[17731, 17]=7.64794 DATA:(4)[17731, 11]=7.652 DATA:(5)[17731, 19]=7.72392 DATA:(6)[17731, 13]=8.05924 DATA:(7)[17731, 15]=7.71848 DATA:(8)[17731, 29]=8.0571 DATA:(9)[17731, 23]=8.08851 DATA:(10)[17731, 31]=8.07179 DATA:(11)[17731, 25]=8.11129

DATA:(1)[17748, 1]=6.36635 DATA:(2)[17748, 3]=7.66367 DATA:(3)[17748, 17]=7.64794 DATA:(4)[17748, 11]=7.646 DATA:(5)[17748, 19]=7.69593 DATA:(6)[17748, 13]=8.06124 DATA:(7)[17748, 15]=7.75234 DATA:(8)[17748, 29]=8.06107 DATA:(9)[17748, 23]=8.05286 DATA:(10)[17748, 31]=8.06784 DATA:(11)[17748, 25]=8.08377

Notice that not all lasers fire for each degree of rotation. I imagine this is intentional, and we're working on a solution, but I was wondering if you had any insight regarding this precision.

jack-oquin commented 10 years ago

One of the advantages of the point cloud representation is not having to worry about the exact laser firing pattern. I suspect that the actual order is model-dependent.

You should check the hardware manual for your device, but you will likely need to contact Velodyne for help with these details.

Gariben commented 10 years ago

Alright, so I'm at a point where I'm getting a reasonable image, though the output is limited to a smaller window. After making a log file to track the output, I found that the image seems to be processed per packet, so it is much smaller than the full image. I'm trying to make a workaround so the image is all processed together.

Gariben commented 10 years ago

Here's an example of one of the packets processed:

DATA:[1, 1];(214.21, 11)=1.452 DATA:[1, 2];(214.21, 29)=1.43224 DATA:[1, 3];(214.21, 30)=1.4457

DATA:[2, 1];(214.36, 11)=1.45 DATA:[2, 2];(214.36, 29)=1.4382 DATA:[2, 3];(214.36, 30)=1.44179

DATA:[3, 1];(214.54, 11)=1.444 DATA:[3, 2];(214.54, 29)=1.43224 DATA:[3, 3];(214.54, 30)=1.43201

DATA:[4, 1];(214.7, 27)=1.45371 DATA:[4, 2];(214.7, 11)=1.436 DATA:[4, 3];(214.7, 29)=1.42628 DATA:[4, 4];(214.7, 30)=1.4281

DATA:[5, 1];(214.87, 27)=1.44183 DATA:[5, 2];(214.87, 11)=1.434 DATA:[5, 3];(214.87, 29)=1.42231 DATA:[5, 4];(214.87, 30)=1.42223

DATA:[6, 1];(215.04, 27)=1.42995 DATA:[6, 2];(215.04, 11)=1.424 DATA:[6, 3];(215.04, 29)=1.42231 DATA:[6, 4];(215.04, 30)=1.4144

DATA:[7, 1];(215.21, 27)=1.43391 DATA:[7, 2];(215.21, 11)=1.424 DATA:[7, 3];(215.21, 29)=1.41635 DATA:[7, 4];(215.21, 30)=1.42027

DATA:[8, 1];(215.36, 27)=1.42995 DATA:[8, 2];(215.36, 11)=1.428 DATA:[8, 3];(215.36, 14)=1.03432 DATA:[8, 4];(215.36, 29)=1.41238 DATA:[8, 5];(215.36, 30)=1.4281 DATA:[8, 6];(215.36, 31)=7.76984

DATA:[9, 1];(215.53, 27)=1.42995 DATA:[9, 2];(215.53, 11)=1.424 DATA:[9, 3];(215.53, 21)=1.44048 DATA:[9, 4];(215.53, 14)=1.03816 DATA:[9, 5];(215.53, 28)=1.15508 DATA:[9, 6];(215.53, 29)=1.42628 DATA:[9, 7];(215.53, 30)=1.42223 DATA:[9, 8];(215.53, 31)=7.77773

DATA:[10, 1];(215.7, 27)=1.42004 DATA:[10, 2];(215.7, 11)=1.424 DATA:[10, 3];(215.7, 21)=1.4345 DATA:[10, 4];(215.7, 14)=1.04201 DATA:[10, 5];(215.7, 28)=1.15314 DATA:[10, 6];(215.7, 29)=1.42231 DATA:[10, 7];(215.7, 22)=1.29417 DATA:[10, 8];(215.7, 30)=1.42418 DATA:[10, 9];(215.7, 31)=7.7797

DATA:[11, 1];(215.88, 27)=1.42599 DATA:[11, 2];(215.88, 11)=1.426 DATA:[11, 3];(215.88, 21)=1.43051 DATA:[11, 4];(215.88, 14)=1.04585 DATA:[11, 5];(215.88, 28)=1.15314 DATA:[11, 6];(215.88, 29)=1.41833 DATA:[11, 7];(215.88, 30)=1.42027 DATA:[11, 8];(215.88, 31)=7.78957 DATA:[11, 9];(215.88, 24)=1.41707 ------_END ROTATION. DEPTH IMAGE SHOULD BE PROCESSED._------

where the values are:

[degree_count_in_block, laser_count];(degree_value(/100), laser_no)=xy_distance

jack-oquin commented 10 years ago

I already mentioned that unpack is called once per packet:

There are 384 points in each packet, so unpack() gets called about 1800 times per second, roughly 181 times per revolution at 600 RPM. The driver actually figures 1808 packets per second for this device, but I doubt any integer number of packets represents exactly one revolution of 360 degrees.

Gariben commented 10 years ago

On the positive side, I'm close to getting it, though. I was able to confirm that my lasers, degrees, and values are correct by making the image accumulate and placing the sensor in a static environment. Here are some samples: Full 360 classroom depth image, Full 360 classroom reflection image.

And a 90 degree interpretation of this wall: Depth image, Reflection image.

From this point we'll be working on methods to speed it up enough for real-time capture. However, the packet system makes it a bit of a challenge. I'll try to find a solution.

jack-oquin commented 10 years ago

Write a nodelet that unpacks every packet in an entire scan, combining them into a single image, sort of like this code does for point clouds.

By default, the driver publishes an entire 360 degree scan as a single VelodyneScan message, but you can control that with different driver parameters.
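A sketch of that structure, patterned on Convert::processScan() (the depth-image unpack overload and the helper names are hypothetical):

```cpp
void Convert2Depth::processScan(
    const velodyne_msgs::VelodyneScan::ConstPtr &scanMsg)
{
  resetImage();  // clear the depth matrix for this revolution

  // By default one VelodyneScan holds a full revolution of packets
  // (~181 on a 32E), so iterate over all of them.
  for (size_t i = 0; i < scanMsg->packets.size(); ++i)
    data_->unpack(scanMsg->packets[i], image_);

  publishImage(scanMsg->header.stamp);  // one image per scan message
}
```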

Gariben commented 10 years ago

Could you help me better understand the packet architecture? I understand how the VelodyneScan message works, but I don't understand some of the packet/block/scan variables in rawdata.h. All I can confirm is that packets->size() appears to relate to BLOCKS_PER_PACKET, as my data log resets each time it reaches 12.

jack-oquin commented 10 years ago

Yes.

Please read the header comments carefully, along with the device documentation. Then, tell me which specific questions are not answered clearly. That way we can improve the documentation for future users.

Gariben commented 10 years ago

Also, at what point does everything in rawdata.cc become the VelodyneScan message that is converted by Convert::processScan?

Gariben commented 10 years ago

Or should I only really be modifying processScan? That is the point at which my image is created. I've begun to look at the values closely. The ROS VelodyneScan message isn't as transparent as the definitions in rawdata.h. I now feel comfortable with all of the values in rawdata.h, but the seemingly arbitrary value of 181 for packets.size() confuses me.

Also, I removed where the data was being logged to a .txt file, and it runs much faster. Really, it's a matter of organizing the data and converting it to a ROS image message, and then it should be available.

jack-oquin commented 10 years ago

Also, at what point does everything in rawdata.cc become the VelodyneScan message that is converted by Convert::processScan?

The velodyne_driver package reads packets from the device, storing each one in a VelodynePacket message along with its time of arrival. The driver collects ~npackets of them together and publishes all that in a single VelodyneScan message. The default value of ~npackets is enough packets for a complete revolution, which depends on the specified ~model.

jack-oquin commented 10 years ago

The driver's default ~npackets is computed here. For the 32E, the packet rate is 1808 packets per second. At the default 600 RPM turning rate, the frequency is 10 Hz, so one revolution spans 1808 / 10 = 180.8 packets. The driver rounds up to get enough packets for a complete revolution, which is 181 in this case.
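Roughly, that calculation looks like this (a paraphrase, not the driver's literal code):

```cpp
#include <cmath>

// Values for the 32E:
const double packet_rate = 1808.0;      // packets per second
const double frequency = 600.0 / 60.0;  // 10 Hz at the default 600 RPM

// Round up so one scan always covers at least a full revolution.
const int npackets = (int) ceil(packet_rate / frequency);  // = 181
```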

jack-oquin commented 10 years ago

Or should I only really be modifying processScan?

The main reason to process each packet individually is to take time into account. If the device is moving, especially with a yaw component, the individual packets can have different transforms to the odom frame of reference. See the Transform::processScan() callback for an example.

I think you are probably ignoring all those issues, so you should be able to get away with doing everything in your own processScan() callback.

Gariben commented 10 years ago

So basically, the explanation of why I'm getting only a part of the full image is that it's processing only about 181 packets (for my HDL-32), whereas the amount of packets in the entire image is SCANS_PER_BLOCK * BLOCKS_PER_PACKET * PACKETS_PER_REV = 99,840 / 100 (for frequency) = 999?

jack-oquin commented 10 years ago

No. For your device, the number of packets per revolution is 181. PACKETS_PER_REV should be removed from the header, because 260 was only correct for the old 64E model. To write model-independent code, use packets->size() from the VelodyneScan message.

SCANS_PER_BLOCK * BLOCKS_PER_PACKET (32 * 12 = 384) tells you how many points are in each packet. So the 32E produces approximately 70,000 points per revolution (69,504 to the nearest block).

Gariben commented 10 years ago

Okay, so reconfiguring that: a block is a single degree value with 32 laser values. A packet is 12 blocks. The HDL-32E processes about 181 packets every rotation, which is then processed into an image. This is about 2172 of the 36000 (possible) values, so only about 1/16.5 (roughly 6%) of the full image.

But sometimes the image processed in processScan seems a little more than that. How would you advise I modify processScan to interpret the full image of packets (36000/12 = 3000?) as opposed to only one revolution of packets?

Gariben commented 10 years ago

My idea is to maybe make an array of 17 revolutions' worth of packets, and then process them all at once and save them. I say 17 because we know that 1 revolution is 181 packets; if we divide the roughly 3000 packets by 181, we get about 16.5, so we should round up to seventeen. For the remainder of my time at the lab today I'll try to double-check that idea.

Gariben commented 10 years ago

I've been modifying processScan, but I notice it's super delicate, and almost any change I make breaks it. What has to happen for the node not to error out?

jack-oquin commented 10 years ago

I've been modifying processScan, but I notice it's super delicate, and almost any change I make breaks it. What has to happen for the node not to error out?

Without knowing what you did or what happened, there is no way for me to know. You'll need to debug your code yourself.

Given that it's written in C++, the likely causes are incorrect memory allocation, or out-of-bounds array addressing. Knowing the exact error should give a clue.

Gariben commented 10 years ago

As a quick solution, I tried making it so that the sensor writes, but does not display, for a given number of iterations. I found, though, that as I increased that number the image became better quality, but much, much slower. Thankfully the point cloud is unaffected. Here's a framework of what happens (sketched below):

1) processScan is called; if this is the first iteration, the image is reset to all zeroes.

2) Data is written to the image.

3) If it is the maximum iteration, the image is written out and the iteration count is reset.
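That framework might look like this in code (iteration_, max_iterations_, and the helpers are hypothetical names):

```cpp
void processScan(const velodyne_msgs::VelodyneScan::ConstPtr &scanMsg)
{
  if (iteration_ == 0)
    resetImage();                       // step 1: clear on the first pass

  writePacketsToImage(scanMsg);         // step 2: accumulate this scan

  if (++iteration_ >= max_iterations_)  // step 3: publish and start over
  {
    publishImage(scanMsg->header.stamp);
    iteration_ = 0;
  }
}
```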

Gariben commented 10 years ago

I now have working, real-time images in the form of ROS topics. http://imgur.com/a/1Pwls

I'll have a video demonstration available by tomorrow. The only concern remaining is some of the vertical lines that appear on the screen in each frame. All things considered, pretty happy with the product.

Gariben commented 10 years ago

Update: I've made another node that completes the same task. This one is external, so segmentation can be applied before the image is processed. It does, however, require that the VPoint type carry the degree and depth values in addition to the other information.

I'm of the opinion that this information should be added permanently to the VPoint type, as it's super useful and doesn't appear to cause any slowdown. What do you think?

jack-oquin commented 10 years ago

I am reluctant to add more data to every point. There are a lot of them.

Since a PointCloud2 is currently aligned to quad-word boundaries, it is possible that the actual data stream bandwidth would not be affected, but adding specialized information to every point is still not appealing. The information you propose adding is useless once the point cloud is transformed into an odometric frame, as it must be when the device is moving.

Not having seen your code, I do not understand clearly what your "other node" is doing. Are you converting point cloud output back into a depth image? Why not produce the depth image directly from the data packets, as we have discussed all along? That can be done in a nodelet belonging to a separate package, which would depend on the standard velodyne_driver and velodyne_pointcloud packages. I see no need to add extra information to every VPoint.

Perhaps I am guessing wrong about your new solution. If so, please explain further.