laxnpander / OpenREALM

OpenREALM is a pipeline for real-time aerial mapping utilizing visual SLAM and 3D reconstruction frameworks.

Optimizing flight plan and image parameters #30

Closed · annesteenbeek closed this 3 years ago

annesteenbeek commented 3 years ago

Hey, I'm starting to get familiar with this package and it looks very promising for my use case. I'm testing it out now on an old dataset. However, I'm noticing "poor" performance compared to the benchmark dataset. I think this is mainly due to the FPS: images in my dataset were captured at roughly 0.3 FPS, where the benchmark is at about 10 FPS.

Do you have any tips on the optimal image capture parameters, mainly concerning overlap/FPS?

As the images in my application are streamed to the laptop during flight, 10 FPS high-resolution images would limit my real-time performance. Are the images in the benchmark dataset converted from video, with GPS exif data added later?

I'm just reading through the code, so I'm not that familiar with it yet, but would it also support a video stream, for example?

Btw, great work here 👍

laxnpander commented 3 years ago

@annesteenbeek: Hey hey, glad you like it! Your guess is probably right. FPS has a significant impact on the stability of the pose estimation and also on the final accuracy. You could fly slower or higher, which has the same effect as a high FPS, but you would have to fly very slowly or very high to compensate for such a low frame rate. I have no quantitative analysis of the relationship between frame rate, image resolution, image quality, altitude and camera velocity. It would be quite complex and very dependent on the hardware and framework used. So it's more an experience-based opinion I can offer.
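
To make the tradeoff a bit more concrete, here is a rough back-of-the-envelope sketch (my own illustration, not part of OpenREALM) that estimates the forward overlap between consecutive frames from altitude, along-track field of view, speed and frame rate, assuming a nadir-pointing pinhole camera over flat ground:

```python
import math

def forward_overlap(altitude_m, fov_deg, speed_ms, fps):
    """Approximate forward overlap between consecutive nadir frames."""
    # Ground footprint along the flight direction for the given FOV
    footprint = 2 * altitude_m * math.tan(math.radians(fov_deg) / 2)
    spacing = speed_ms / fps  # distance travelled between two frames
    return max(0.0, 1.0 - spacing / footprint)

# Illustrative numbers: 100 m altitude, 60 deg along-track FOV, 10 m/s
print(forward_overlap(100, 60, 10, 0.3))  # ~0.71 overlap at 0.3 FPS
print(forward_overlap(100, 60, 10, 2.0))  # ~0.96 overlap at 2 FPS
```

Flying slower or higher increases the overlap in exactly the same way a higher frame rate does, which is why the three can be traded against each other.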

We had the framework running for a research project at roughly 2 FPS, at typical multicopter speeds and ~100 m altitude. It was more prone to misalignments than the benchmark dataset, but it was stable and reliable. I wouldn't go lower than that, though. Camera quality is another big thing. DJI is the easiest way to acquire aerial data, but the cameras are not ideal for real-time mapping. If you have a custom setup, which I would advise, you should go for a global shutter camera with a low-distortion lens. Any advice given for visual SLAM frameworks applies here as well, except that we benefit from high resolution much more than the typical SLAM application does. Again, it will be a tradeoff between resolution, frame rate, processing power, bandwidth, lens distortion and so on. Maybe look out for a global shutter first and then all the other parameters; I am pretty sure it has a huge impact on the results.

Yeah, the data transfer is tricky. We used a good ad-hoc wifi network for communication, but it still was not ideal. I hope 5G will improve the usability in the future, but until then we are stuck with what we have. A good practice is to run the pose estimation stage onboard the drone, if you have an onboard computer. Because we use ORB SLAM 2 as the estimator, only keyframes are published to the next stage. So the pose estimation is fed with 10 FPS, but typically outputs only ~2 FPS. This reduces the required bandwidth by 80% while maintaining stable processing.
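
The bandwidth argument is easy to check with a quick sketch (the frame size below is an assumption, purely for illustration):

```python
def link_mbps(fps, frame_bytes):
    """Required link bandwidth in Mbit/s for a given frame rate and frame size."""
    return fps * frame_bytes * 8 / 1e6

frame_bytes = 2_000_000  # assumed ~2 MB per compressed high-res frame
print(link_mbps(10, frame_bytes))  # full stream: 160 Mbit/s
print(link_mbps(2, frame_bytes))   # keyframes only: 32 Mbit/s, i.e. -80%
```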

The benchmark dataset was acquired with a custom ROS node, reading the camera frame by frame and directly saving each image to .png with exif tags. I don't see a benefit in using video streams, though it would be possible to implement something like this. However, keep in mind that you would have to feed in the GPS position somehow as well.
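
For reference, one possible way to reproduce that kind of acquisition step in Python (this is not the actual node we used; it assumes `exiftool` is installed on the system, and the GPS values are placeholders):

```python
import subprocess
import cv2

def save_frame_with_gps(frame, path, lat, lon, alt_m):
    """Save an OpenCV frame as .png and embed GPS tags via exiftool."""
    cv2.imwrite(path, frame)
    subprocess.run([
        "exiftool", "-overwrite_original",
        f"-GPSLatitude={abs(lat)}", f"-GPSLatitudeRef={'N' if lat >= 0 else 'S'}",
        f"-GPSLongitude={abs(lon)}", f"-GPSLongitudeRef={'E' if lon >= 0 else 'W'}",
        f"-GPSAltitude={alt_m}",
        path,
    ], check=True)
```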

Let me know if you have any follow-up questions, I am glad to help! Sorry it took a while to respond; sometimes I don't realize there is a new issue.

Best regards, Alex

annesteenbeek commented 3 years ago

Thanks for the extensive reply.

I'm using DJI drones with a custom app to livestream images over ad-hoc wifi to my laptop. I'm now updating the app to also stream the video feed to the laptop and publish that video stream to ROS. This does cause some issues with syncing the video and GPS timestamps, as there is sometimes a delay of a few seconds, and I haven't found a way to measure that delay. Once I have tested it, I'll see how many problems this causes.
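
One way I could pair frames with GPS fixes is to match each frame timestamp to the nearest fix after subtracting an estimated stream delay (just a sketch; the delay itself still has to be measured somehow):

```python
import bisect

def nearest_gps(gps_log, frame_stamp, video_latency=0.0):
    """Match a frame timestamp to the closest GPS fix.

    gps_log: list of (unix_time, lat, lon, alt) tuples, sorted by time.
    video_latency: estimated stream delay in seconds (needs calibration).
    """
    t = frame_stamp - video_latency  # shift frame time back by the stream delay
    times = [g[0] for g in gps_log]
    i = bisect.bisect_left(times, t)
    candidates = gps_log[max(0, i - 1):i + 1]  # the two fixes bracketing t
    return min(candidates, key=lambda g: abs(g[0] - t))
```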

As my goal is to generate a DSM during a flight, which can then be used for additional flight planning, quality (as in resolution) is not as important as robustness and reliability.

Greetings, Anne

laxnpander commented 3 years ago

@annesteenbeek: Sounds interesting! I assume you grab the HDMI stream then? Chances are that the compression will create random artifacts, which will reduce the quality of the SLAM's feature matching. Hopefully the increased frame rate makes up for that.

Btw, note that OpenREALM does not require every frame to have a unique GPS position. You can have multiple frames stamped with the same position; for the transformation from the visual to the geographic coordinate system, I ignore the redundant ones.
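
In pseudocode, the idea is simply to drop frames whose GPS fix matches the previously kept one (a sketch of the concept only, not OpenREALM's actual implementation):

```python
def drop_redundant_gps(frames, eps_deg=1e-7):
    """Keep only frames whose GPS fix differs from the last kept one.

    frames: list of (lat, lon, image) tuples; eps_deg: tolerance in degrees.
    """
    kept, last = [], None
    for lat, lon, image in frames:
        if last is None or abs(lat - last[0]) > eps_deg or abs(lon - last[1]) > eps_deg:
            kept.append((lat, lon, image))
            last = (lat, lon)
    return kept
```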

annesteenbeek commented 3 years ago

The DJI SDK provides an RTMP stream, which I can read straight into ROS or VLC. I use socketio to stream GPS and other commands. I haven't had issues with artifacts yet. I've seen ground-up implementations of video streaming, where you should have more control over detecting artifacts, buffering etc., but I hope to avoid this. https://github.com/The1only/rosettadrone (DJI QGroundControl/ROS link, might be useful if you ever want to use DJI drones with OpenREALM)
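
For reference, reading an RTMP stream into ROS can be as simple as wrapping OpenCV's VideoCapture (a sketch assuming ROS 1, OpenCV built with FFmpeg, and a placeholder stream URL):

```python
import cv2
import rospy
from cv_bridge import CvBridge
from sensor_msgs.msg import Image

def publish_rtmp(url="rtmp://192.168.1.1/live"):  # placeholder URL
    rospy.init_node("rtmp_publisher")
    pub = rospy.Publisher("/camera/image_raw", Image, queue_size=1)
    bridge = CvBridge()
    cap = cv2.VideoCapture(url)  # requires OpenCV with FFmpeg support
    while not rospy.is_shutdown():
        ok, frame = cap.read()
        if not ok:
            continue
        msg = bridge.cv2_to_imgmsg(frame, encoding="bgr8")
        msg.header.stamp = rospy.Time.now()  # arrival time, not capture time
        pub.publish(msg)

if __name__ == "__main__":
    publish_rtmp()
```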

By redundant, do you mean only keyframes are used? Or only the ones with unique GPS positions?

The goal would be to use the DSM and the positions of detected cars to plan missions that take images of license plates, for one implementation; another implementation would add some semantic segmentation for real-time damage assessment in disaster situations.

laxnpander commented 3 years ago

@annesteenbeek Looks pretty neat! We are using a custom ground station communicating with the Onboard SDK right now. But it's always good to know what's out there.

Only keyframes will leave the pose estimation, but you can feed it with frames that have the same GPS stamp. So not all input images for OpenREALM need a unique GPS position. That's what I wanted to say :)

Sounds very interesting! I'd not rely on the DSM too much, to be honest; the overall quality is not yet ideal for most use cases. There is still some work to do, but you will see whether it's enough or not. Semantic segmentation in the RGB images and processing them into the global map should be very doable!

laxnpander commented 3 years ago

I will close this for now. Reopen if any more problems arise.