raulmur / ORB_SLAM2

Real-Time SLAM for Monocular, Stereo and RGB-D Cameras, with Loop Detection and Relocalization Capabilities

dense point reconstruction #220

Open callsty opened 7 years ago

callsty commented 7 years ago

Hi,

In many videos we can see dense point cloud reconstructions made with ORB-SLAM2. It's amazing! But I don't know how to do it.

For example in the end of the video: https://www.youtube.com/watch?v=ufvPS5wJAx0

Do you have any idea how to achieve this?

Thank you for your help!

AlejandroSilvestri commented 7 years ago

This question is related to autowarefoundation/autoware#196

callsty commented 7 years ago

Thank you for your response. The link is very interesting, but it doesn't explain how to run ORB-SLAM2 with semi-dense reconstruction. It seems not to be activated by default. Do you have an idea?

AlejandroSilvestri commented 7 years ago

ORB-SLAM2 doesn't do it.

Your video is an example of the paper Probabilistic Semi-Dense Mapping from Highly Accurate Feature-Based Monocular SLAM.

The software in that video is based on ORB-SLAM2, but it is not available.

callsty commented 7 years ago

Ouch... Is there another possibility? Otherwise I will need to use LSD-SLAM, with its weaknesses... http://wp.doc.ic.ac.uk/thefutureofslam/wp-content/uploads/sites/93/2015/12/ICCV15_SLAMWS_RaulMur.pdf

AlejandroSilvestri commented 7 years ago

LSD-SLAM is actually very good. We could say it's ORB-SLAM's main "competitor".

These two algorithms work on different principles, each with its own weaknesses and strengths.

kerolex commented 7 years ago

Currently, there are four very good algorithms that can "compete" with each other: ORB-SLAM, LSD-SLAM, DSO (Direct Sparse Odometry), and SVO (Semi-direct Visual Odometry). Each of them has different characteristics: feature-based or direct, and dense or sparse reconstruction. Moreover, the first two are SLAM methods because they address the loop closure problem, whereas the last two are purely visual odometry algorithms. All of them achieve impressive results, each with its own weaknesses and strengths.

Please also have a look at:
DSO: http://vision.in.tum.de/research/vslam/dso
SVO 2.0: http://rpg.ifi.uzh.ch/docs/TRO16_forster_SVO.pdf

jasonwurzel commented 7 years ago

Hi @kerolex, I just wanted to revive this conversation: we are looking into several V(I)O algorithms for indoor tracking. Do the systems mentioned above, especially DSO and SVO, output "absolute" measurements, i.e. lengths in cm or m? Or is this only possible if you integrate IMU measurements?

hashten commented 7 years ago

Hi @jasonwurzel, it is not possible to output absolute measurements with only one monocular camera without supervised calibration. One could think of an initialization similar to the one in PTAM, where the user helps the initialization by making a small translation of the camera; I think PTAM assumes a movement of 10 cm. Having a stereo rig is an alternative, and using an IMU, as you say, is another.
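A minimal sketch of that scale-recovery idea, assuming you know the real metric translation between the two initialization frames (e.g. the ~10 cm PTAM asks the user for); the function and variable names are made up for illustration:

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Rescale a monocular map using a known metric baseline between the two
// frames used for initialization. Monocular SLAM recovers translation only
// up to scale, so we compare it with the measured distance.
void applyMetricScale(std::vector<cv::Point3f>& mapPoints,
                      cv::Mat& t_slam,               // 3x1 translation estimated by SLAM (arbitrary scale)
                      double measuredBaselineMeters)  // e.g. 0.10 m for a PTAM-style init
{
    const double s = measuredBaselineMeters / cv::norm(t_slam);
    t_slam = t_slam * s;
    for (auto& p : mapPoints) { p.x *= s; p.y *= s; p.z *= s; }
}
```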

AlejandroSilvestri commented 7 years ago

@hashten is right! See autowarefoundation/autoware_ai#675

About IMU integration: taking absolute measurements is possible with an IMU. Raúl Mur already did it in Visual-Inertial Monocular SLAM: https://www.youtube.com/watch?v=rdR5OR8egGI

and the paper https://arxiv.org/pdf/1610.05949.pdf

The code is not available yet.

callsty commented 7 years ago

I want all this code ^^ But according to this document, http://wp.doc.ic.ac.uk/thefutureofslam/wp-content/uploads/sites/93/2015/12/ICCV15_SLAMWS_RaulMur.pdf

monocular ORB-SLAM2 is already awesome!!!

bouhmustapha commented 7 years ago

Hi all, has anyone profiled any of these algorithms: ORB-SLAM2, DSO, or SVO?

AlejandroSilvestri commented 7 years ago

I believe you mean DSO.

I didn't profile any of them, but I believe DSO will run faster and with less memory than ORB-SLAM2.

DSO isn't SLAM; it uses less memory (it doesn't keep a map in memory), and I've read it runs faster than ORB-SLAM. But maybe ORB-SLAM performs faster in localization mode (i.e. tracking only), with a known static map in memory.

I can't say much about SVO; it's older, and I think it has been surpassed.

M1234Thomas commented 7 years ago

Hello, any idea if LSD-SLAM can be integrated with ORB-SLAM2?

Many thanks.

AlejandroSilvestri commented 7 years ago

@M1234Thomas, as a simplified answer, I'd say NO, they cannot be integrated.

Of course they could be, but there is no point in doing it, until someone discovers a benefit in some non-obvious way.

ORB-SLAM2 is feature-based, and all of its algorithms are built around this characteristic: FAST, ORB, BoW, BA, etc.

LSD-SLAM is direct and semi-dense; it doesn't use FAST, ORB, or BoW, nor does it have a map of points.

mslavescu commented 7 years ago

Which one is better in localization mode and loop closure detection?

Do you know of any open source work to integrate neural nets to initialize SLAM maps on small sections of the road?

I'm looking at this from an SDC perspective, to use it in ossdc.org to build 3D maps for localization, like Mobileye is doing.

AlejandroSilvestri commented 7 years ago

@mslavescu, the only thing I can say is that LSD-SLAM managed to run on a mobile phone, but only in localization mode. I suppose that means LSD-SLAM in localization mode is lighter than ORB-SLAM.

Of course, you have to compare the map size of both techniques in order to evaluate which one fits mobile hardware better.

I'm not sure why you are interested specifically in initializing with neural networks. Feature-based SLAM like ORB-SLAM can use any method to triangulate the first set of points.

One problem in monocular SLAM initialization is that the map has no scale reference.

mslavescu commented 7 years ago

I assume you are referring to this Android demo: https://m.youtube.com/watch?t=22s&v=GnuQzP3gty4

I could use a NN to do (rough) landmark detection, which I hope I can use for SLAM (re)initialization:

https://cars.stanford.edu/events/leveraging-deep-learning-create-3d-semantic-maps-autonomous-vehicles

Maybe something like PoseNet (not very precise, but I assume it could be improved):

https://m.youtube.com/watch?v=u0MVbL_RyPU

I think Mobileye is doing something like this in their 3D mapping approach.

This presentation also has some info: http://www.umiacs.umd.edu/~zhengyf/DeepLandmark_MICCAI15.pdf

I just found this, looks very interesting:

Semantic Mapping of Large-Scale Outdoor Scenes for Autonomous Off-Road Driving https://m.youtube.com/watch?v=52PrAeGsWyg

AlejandroSilvestri commented 7 years ago

@mslavescu , impressive.

Yes, that's the Android demo I was talking about.

I believe by reinitialization you mean "relocalization" in ORB-SLAM jargon. ORB-SLAM uses BoW and features to relocalize (and to close loops). Right now I don't see how to mix different techniques. Of course you can relocalize externally and provide the pose to ORB-SLAM, but then it lacks the set of observed map points that relocalization must provide.

BTW, ORB-SLAM relocalization is fast too; it relies on a 1-million-word (BoW) vocabulary, small enough for mobile hardware, nowhere near the several gigabytes mentioned in the video.

Semantics can be built separately, but performance could improve if you build the semantics from the ORB features and BoW that ORB-SLAM has already harvested.

mslavescu commented 7 years ago

Yes, I meant relocalization :-) or place+pose recognition.

Mobileye needs only 10 KB per 200 m of road, based on Amnon's presentation, and it can be updated in real time over a 3G network if changes are required:

https://m.youtube.com/watch?v=n8T7A3wqH3Q

How easy is it to build BoW dynamically for segments of the road? And to use GPS for segment selection?

Do you have any Android demo (APK if possible) with (re)localization?

I will use a (dual-GPU) desktop in my car to test all of these in real time, with mono/stereo/multi-camera SLAM. Would the point cloud be dense enough to do 3D recognition of cars, pedestrians, etc.? Generated with mono or stereo ORB-SLAM2, for example. I may also fuse CNN segmentation.

Any recommendations for (affordable) cameras for this kind of application?

AlejandroSilvestri commented 7 years ago

Well, I think ORB-SLAM doesn't exactly fit your needs. It doesn't build a dense cloud; on the contrary, it makes an extra effort to keep as few points as possible.

Monocular SLAM needs a stationary scene; it can't "see" moving cars. While you travel, the moon goes with you; it "moves" along with you. Monocular SLAM can't distinguish a car moving along with you (say, matching your speed) from the moon: both look like bodies at infinity.

For cars I have only seen lidars and stereo cameras as sensors. I can imagine a house robot with a monocular camera, but not a car in a public environment.

About DBoW2, the one used in ORB-SLAM: it is a classifier of point descriptors. It only works with descriptors of "features" (2D points on the image).

mslavescu commented 7 years ago

Thanks for the explanation!

I will try to combine something like this:

Lazy Data Association For Image Sequences Matching Under Substantial Appearance Changes

https://www.youtube.com/watch?v=hVY2PCmTGIY

With something like this:

http://www0.cs.ucl.ac.uk/staff/R.Yu/video_popup/VideoPopup2.html

I had the impression that ORB-SLAM2 can be used to generate depth maps; also, in Autoware I've seen this demo (see the videos in this comment and after; the last video is the one I think does that):

https://github.com/CPFL/Autoware/issues/572#issuecomment-276265696

This seems to be what I need, basically recognition and reconstruction (just found it):

https://fradelg.gitbooks.io/real-time-3d-reconstruction-from-monocular-video/content/notes/duality.html

One more question: have you tried mono ORB_SLAM2 on game videos like this one? What camera settings would I need to use?

Real time YOLO detection in OSSDC Simulator running TheCrew on PS4 30fps https://m.youtube.com/watch?v=ANgDlNfDoAQ

I need that also for the OSSDC PS3/PS4 Simulator; it would be nice to reconstruct the road map (in 3D as well).

ayushgaud commented 7 years ago

I have recently tried fusing REMODE with ORB_SLAM2 instead of the default SVO (since SVO doesn't work well with a rolling-shutter camera) for dense reconstruction. You will find my code here: https://github.com/ayushgaud/ORB_SLAM2 @callsty this might be similar to what you were looking for. I have compared this reconstruction with LSD_SLAM, and in my case this was performing better.

mslavescu commented 7 years ago

@ayushgaud that is really cool!

Do you have some instructions on how to reproduce your example?

Would your approach work on videos from games, where we don't have camera calibration info, like the The Crew on PS4 video I posted above?

I was also looking at ways to automatically calibrate the images in real time, something like this, but for mono as well:

Automatic Camera Re-Calibration for Robust Stereo Vision https://m.youtube.com/watch?v=2QGnOwfQKYo

A mono example here, not generic, though it may be good for OSSDC, where we mostly look at the road:

Automatic Camera Calibration for Traffic Understanding [BMVC 2014] https://m.youtube.com/watch?v=S3msCdn3fNM

Any suggestions on how to do this would help us a lot, as we plan to use real-time game video input to test self-driving car algorithms. The same method should also work with YouTube gameplay videos or dash camera recordings.

ayushgaud commented 7 years ago

It's a great idea @mslavescu; I remember recently discussing something similar with a friend for self-driving cars. Although, I think the camera focal length might change during gameplay, which could be an issue when using algorithms like this.

You can replicate the results very easily. Just clone this fork https://github.com/ayushgaud/ORB_SLAM2, then build and run the ROS Mono node as usual. It will automatically publish the data in the format required by REMODE.
Now build the REMODE package and follow the instructions given here: https://github.com/uzh-rpg/rpg_open_remode/wiki/Run-using-SVO The only change you will have to make is to remap "/svo/dense_input" to "/ORB/DenseInput" and provide the camera intrinsics and image topic.
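For anyone wiring this up, here is a sketch of what that remap could look like in a launch file. This is not the actual REMODE launch file: the node, package, and parameter names below are placeholders, so take the real ones from the rpg_open_remode wiki launch and change only the remap and intrinsics:

```xml
<launch>
  <!-- Placeholder node/package names: copy the real ones from the REMODE wiki launch file -->
  <node pkg="rpg_open_remode" type="remode_node" name="remode" output="screen">
    <!-- Feed REMODE from the ORB-SLAM2 fork instead of SVO -->
    <remap from="/svo/dense_input" to="/ORB/DenseInput"/>
    <!-- Your camera's image topic (placeholder) -->
    <remap from="/camera/image_raw" to="/your_camera/image_raw"/>
    <!-- Camera intrinsics (example values; parameter names are placeholders too) -->
    <param name="cam_fx" value="375.988"/>
    <param name="cam_fy" value="375.988"/>
    <param name="cam_cx" value="428.5"/>
    <param name="cam_cy" value="240.5"/>
  </node>
</launch>
```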

That's it and you are good to go! (Hopefully)

Also, OSSDC looks like a great initiative to me. I would love to contribute whenever I have some spare time. Best of luck!

fishcu commented 7 years ago

Thanks for this great piece of software!

This discussion seems to diverge from the original question quite a bit, so I wanted to ask again: Is there currently an implementation available for generating dense 3D maps using ORB_SLAM, when used with an RGB-D sensor?

@ayushgaud If I understand correctly, your fork will only utilize a monocular camera (in combination with REMODE), correct?

I found this modification: https://github.com/gaoxiang12/ORBSLAM2_with_pointcloud_map Has anyone tried it? Does it do what the OP and I are asking for?

ayushgaud commented 7 years ago

@fishcu You are absolutely correct. The idea behind fusing REMODE and ORB_SLAM2 is that I wanted a good monocular reconstruction, and my options were limited to LSD and REMODE. While the reconstruction using LSD looks good, in my case it was drifting a lot (without loop closure), and the default REMODE pipeline uses SVO, which didn't work properly on a rolling-shutter camera (even DSO requires a global shutter), hence I fused in ORB_SLAM2.

In the original ORB_SLAM2 video, they also show an RGB-D reconstruction. If you are interested in that, you should definitely check this out: https://www.youtube.com/watch?v=XySrhZpODYs code: https://github.com/mp3guy/ElasticFusion

fishcu commented 7 years ago

I have seen the video before. While ElasticFusion looks great, it also requires a beefy computer. Other solutions like RTAB-Map will run even on a Tango tablet, and I have used ORB_SLAM on an Intel Atom processor before!

That's also why I am asking whether the dense map shown in the ORB_SLAM2 demo video can be generated from the available source code, or whether you have to do your own reprojections, etc. Is that code available?

ayushgaud commented 7 years ago

@mkorkmazeem try using the indigo branch

ayushgaud commented 7 years ago

@mkorkmazeem do git checkout indigo in the ORB_SLAM2 directory

ayushgaud commented 7 years ago

@mkorkmazeem Remove the old repository, try cloning it again, and then switch to the indigo branch.

ayushgaud commented 7 years ago

Just a request to you all: if you are able to get any sort of reconstruction using my fork, it would be really helpful if you could post a screenshot of it here. Thanks in advance.

ayushgaud commented 7 years ago

@mkorkmazeem I think your camera parameters might be incorrect in the launch file; please recheck that. I have also recently tested my code on an actual camera and it works well. I was getting bad results like yours initially too, but then I checked again and found my camera parameters were incorrect. Also, I think the depth image you are getting looks fine, although you could try checking it with a simple webcam or any dataset with known camera intrinsics. I hope this helps.

AlejandroSilvestri commented 7 years ago

@fishcu, no, unfortunately the open-source ORB-SLAM2 code doesn't have those capabilities.

AlejandroSilvestri commented 7 years ago

@ayushgaud, in a simulated world you won't have distortion. But you still must provide correct intrinsic parameters: fx, fy, cx, cy.

When you make a video from a simulated 3D world, the virtual camera needs those parameters, and ORB-SLAM2 needs them too.

cx and cy would be at the image center. fx = fy depends on the camera's field of view, or zoom.
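A minimal sketch of that relationship under the usual pinhole assumptions (the image size and horizontal FOV below are only example values, roughly matching the Gazebo parameters posted later in this thread):

```cpp
#include <cmath>
#include <cstdio>

// Intrinsics for a distortion-free virtual camera: the principal point sits
// at the image center and the focal length follows from the horizontal FOV.
int main()
{
    const double width = 856.0, height = 480.0;  // example image size in pixels
    const double hfov  = 1.7;                    // example horizontal FOV in radians

    const double fx = (width / 2.0) / std::tan(hfov / 2.0);
    const double fy = fx;                        // square pixels, no zoom anisotropy
    const double cx = width / 2.0;
    const double cy = height / 2.0;

    std::printf("Camera.fx: %.3f\nCamera.fy: %.3f\nCamera.cx: %.1f\nCamera.cy: %.1f\n",
                fx, fy, cx, cy);
    return 0;
}
```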

I myself never got a good calibration from the OpenCV chessboard calibration app (especially the distortion parameters), so I "hand-calibrated" them.

ayushgaud commented 7 years ago

Hi @mkorkmazeem and @AlejandroSilvestri,

I used the Gazebo camera intrinsics with zero distortion and gave the parameters to both ORB and REMODE. I have also tested my code using a PS3 camera and on a Parrot Bebop 2, which was calibrated using the standard ROS camera calibration package.

These are the parameters I used with Gazebo for ORB-SLAM:

Camera.fx: 375.98849178826106
Camera.fy: 375.98849178826106
Camera.cx: 428.5
Camera.cy: 240.5

Camera.k1: 0.0
Camera.k2: 0.0
Camera.p1: 0.0
Camera.p2: 0.0

For REMODE, my launch file just remaps the dense-input topic and passes the same intrinsics. Of course, the camera parameters might vary depending on how you initialize the camera in Gazebo, but you can verify them from the camera_info topic.

AlejandroSilvestri commented 7 years ago

Hi @ayushgaud ,

I'm not familiar with Gazebo; I am only sharing some thoughts with you.

1) If your virtual world doesn't have enough texture, ORB-SLAM2 may not describe the features FAST finds in the image accurately enough, thus preventing the creation of enough 3D map points.

2) I believe the Parrot Bebop 2 has a wide-angle camera, which won't work with the usual distortion model. There are two ways around this:

a) Reduce the image so the viewing angle is 90º. Many people do that; I never did it myself.

b) Use the fisheye distortion model (which uses the equidistant projection model), which is not implemented in ORB-SLAM2. I did it in my modified ORB-SLAM2 code. I don't know why OpenCV's cv::fisheye::undistortImage didn't work in my code, so instead of calling it I copied its implementation. It's implemented in antidistorsionarProyeccionEquidistante.
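For reference, a minimal sketch of the standard OpenCV fisheye (equidistant-model) undistortion call mentioned above; the file name, intrinsics K, and distortion coefficients D are placeholders, not a real calibration:

```cpp
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>    // cv::fisheye
#include <opencv2/imgcodecs.hpp>

int main()
{
    cv::Mat distorted = cv::imread("fisheye_frame.png");  // placeholder input image

    // Placeholder pinhole intrinsics (fx, fy, cx, cy) and equidistant distortion (k1..k4)
    cv::Mat K = (cv::Mat_<double>(3, 3) << 300.0,   0.0, 320.0,
                                             0.0, 300.0, 240.0,
                                             0.0,   0.0,   1.0);
    cv::Mat D = (cv::Mat_<double>(4, 1) << -0.01, 0.005, -0.002, 0.001);

    // Undistort to a pinhole image; the last argument controls the output intrinsics
    cv::Mat undistorted;
    cv::fisheye::undistortImage(distorted, undistorted, K, D, /*Knew=*/K);

    cv::imwrite("pinhole_frame.png", undistorted);
    return 0;
}
```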

ayushgaud commented 7 years ago

Hi @AlejandroSilvestri

Just to make things clearer: I use ORB-SLAM just for calculating the transformation of the camera (tracking only), so the reconstruction won't be affected by the textures in the scene (texture might affect tracking accuracy, but with enough features you won't face this issue either). The worst thing that could happen is tracking being lost. I believe what you are talking about is the sparse feature map, which I am not using in this case.

As far as the Parrot Bebop 2 is concerned, I use the bebop_autonomy package on ROS, built using the AR-SDK, which publishes cropped and undistorted images, just like what you see on the phone while operating it (there is a virtual pan option, but it still publishes only the cropped image in real time). The full-resolution images stored onboard are also undistorted, so you don't have to worry about that. Apart from that, we calibrate the camera parameters on the cropped and undistorted image itself, hence we don't face this issue.

mtee commented 7 years ago

@fishcu

Thanks for this great piece of software!

This discussion seems to diverge from the original question quite a bit, so I wanted to ask again: Is there currently an implementation available for generating dense 3D maps using ORB_SLAM, when used with an RGB-D sensor? I found this modification: https://github.com/gaoxiang12/ORBSLAM2_with_pointcloud_map Has anyone tried it? Does it do what the OP and I am asking for?

Yes, it works quite well. I'd even say it's comparable with the mesh you get from ElasticFusion. It is also a very simple implementation; if you understand how ORB-SLAM works, you should have no problem understanding the dense model construction in this implementation. Stereo and monocular cameras are a different beast, though, and need far more work, I suppose.

mkorkmazeem commented 7 years ago

@ayushgaud I'd like to ask one last thing: if I want to add some labels about an environment, how can I attach the labels to the 3D dense map? For example, I'd like to mark the chair as a chair, the desk as a desk, etc. Assume I already recognize the chair or the object; I just want to see that chair's label on my 3D dense map. How can I achieve this, and in which part of the code should I add it?

thx.

hjjayakrishnan commented 7 years ago

@ayushgaud @mkorkmazeem

Hi, I'm trying to run ORB_SLAM2 on Ubuntu 16.04, ROS Kinetic. But when trying to rosrun ORB_SLAM2 RGBD Vocabulary/ORBvoc.txt Examples/RGB-D/TUM1.yaml

I'm getting

[rosrun] Couldn't find executable named RGBD below /home/odroid/ar_go_ws/src/ORB_SLAM2/Examples/ROS/ORB_SLAM2

It had compiled successfully without any errors. Any idea what might be happening? EDIT: I have a feeling this is a workspace issue. I have cloned ORB-SLAM2 into ar_go_ws/src/. All my other ROS packages are also there. I also performed a catkin_make at ar_go_ws/.

ayushgaud commented 7 years ago

Hi @mkorkmazeem, from what I understood you want to project object labels into 3D along with the point cloud. If that's correct, you can calculate the mean position of the object's points in 3D using the point cloud itself and project the labels there. You can also try modifying the REMODE code to publish the reference image along with its corresponding depth image, so that you can filter out the ROI and subsequently the object's depth.

@hjjayakrishnan I think you must add export ROS_PACKAGE_PATH=${ROS_PACKAGE_PATH}:/home/odroid/ar_go_ws/src/ORB_SLAM2/Examples/ROS to your bashrc.
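A minimal sketch of the "mean 3D position, then project the label" idea under pinhole assumptions; the function name, point list, and intrinsics are placeholders:

```cpp
#include <opencv2/core.hpp>
#include <vector>

// Compute the centroid of the 3D points belonging to one detected object and
// project it into the image so a text label can be drawn at that pixel.
cv::Point2f labelAnchor(const std::vector<cv::Point3f>& objectPoints,
                        float fx, float fy, float cx, float cy)
{
    cv::Point3f c(0.f, 0.f, 0.f);
    for (const auto& p : objectPoints) { c.x += p.x; c.y += p.y; c.z += p.z; }
    const float n = static_cast<float>(objectPoints.size());
    c.x /= n; c.y /= n; c.z /= n;

    // Standard pinhole projection of the centroid (camera frame, Z forward)
    return cv::Point2f(fx * c.x / c.z + cx,
                       fy * c.y / c.z + cy);
}
```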

VenkataRamanaS commented 7 years ago

Hi,

In the ComputeKeyPointsOctTree function (the new one), the image is divided into 30x30 blocks and FAST is performed on each block. Why is it divided? What happens if it is not divided and FAST is performed on the entire image at once?

AlejandroSilvestri commented 7 years ago

@VenkataRamanaS ,

ORB-SLAM2 looks for a uniform distribution of features across the image. The image is divided into blocks to make it easy to count the number of features found in each block. When a block has few features, FAST is repeated on that block with a less strict threshold in order to raise the number of features found.

You can set the grid size in the configuration file.
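A minimal sketch of that per-cell strategy, in the spirit of ORB-SLAM2's ORBextractor (the cell size and the strict/relaxed thresholds below are illustrative, named after the usual iniThFAST/minThFAST settings):

```cpp
#include <opencv2/core.hpp>
#include <opencv2/features2d.hpp>
#include <algorithm>
#include <vector>

// Run FAST per cell so that features end up uniformly distributed over the
// image. If a cell yields nothing with the strict threshold, retry with a
// more permissive one.
std::vector<cv::KeyPoint> distributedFAST(const cv::Mat& gray,
                                          int cellSize = 30,
                                          int iniThFAST = 20,
                                          int minThFAST = 7)
{
    std::vector<cv::KeyPoint> all;
    for (int y = 0; y < gray.rows; y += cellSize)
    {
        for (int x = 0; x < gray.cols; x += cellSize)
        {
            cv::Rect cell(x, y,
                          std::min(cellSize, gray.cols - x),
                          std::min(cellSize, gray.rows - y));

            std::vector<cv::KeyPoint> kps;
            cv::FAST(gray(cell), kps, iniThFAST, true);
            if (kps.empty())
                cv::FAST(gray(cell), kps, minThFAST, true);  // relaxed retry

            for (auto& kp : kps)  // shift cell coordinates back to image coordinates
            {
                kp.pt.x += static_cast<float>(x);
                kp.pt.y += static_cast<float>(y);
                all.push_back(kp);
            }
        }
    }
    return all;
}
```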

VenkataRamanaS commented 7 years ago

Thanks for the reply !!!

If it is not divided, and the entire image is considered at once, how bad would the tracking be? Our application demands this.

AlejandroSilvestri commented 7 years ago

@VenkataRamanaS

It depends mainly on the scene. Pose optimization could be degraded. Drift could increase.

laxnpander commented 7 years ago

@ayushgaud I read about your attempt to marry ORB-SLAM2 and REMODE. I would like to do something similar and have reached the point where both communicate as expected. However, I am not getting a valid reconstruction. The poses and camera calibration all seem right, and the images are okay as well. However, my convergence in REMODE is very low (<20%). Additionally, the point clouds look like a bowl (see the screenshot below). I suspect the low framerate (1 Hz) is a problem for REMODE. Do you have any hints or experiences to share about that? Thanks in advance!

[screenshot: bowl-shaped point cloud]

bobyl573 commented 7 years ago

@ayushgaud @laxnpander I've been trying to get ORB-SLAM2 to work with REMODE as well, using @ayushgaud's fork. Everything installs and compiles correctly; however, when I try to run it I get this error:

(Mono:21872): Gtk-ERROR **: GTK+ 2.x symbols detected. Using GTK+ 2.x and GTK+ 3 in the same process is not supported

The original ORB-SLAM2 doesn't seem to have any GTK dependencies and works correctly. Any help would be appreciated.

ayushgaud commented 7 years ago

@bobyl573 Which version of ROS are you using? If it's Indigo, then please use the indigo branch. If you are using Kinetic and are still facing the issue, there might be a problem with your OpenCV or Pangolin installation, in which case you might want to remove GTK3 using sudo apt-get remove libgtk-3-dev and then rebuild the packages.

@laxnpander I am sorry for replying so late; I guess I missed your message. If you are still facing this issue, you might want to check the frame convention you are using. It should be the standard camera convention: X right, Y down, Z into the frame.

bobyl573 commented 7 years ago

@ayushgaud Thanks, I got it working after a fresh install of ROS (I was using Kinetic); still not sure what the underlying issue was. I will continue playing with your code, very neat!

kevin-george commented 7 years ago

Hey @ayushgaud, I was trying to find a way to publish the output of the monocular SLAM over ROS as a pose message and I came across your code.

I tried using the changes you made to ros_mono.cc, but it segfaults at line 110: cv::Mat TWC = mpSLAM->mpTracker->mCurrentFrame.mTcw.inv();

The ROS build I have works with the TUM datasets and on live camera feed using the cv_camera ROS package, I just want to publish the output as a ROS Pose message.