mihaibujanca / dynamicfusion

Implementation of Newcombe et al. CVPR 2015 DynamicFusion paper
BSD 3-Clause "New" or "Revised" License

Sample Umbrella not well reconstructed #18

Open electro-logic opened 7 years ago

electro-logic commented 7 years ago

Hello Mihai,

I have tried your project, but the reconstructed umbrella (bundled sample) comes out very badly, far from the DynamicFusion video accompanying the paper. Is this project still a work in progress and not yet ready, or have I made some mistake?

Thank you

mihaibujanca commented 7 years ago

Hey @electro-logic, the project is still in the works.

I've pretty much got a version working on the CPU, but it's way too slow (10+ min / frame), so I'm trying to get something working on the GPU.

I don't have a timeline for it yet, but I'm working on it consistently, so I'll ping back here when it's working reasonably well :).

electro-logic commented 7 years ago

Thank you for your answer @mihaibujanca, could you share some details about how to use the CPU instead of the GPU in your project? If the CPU implementation is working well, I can help you speed it up.

mihaibujanca commented 7 years ago

Please check out the gpu_optimisation branch; the addition there is a call to getWarp().energy_data inside kinfu.cpp.

The main reason master doesn't reconstruct well is that the warp field estimation step is not called there (see equations 6 and 7 in the paper). The main problem is that I'm currently using Ceres for that optimisation, and Ceres only seems to support the CPU. The optimisation requires estimating 6 variables per warp field node; the warp field is currently initialised from the first frame, so that means about 6 * 250k = 1.5M variables to compute per frame (and the warp field needs to grow over time).
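To give a rough sense of the problem size, here is a minimal Ceres sketch of a per-node data term with 6 parameters (angle-axis rotation plus translation) per warp-field node. This is only a hedged illustration, not the actual WarpField::energy_data formulation: the functor name, the simple point-to-point residual, and tying each point to a single node are all assumptions made for the sketch.

```cpp
#include <ceres/ceres.h>
#include <ceres/rotation.h>

// Hypothetical per-point data term: one 6-DoF block (angle-axis + translation)
// per warp-field node. A sketch of the problem size only, not the real energy.
struct PointToPointResidual {
    PointToPointResidual(const double* canonical, const double* live)
        : canonical_(canonical), live_(live) {}

    template <typename T>
    bool operator()(const T* const node_se3, T* residual) const {
        T p[3] = {T(canonical_[0]), T(canonical_[1]), T(canonical_[2])};
        T warped[3];
        ceres::AngleAxisRotatePoint(node_se3, p, warped);  // rotation: params 0..2
        warped[0] += node_se3[3];                          // translation: params 3..5
        warped[1] += node_se3[4];
        warped[2] += node_se3[5];
        for (int i = 0; i < 3; ++i)
            residual[i] = warped[i] - T(live_[i]);
        return true;
    }

    const double* canonical_;
    const double* live_;
};

// Adding one such block per surface point, with ~250k nodes * 6 parameters,
// is what pushes the problem past a million variables per frame.
void add_block(ceres::Problem& problem, double* node_se3,
               const double* canonical, const double* live) {
    problem.AddResidualBlock(
        new ceres::AutoDiffCostFunction<PointToPointResidual, 3, 6>(
            new PointToPointResidual(canonical, live)),
        nullptr, node_se3);
}
```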

One way of speeding it up to begin with would be subsampling the first frame and creating a sparser warp field, which I'll need to do anyway. But in the end, a GPU implementation will be necessary for the optimisation.

I'm currently looking at http://github.com/niessner/opt and http://docs.nvidia.com/cuda/cusolver for GPU optimisation, but it will take me a bit to learn how to use them.

mihaibujanca commented 7 years ago

@electro-logic Because the optimisation is too slow, the way I tested whether or not it's working is:

  1. There's a target called ceres_test, which creates a warp field with 9 nodes and a set of 6 points for the "live frame" and 6 for the "canonical frame". You can play around with the values both for the nodes and for the input / output vertices. If you want to see how the optimisation parameters change, uncomment the last few lines in WarpField::energy_data https://github.com/mihaibujanca/dynamicfusion/blob/acd0c0e01f13bbeaaafee89b626d4a3856acd6eb/kfusion/src/warp_field.cpp#L173 (I wouldn't recommend leaving that in if you run a dataset such as umbrella).

  2. With the umbrella dataset, I initialised the warp field with the first frame and looked at the cost function value, which did get small enough to suggest that the warp would work well - but it's hard to visualise at this point.

electro-logic commented 7 years ago

I have compiled the gpu_optimisation branch with the umbrella dataset (only frames 100 to 150) and got this output for the first frame:

Device 0:  "GeForce GT 750M"  4039Mb, sm_30, 384 cores, Driver/Runtime ver.8.0/7.50
iter      cost      cost_change  |gradient|   |step|    tr_ratio  tr_radius  ls_iter  iter_time  total_time
   0  3.224206e+06    0.00e+00    4.70e+02   0.00e+00   0.00e+00  1.00e+04        0    5.47e+01    5.55e+01
   1  5.038094e-04    3.22e+06    5.88e-03   2.36e+02   1.00e+00  3.00e+04        1    1.35e+02    1.91e+02
   2  4.087742e-10    5.04e-04    6.66e-08   4.07e-03   1.00e+00  9.00e+04        1    1.18e+02    3.09e+02
   3  6.638671e-11    3.42e-10    1.41e-08   1.42e-03   1.00e+00  2.70e+05        1    1.35e+02    4.44e+02
   4  1.079882e-11    5.56e-11    5.19e-09   1.04e-03   1.00e+00  8.10e+05        1    1.23e+02    5.67e+02
   5  1.664163e-12    9.13e-12    1.25e-09   6.85e-04   1.00e+00  2.43e+06        1    1.44e+02    7.11e+02
   6  5.191461e-13    1.15e-12    2.46e-10   4.11e-04   1.00e+00  7.29e+06        1    1.65e+02    8.76e+02
   7  2.560969e-13    2.63e-13    9.02e-11   4.27e-04   1.00e+00  2.19e+07        1    1.44e+02    1.02e+03

Solver Summary (v 1.13.0-eigen-(3.3.4)-lapack-suitesparse-(4.4.6)-cxsparse-(3.1.4)-openmp)

                                     Original                  Reduced
Parameter blocks                        45850                    45850
Parameters                             275100                   275100
Residual blocks                         96020                    96020
Residual                               288060                   288060

Minimizer                        TRUST_REGION

Sparse linear algebra library    SUITE_SPARSE
Trust region strategy     LEVENBERG_MARQUARDT

                                        Given                     Used
Linear solver                    SPARSE_SCHUR             SPARSE_SCHUR
Threads                                     8                        8
Linear solver threads                       8                        8
Linear solver ordering              AUTOMATIC               4426,41424
Schur structure                         3,6,6                    d,d,d

Cost:
Initial                          3.224206e+06
Final                            2.560969e-13
Change                           3.224206e+06

Minimizer iterations                        8
Successful steps                            8
Unsuccessful steps                          0

Time (in seconds):
Preprocessor                           0.7368

  Residual evaluation                  0.5212
  Jacobian evaluation                375.9973
  Linear solver                      642.0986
Minimizer                           1019.6177

Postprocessor                          0.0095
Total                               1020.3640

Termination:                      CONVERGENCE (Gradient tolerance reached. Gradient max norm: 9.021969e-11 <= 1.000000e-10)

but the GUI is frozen and I can't see anything.

mihaibujanca commented 7 years ago

@electro-logic yep, that's where I'm at right now. The cost being low enough (2.560969e-13) and the test on handmade data suggest that it should be working correctly. I'm not sure why the GUI freezes, but I expect that once the optimisation works on the GPU it will be easier to understand.

I might try subsampling for the warp field initialisation tonight, since that's easy to do and it might be enough to give some decent results for warping.

mihaibujanca commented 7 years ago

Quick update on this - I tried uniformly subsampling for the warp field initialisation and, depending on the subsampling size, it seems to work for a few (<5) frames before the optimisation fails due to loads of NaNs.
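For reference, the uniform subsampling is essentially a voxel-grid decimation of the first-frame vertices before they become warp-field nodes. The sketch below is a simplified stand-alone version under that assumption; the cell size and helper names are placeholders, not the code in the repo.

```cpp
#include <cmath>
#include <unordered_set>
#include <vector>

struct Vec3f { float x, y, z; };

// Keep at most one vertex per cubic cell of side `cell` (e.g. a few cm).
// Shrinking `cell` gives more warp-field nodes; growing it gives a sparser,
// cheaper warp field. Placeholder implementation, not the repo's code.
std::vector<Vec3f> subsample_uniform(const std::vector<Vec3f>& vertices,
                                     float cell) {
    std::unordered_set<long long> occupied;
    std::vector<Vec3f> nodes;
    for (const Vec3f& v : vertices) {
        long long ix = static_cast<long long>(std::floor(v.x / cell));
        long long iy = static_cast<long long>(std::floor(v.y / cell));
        long long iz = static_cast<long long>(std::floor(v.z / cell));
        // Pack the 3D cell index into one key (fine for moderate extents).
        long long key = ((ix & 0x1FFFFF) << 42) |
                        ((iy & 0x1FFFFF) << 21) |
                        (iz & 0x1FFFFF);
        if (occupied.insert(key).second)
            nodes.push_back(v);  // first vertex seen in this cell becomes a node
    }
    return nodes;
}
```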

I'm going to be working on porting this to the GPU, since right now it's too slow to visualise or debug properly.

electro-logic commented 7 years ago

I have investigated: the GUI freezes because all the work is done in a single thread. It would be better to do the work in a separate thread and keep the main thread free, so that it can process the message queue and stay responsive. This can be done with boost::thread or the C++11 async features; see the sketch below.
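A minimal sketch of that idea using C++11 std::async, assuming a per-frame processing callback and a cv::viz window; the names and signature are placeholders, not the project's actual API:

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <opencv2/viz.hpp>

// `process_frame` stands in for the per-frame fusion work (not the project's
// real signature); the GUI keeps spinning on the main thread while it runs.
void run(cv::viz::Viz3d& viz, int n_frames,
         const std::function<void(int)>& process_frame) {
    for (int i = 0; i < n_frames; ++i) {
        // Launch the heavy per-frame work off the main thread.
        std::future<void> job =
            std::async(std::launch::async, process_frame, i);
        // Pump the event loop until the frame is done, so the window
        // stays responsive.
        while (job.wait_for(std::chrono::milliseconds(30)) !=
               std::future_status::ready) {
            viz.spinOnce(1, true);
        }
        job.get();  // propagate any exception from the worker thread
    }
}
```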

At this stage it's probably too early to do this; it's better to improve the algorithm first and handle this afterwards.

Maybe with the work on the GPU, the main thread can be free and responsive without additional work.

Anyway, a quick and dirty fix would be to call viz.spin() and viz1.spin() before the call to dynamic_fusion(depthdevice) to give the user a way to view the data; the user can then press Q to proceed to the next frame. This works now for debugging purposes because every frame takes so long to compute.

Just out of curiosity: what OS / GPU / CUDA toolkit version are you using for development?

mihaibujanca commented 7 years ago

My setup is Ubuntu 16.04 / Nvidia GeForce 960M 4GB / CUDA 8.0. Yeah, I could spin up another thread, but it wouldn't do much, since all the user could do is look at the same image / point cloud until a new frame is processed - maybe look around the point cloud, but I can't see any obvious advantage in doing that at this point.

eshafeeqe commented 7 years ago

@mihaibujanca awesome project. I have also been working on the same thing for the last year. I am currently using Opt for the optimisation. It can improve the overall optimisation time, but in my experience, DynamicFusion-style subsampled graph-based optimisation requires some more modification on top of Opt (currently I am doing full-mesh optimisation). I also tried the VolumeDeform (https://arxiv.org/abs/1603.08161) optimisation using Opt, but unfortunately that significantly increases the time for a TSDF grid size of 128x128x128. So in my opinion Opt can bring the time down to about 2 seconds per frame, but making it work in real time really requires hand-crafted Jacobian estimation.

mihaibujanca commented 7 years ago

@eshafeeqe Thanks a lot for the feedback!

I'm currently working on getting it to work with Opt, but in all fairness there are plenty of other things that need to be improved. I'm also working on getting this to be part of OpenCV, so eventually the code will need to run in real time. I thought about building VolumeDeform instead, but DynamicFusion seemed like an easier option to begin with, and VolumeDeform could be tried later.

Would love to chat about this and any tips or contributions would be more than welcome :).

I'm not that great with CUDA, so getting the optimisation to be fast is taking me a while.

electro-logic commented 7 years ago

I think that even if real time cannot be reached with current hardware, mainstream GPUs will keep improving, so getting 0.5 fps today would already be a great result. It still allows offline processing: at 0.5 fps (2 s per frame), a 1-minute sequence captured at 30 fps (1800 frames) can be processed in about an hour.

KevinLee752 commented 7 years ago

I'm a newcomer to 3D reconstruction and I'm also implementing my own idea based on this code project (what an awesome project, I must say). Out of curiosity, how is this project going now? Is the reconstructed video close to the one in the DynamicFusion paper? I tried this project on Windows with VS2013 but get an error (bad_alloc) during the problem-solving step in WarpField::energy_data. I'm using Ceres with EigenSparse instead of SuiteSparse and am still trying to figure it out....

mihaibujanca commented 7 years ago

@KevinLee752 Thanks! Curious to know more about what you're thinking of doing.

I'm working on it a few hours every day and it's a serious priority for me - but at the same time I have other commitments and time is limited.

I am currently working on a version that uses https://github.com/niessner/Opt for the optimisation instead of Ceres, since Ceres is CPU-only. Seeing that a few people are taking an interest in this, I'll probably focus on getting the documentation up to date and making the project easier to build (there are some hardcoded paths and other issues I haven't addressed yet).

In terms of where the project is, I'd just say active development; I don't really have an ETA for it. The reconstruction is currently failing because of some issue in my warp field formulation, and I'm writing tests to track it down. It's probably still a few weeks away from being close to the paper's results (especially in terms of being a real-time system), but I'm hoping that by mid-November the reconstruction will look decent while having a reasonable speed for offline processing.

Wangyouai commented 3 years ago

@mihaibujanca Hi Mihai, I recently ran this project and the umbrella reconstruction result I got is very rough, not as smooth and detailed as in the paper. Is it because the CUDA reconstruction results are not as good as the CPU ones? Is the reconstruction result with Ceres smoother and finer? Thank you!

Wangyouai commented 3 years ago

@mihaibujanca The strange thing is that when I ran your ceres test fixed branch, I found that it runs faster (13 s per frame) and you can see the reconstruction go from rough to smooth in real time. Can you help me out? Thank you very much!