victorprad / InfiniTAM

A Framework for the Volumetric Integration of Depth Images
http://www.infinitam.org

Question: source of randomness in SDF #90

Closed Algomorph closed 6 years ago

Algomorph commented 6 years ago

Hello, I am currently in the middle of implementing KillingFusion (fusion of dynamic scenes) in InfiniTAM v3 (Algomorph/InfiniTAM, branch feature/KillingFusion). I am close to achieving convergence for optimization, but I'm having some trouble because of apparent stochastic results in SDF generation.

The SDF scenes are generated using only existing routines, and the input files (OpenNI files) are the same on every run. Do you have any idea what could be causing the SDF to differ slightly every time (using the ITMSceneReconstructionEngine_CPU class)?

Thanks!

victorprad commented 6 years ago

Hello.

There really should not be any randomness, especially in the CPU engines ... How different are the numbers? Could you please check if you get the same randomness on the teddy sequence?

Cheers,

olafkaehler commented 6 years ago

To avoid randomness, you should also disable OpenMP.

Algomorph commented 6 years ago

@olafkaehler, I think you are right, it might be OpenMP. @victorprad, there definitely is randomness: I've checked the incoming frames pixel by pixel, and they are always read identically from disk, but the tracker results differ. I will try disabling OpenMP and report back. Thank you both.

Algomorph commented 6 years ago

Gentlemen, with OpenMP disabled, I get the same tracker results every time, and the generated SDFs appear to be identical. This points to a data race under multi-threading whose effects are (somewhat) benign in most cases, although it makes it impossible to debug what happens to specific individual voxels. The natural questions are: can this cause tracking loss over time? Can the error accumulate? Where exactly does the data race occur?

olafkaehler commented 6 years ago

It's not a data race. It's the limited precision of floating point when storing intermediate results: the order in which partial results are accumulated changes the rounding. Try taking the sum over 1M floating-point numbers with and without OpenMP. Then try the same with double values.

PS: With atomics I'd assume the results to be more surprising than with parallel reduction, but still, order matters for limited precision operations.

Algomorph commented 6 years ago

Thanks, I'll look into that when I have the time, including to what extent the OpenMP reduction and atomic clauses help overcome this. For now I will just disable OpenMP whenever I need to look at specific voxels for debugging.