kzampog / cilantro

A lean C++ library for working with point cloud data
MIT License
1.01k stars 206 forks

Different results depending on machine #38

Open Hugo-Pereira opened 4 years ago

Hugo-Pereira commented 4 years ago

Hi,

I am running the same example on two different machines (different hardware, same OS: Ubuntu 18.04). I have ENABLE_NON_DETERMINISTIC_PARALLELISM=OFF. The results I get from each machine are slightly different. Is this expected behaviour?

Thanks

kzampog commented 4 years ago

Hi,

The flag only toggles behavior in cases that would be affected by the non-associativity of floating point arithmetic. If the results are consistent for a given machine, differences may be due to different hardware architectures. Which example is generating non-deterministic results?
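To illustrate the non-associativity mentioned above, here is a minimal sketch (not taken from cilantro; the function names are made up for illustration). It assumes IEEE 754 single precision with no excess intermediate precision, as is typical on x86-64, and no `-ffast-math`. Summing the same three values in two different orders gives two different answers, which is why a parallel reduction whose order varies between runs can change the result.

```cpp
#include <cassert>
#include <vector>

// Accumulate left to right: ((v0 + v1) + v2) + ...
float sum_forward(const std::vector<float>& v) {
    float s = 0.0f;
    for (float x : v) s += x;
    return s;
}

// Accumulate right to left: ((vn + vn-1) + vn-2) + ...
float sum_backward(const std::vector<float>& v) {
    float s = 0.0f;
    for (auto it = v.rbegin(); it != v.rend(); ++it) s += *it;
    return s;
}

// Hypothetical demo: same data, different order, different result.
void demo_non_associativity() {
    const std::vector<float> v{1e8f, -1e8f, 1.0f};
    assert(sum_forward(v) == 1.0f);   // (1e8 - 1e8) + 1
    assert(sum_backward(v) == 0.0f);  // (1 - 1e8) + 1e8: the 1 is lost to rounding
}
```

A fixed summation order removes this source of variation on a given machine, but it does not by itself guarantee identical results across different hardware or compilers.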

Hugo-Pereira commented 4 years ago

I have been running some tests, and it appears the results start differing at the output of cilantro::RGBDImagesToPointsNormalsColors. The last 6 points and colors have different values. The "example" I am talking about is from scans I am performing with an iPhone, not the ones in the repo, sorry.

This is the comparison between the points of the cloud generated on my dev machine and on my CI machine.

image

I just confirmed the inputs I pass to RGBDImagesToPointsNormalsColors are exactly the same.

Hugo-Pereira commented 4 years ago

"If the results are consistent for a given machine, differences may be due to different hardware architectures."

Is this intended? I really need the results to be the same across different hardware :\

kzampog commented 4 years ago

This is interesting. I would not be surprised by differences on the order of machine epsilon, but that does not seem to be the case here (e.g. the 5th point). If the input images are exactly the same, maybe you are using a custom depth converter that behaves non-deterministically? I can't think of anything else right now; all the function itself does is simple operations. If you could share a minimal example (code and data) that reproduces the problem, that would really help!
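One way to tell "machine epsilon" differences from real ones is to compare the two outputs with a relative tolerance expressed in multiples of epsilon. The following is only a sketch: the helper names and the default tolerance of 4 epsilons are assumptions chosen for illustration, not part of cilantro.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// True if a and b differ by at most `max_eps` machine epsilons,
// relative to the larger of the two magnitudes.
bool nearly_equal(float a, float b, int max_eps = 4) {
    const float eps = std::numeric_limits<float>::epsilon();
    const float scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= eps * scale * static_cast<float>(max_eps);
}

// Index of the first element where the two buffers differ beyond the
// tolerance, or -1 if they agree everywhere.
std::ptrdiff_t first_mismatch(const std::vector<float>& x,
                              const std::vector<float>& y,
                              int max_eps = 4) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (!nearly_equal(x[i], y[i], max_eps)) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

Running `first_mismatch` on the flattened coordinates dumped from each machine would show whether a discrepancy is rounding noise or something larger, and at which element it first appears.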

Algomorph commented 4 years ago

@Hugo-Pereira, I suggest a couple of things to try (some of which you may have tried already).

    • [ ] Check that the versions of Eigen are the same on your CI and your dev machine
    • [ ] If they are the same but new(ish), try an older release of Eigen on both
    • [ ] Analyze the compiler stack differences between your two machines -- try to get the two to converge to see if the problem stems from slightly different environments (CMake version, gcc version)
    • [ ] See whether the results differ only for specific examples or for most of them (this may help to isolate the problem)
    • [ ] Make triple-sure that the CI environment is reading inputs from the same paths, i.e. it's actually reading the same files / file versions

Hugo-Pereira commented 4 years ago

I am using TruncatedDepthValueConverter. The inputs are the same, and I am using docker so all libraries and dependencies are (should be? :) ) the same. I am using Eigen 3.3.4.

I'll assemble a sample project and send it over as soon as I am able.

Hugo-Pereira commented 4 years ago

Oops, I was reading the wrong memory addresses :\ Sorry for the confusion. The points of the point cloud match 100%. I am getting different results for the normals though, which by itself will cause the alignment to not match.

image

Is there a way around this, i.e. a way to force the results to be the same between machines?

kzampog commented 4 years ago

Oh OK!

Regarding the normal computation, are you using single or double precision floats? I think differences look normal (no pun intended) for single precision. Are you using the NormalEstimation class or the image conversion utility?

Hugo-Pereira commented 4 years ago

Single precision, and I am using RGBDImagesToPointsNormalsColors. I get the exact same results for the points and colors, but not for the normals. I compiled cilantro with -DENABLE_NATIVE_BUILD_OPTIMIZATIONS=OFF and -DENABLE_NON_DETERMINISTIC_PARALLELISM=OFF.

kzampog commented 4 years ago

That function computes a cross product and normalizes it for each normal vector. Eigen's cross product looks innocent, but it seems that the sqrt implementation used by .normalized() uses platform-dependent intrinsics. You could try manually normalizing instead using std::sqrt, although I'm not sure what guarantees that comes with!

Edit: This might also be worth checking: https://eigen.tuxfamily.org/dox/TopicPreprocessorDirectives.html It appears EIGEN_FAST_MATH is defined by default!
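The manual normalization suggested above could look roughly like the following. This is a standalone sketch using plain `std::array` rather than Eigen types, and `Vec3`, `cross`, and `normalize` are illustrative names, not cilantro or Eigen API. It relies on `std::sqrt` being correctly rounded, which IEEE 754 requires of the square root operation, but whether a given toolchain honors that is an assumption to verify.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;

// Plain scalar cross product a x b.
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

// Normalize using scalar std::sqrt and division instead of a
// vectorized or approximate reciprocal square root.
Vec3 normalize(const Vec3& v) {
    const float len = std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    assert(len > 0.0f);  // caller must not pass a zero vector
    return {v[0] / len, v[1] / len, v[2] / len};
}
```

Even with this, compiler settings can still change the last bits: for example, `-ffast-math` or fused multiply-add contraction on hardware that supports it may alter the rounding of the products and sums above, so both machines would need to be built with matching, conservative flags.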

Hugo-Pereira commented 4 years ago

The thing is, I am using Eigen for other stuff (like triangle mesh deformation, normals, etc.), and the results are consistent between my dev machine and CI. I'll look further into it.