I refactored `arim.im.Rays` and part of `arim.im.FermatSolver`. When computing rays, they are now assembled more efficiently thanks to the new function `Rays.expand_rays`, which replaces `assemble_rays`. This new function is written in C++ and wrapped with Cython, so there is one fewer numba function in arim (a small step towards getting rid of numba).
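The PR does not show `Rays.expand_rays` itself, but the idea of growing ray paths one interface at a time can be sketched as below. All names, shapes, and the signature here are my assumptions for illustration, not arim's actual API:

```python
import numpy as np

def expand_rays_sketch(interior_indices, indices_new_interface):
    """Hypothetical sketch of expanding rays by one leg.

    interior_indices: (d, n, p) array -- for each of n sources and p points
        on the previous interface, the indices of the d interior points of
        the best ray found so far.
    indices_new_interface: (n, m) array -- for each of n sources and m points
        on the new interface, the index of the previous-interface point that
        minimises the travel time.
    Returns a (d + 1, n, m) array: the gathered old interior indices followed
    by the chosen previous-interface index.
    """
    d, n, p = interior_indices.shape
    n2, m = indices_new_interface.shape
    assert n == n2
    expanded = np.empty((d + 1, n, m), dtype=interior_indices.dtype)
    for i in range(n):
        best = indices_new_interface[i]                    # shape (m,)
        expanded[:d, i, :] = interior_indices[:, i, best]  # gather old legs
        expanded[d, i, :] = best                           # append new leg
    return expanded
```

The gather step is the part that benefits from a compiled implementation: it is an index-chasing loop with little arithmetic, which is why moving it to C++/Cython pays off.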
This PR does not change the way the `FermatSolver` is called, so the example script `multiview_tfm.py` is untouched.
Performance-wise, I ran a quick benchmark using the following interfaces:
Computing the 21 views takes 38.2 s on my machine with the previous version and 34.8 s with the new one, an improvement of 8.9%. The bottleneck of `FermatSolver.solve()` is still `arim.im.find_minimal_times`, which takes 75% of the runtime; this could be improved some day with a more memory-efficient implementation.
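For what it's worth, a figure like "75% of the runtime" can be reproduced with the standard-library profiler. This is only a sketch of the measurement, not the actual benchmark setup; `solver` stands in for a configured `FermatSolver` instance:

```python
import cProfile
import io
import pstats

def profile_solve(solver):
    """Profile solver.solve() and return a report of the top callees.

    Sorting by cumulative time shows which functions (e.g. a hot spot
    like find_minimal_times) dominate the total runtime of solve().
    """
    pr = cProfile.Profile()
    pr.enable()
    solver.solve()
    pr.disable()
    buf = io.StringIO()
    # Print the 10 entries with the largest cumulative time
    pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(10)
    return buf.getvalue()
```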
There is a regression: the memory footprint is larger because the indices are now always stored as uint32, whereas before they were stored as uint8/uint16 when possible. I haven't implemented this optimisation yet.
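If we wanted to restore that optimisation, the dtype could be picked from the number of interface points. A minimal sketch (this helper is hypothetical, not part of arim):

```python
import numpy as np

def smallest_uint_dtype(max_index):
    """Return the smallest unsigned integer dtype that can hold max_index.

    Choosing the index dtype from the largest index that actually occurs,
    instead of always using uint32, would undo the memory regression for
    interfaces with fewer than 2**16 points.
    """
    for dtype in (np.uint8, np.uint16, np.uint32):
        if max_index <= np.iinfo(dtype).max:
            return dtype
    return np.uint64
```

For example, with 1000 points per interface the indices fit in uint16, halving the size of the ray index arrays compared to uint32.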
@rltb: do you have time to review this PR? If so, please send me any suggestions or comments. Thanks