Open ArcEye opened 6 years ago
Comment by RunningLight Mon May 11 17:56:03 2015
Regarding neon_mathfun_test, see https://gist.github.com/RunningLight/91246d5d1cc224cd008f
gcc 4.6.3 didn't like the -ffast-math flag so I left it off.
Comment by machinekoder Mon May 11 18:48:37 2015
Wow, that's a factor of 10 compared to the gcc math functions. Definitely worth trying.
Comment by RunningLight Mon May 11 22:23:17 2015
I agree. At the same time, I'm aware that the trajectory planner uses the posemath library and I haven't looked yet to see if there's any problem related to it.
Comment by mhaberler Tue May 12 04:05:59 2015
well since there is some evidence as to the location of the delay hike and it is just cos(), inspecting cos() for spikes would be my first priority; by analogy, the other transcendentals as far as they are used in the code base warrant a look as well
taking a step back, I guess issues like this one are likely to reappear, so the question arises - what is the best strategy to deal with libm?
NB linuxcnc math support has seen some work which might be useful to adopt
Comment by machinekoder Tue May 12 07:43:34 2015
Comment by mhaberler Tue May 12 07:51:53 2015
yes, forgot that option (in fact for kthreads flavors we do already have our own math library, so it would make sense to inspect the header magic which makes that happen and build on it)
now.. an evidence-based approach to this issue would be to write up pre-configure-time tests which automatically select the best option for a platform
this test would not be part of configure since this makes no sense when cross-building, but rather a test program which suggests the best possible configure options for a given platform - one would have to run that on target
this could well be a separate repo/project
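A minimal sketch of what such an on-target test program could look like, assuming the selection criterion is the worst-case latency of a single call (since the concern in this thread is delay spikes rather than average speed). All names here (`struct candidate`, `worst_case_ns`, `pick_best`, `taylor_cos`) are hypothetical and not part of machinekit; `taylor_cos` is only a stand-in so the sketch builds without libm.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* One math implementation under test (hypothetical structure). */
struct candidate {
    const char *name;
    double (*fn)(double);
};

/* Hypothetical stand-in candidate: a short Taylor polynomial,
 * only sensible near x = 0. Replace with real implementations. */
static double taylor_cos(double x)
{
    double x2 = x * x;
    return 1.0 - x2 / 2.0 + x2 * x2 / 24.0 - x2 * x2 * x2 / 720.0;
}

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Longest single call of fn(x) seen over `iters` calls, in ns. */
static int64_t worst_case_ns(double (*fn)(double), double x, int iters)
{
    assert(fn != NULL && iters > 0);
    volatile double sink = 0.0; /* keeps the call from being optimized away */
    int64_t worst = 0;
    for (int i = 0; i < iters; i++) {
        int64_t t0 = now_ns();
        sink = fn(x);
        int64_t dt = now_ns() - t0;
        if (dt > worst)
            worst = dt;
    }
    (void)sink;
    return worst;
}

/* Return the candidate with the smallest worst-case latency. */
static const struct candidate *pick_best(const struct candidate *tab,
                                         size_t n, double x, int iters)
{
    assert(tab != NULL && n > 0);
    const struct candidate *best = &tab[0];
    int64_t best_ns = worst_case_ns(tab[0].fn, x, iters);
    for (size_t i = 1; i < n; i++) {
        int64_t ns = worst_case_ns(tab[i].fn, x, iters);
        if (ns < best_ns) {
            best_ns = ns;
            best = &tab[i];
        }
    }
    return best;
}
```

In real use the table would hold entries such as `{"libm", cos}` (linked with -lm) next to the alternative implementations, and the program would print the name of the winner as a suggestion for the configure options. The measured time includes the two clock_gettime() calls and the indirect call, so only the comparison between candidates is meaningful, not the absolute numbers.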
Comment by machinekoder Wed May 13 20:35:30 2015
@RunningLight Tried on my BBB and got slightly better results. Using the latest gcc on Debian Wheezy. The cortex-a9 option has to be replaced by cortex-a8 for the BBB.
The cos_ps functions calculate 4 values at the same time, so to take full advantage of them one would need to modify the code accordingly. An SSE/x86 version is also available. The cephes functions, however, also perform slightly better than the glibc functions and would require no additional work to use. I tested them with the TP and the RT delay no longer occurred. @mhaberler Maybe we should choose option 3?
Comment by RunningLight Wed May 13 21:01:54 2015
@strahlex I should have said I changed the flag.
I haven't taken on rtapi_get_clocks() yet. Too much end-of-school activity with grandkids:)
However, after some rather confusing results with a test styled on the quick test you posted, which iterated over a large domain (-100., +100) of input values, I cobbled up a test where, for a single fixed value, I count the number of times cos(value) can be computed between those 10000us 'ticks' of clock(). I don't feel confident enough in the weird outcome to share yet. For a start, I gather 10000 such counts. Roughly, the difference between the maximum and minimum counts on the i5 is about 4:1; on the BBB, 1000:1. In the case of the BBB, the lowest number of counts corresponds to cos() costing on the order of 1ms. I want to check my work before I post plots of the distributions.
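The counting test described above might be sketched roughly as follows. This is not the actual test program, only an illustration under assumptions: the function under test is passed as a pointer, the names `calls_per_tick`, `count_spread` and `taylor_cos` are hypothetical, and `taylor_cos` is a stand-in so the sketch builds without libm.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the function under test; pass cos from
 * <math.h> (and link with -lm) for the real measurement. */
static double taylor_cos(double x)
{
    double x2 = x * x;
    return 1.0 - x2 / 2.0 + x2 * x2 / 24.0;
}

/* Count how many calls of fn(x) fit into `tick` units of clock(). */
static long calls_per_tick(double (*fn)(double), double x, clock_t tick)
{
    assert(fn != NULL && tick > 0);
    volatile double sink = 0.0; /* keeps the call from being optimized away */
    clock_t prev = clock(), start;

    /* Wait for clock() to change so the window starts on a fresh value. */
    do {
        start = clock();
    } while (start == prev);

    long n = 0;
    while (clock() - start < tick) {
        sink = fn(x);
        n++;
    }
    (void)sink;
    return n;
}

/* Gather `samples` counts and report the smallest and largest. */
static void count_spread(double (*fn)(double), double x, clock_t tick,
                         int samples, long *min_out, long *max_out)
{
    assert(samples > 0 && min_out != NULL && max_out != NULL);
    long lo = calls_per_tick(fn, x, tick);
    long hi = lo;
    for (int i = 1; i < samples; i++) {
        long n = calls_per_tick(fn, x, tick);
        if (n < lo)
            lo = n;
        if (n > hi)
            hi = n;
    }
    *min_out = lo;
    *max_out = hi;
}
```

For 10000us windows the tick argument would be `CLOCKS_PER_SEC / 100`. Dividing the window length by a count gives an upper bound on the per-call cost, because every iteration also pays for one clock() call; note too that clock() reports CPU time used by the process, not wall-clock time.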
All that aside, I think it very reasonable to choose option 3 rather than deal with the glibc functions.
Comment by mhaberler Tue May 19 12:19:21 2015
fyi @strahlex and I are planning to link in math functions as needed into the userland rtapi RT module (rtapi_main.c ff), and likely into the ulapi library as well, so both sides use the same math library
Comment by machinekoder Tue May 26 07:53:44 2015
The rtapi_math patch is here https://github.com/machinekit/machinekit/pull/652
Issue by machinekoder Sun May 10 17:11:28 2015 Originally opened as https://github.com/machinekit/machinekit/issues/629
Found some additional compiler options and also specific math functions for the ARM Neon fpu. It may not improve performance dramatically, but it does not cost much either: http://www.eliteraspberries.com/blog/2013/09/cflags-for-numerical-computing-on-the-beaglebone-black.html
and if https://github.com/machinekit/machinekit/issues/412 is related to a bug in the GCC math functions this may help too: http://gruntthepeon.free.fr/ssemath/neon_mathfun.html