(Closed by perazz 1 year ago.)
Related to this is NumPy's `argsort` function, which returns an integer array of indices that sorts the argument array. Here is a Fortran implementation: https://github.com/certik/fortran-utils/blob/master/src/sorting.f90#L179, although it is inefficient ($O(n^2)$ instead of $O(n \log(n))$).
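For comparison, an argsort can run in $O(n \log(n))$ by applying a merge sort to an index array instead of to the values. The sketch below is a minimal illustration, not the fortran-utils code linked above; it assumes default `real` input, and the module and routine names (`argsort_mod`, `argsort`, `merge_sort`) are hypothetical.

```fortran
module argsort_mod
  implicit none
  private
  public :: argsort
contains
  ! Return the permutation idx such that a(idx) is in ascending order.
  ! Stable top-down merge sort on the index array: O(n log n) time,
  ! O(n) extra workspace.
  function argsort(a) result(idx)
    real, intent(in) :: a(:)
    integer :: idx(size(a))
    integer :: tmp(size(a))
    integer :: i
    idx = [(i, i = 1, size(a))]
    call merge_sort(a, idx, tmp, 1, size(a))
  end function argsort

  recursive subroutine merge_sort(a, idx, tmp, lo, hi)
    real, intent(in) :: a(:)
    integer, intent(inout) :: idx(:), tmp(:)
    integer, intent(in) :: lo, hi
    integer :: mid, i, j, k
    if (hi - lo < 1) return          ! zero or one element: already sorted
    mid = (lo + hi) / 2
    call merge_sort(a, idx, tmp, lo, mid)
    call merge_sort(a, idx, tmp, mid + 1, hi)
    ! Merge the sorted halves idx(lo:mid) and idx(mid+1:hi) into tmp.
    i = lo; j = mid + 1; k = lo
    do while (i <= mid .and. j <= hi)
      if (a(idx(j)) < a(idx(i))) then   ! strict < keeps equal keys in order (stable)
        tmp(k) = idx(j); j = j + 1
      else
        tmp(k) = idx(i); i = i + 1
      end if
      k = k + 1
    end do
    ! Copy whichever half still has elements left.
    do while (i <= mid)
      tmp(k) = idx(i); i = i + 1; k = k + 1
    end do
    do while (j <= hi)
      tmp(k) = idx(j); j = j + 1; k = k + 1
    end do
    idx(lo:hi) = tmp(lo:hi)
  end subroutine merge_sort
end module argsort_mod
```

For example, with `x = [3.0, 1.0, 2.0, 1.0, 5.0]`, `argsort(x)` returns `[2, 4, 3, 1, 5]`; the two equal keys keep their original order because the merge is stable.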
That's right. I'm no compiler expert, but I believe most compilers use an internal sorting algorithm or sorting networks to implement `minloc`, `maxloc`, `minval` and `maxval`, because they're all $O(n \log(n))$ methods.
Similarly, `sin` and `cos` could be sped up internally, e.g. by computing $\sin x = \pm \sqrt{1-\cos^2(x)}$, not to mention architecture-specific optimizations like those in this project.
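Written out, the identity leaves the sign to be fixed from the quadrant of $x$, so an argument reduction modulo $2\pi$ is still needed:

$$
\sin x = \sigma(x)\,\sqrt{1-\cos^2 x}, \qquad
\sigma(x) = \begin{cases} +1, & 0 \le (x \bmod 2\pi) \le \pi,\\ -1, & \text{otherwise.} \end{cases}
$$

Near $x = k\pi$, where $|\cos x| \to 1$, the subtraction $1-\cos^2 x$ cancels catastrophically, so this route trades accuracy for speed.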
For item 2, just call `sin` and `cos`. Most compilers will optimize the pair into a single call to the `sincos` function. See https://godbolt.org/z/T3zs9vM9E
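Item 2 can be sketched as a thin elemental wrapper. The name `sin_cos` is hypothetical (there is no such standard intrinsic); the body just calls `sin` and `cos` on the same argument and relies on the optimizer to fuse them, as described above. Whether fusion actually happens depends on the compiler and the optimization level.

```fortran
module sincos_mod
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: sin_cos
contains
  ! Hypothetical combined routine returning sin(x) and cos(x) together.
  ! It calls the two intrinsics on the same argument; an optimizing
  ! compiler may merge them into a single sincos call.
  elemental subroutine sin_cos(x, s, c)
    real(real64), intent(in)  :: x
    real(real64), intent(out) :: s, c
    s = sin(x)
    c = cos(x)
  end subroutine sin_cos
end module sincos_mod
```

Because it is `elemental`, `call sin_cos(x, s, c)` also works element-wise on conforming arrays `x`, `s` and `c`.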
@fazedo True. But isn't it also the case that the `sin` and `sincos` instructions are actually slower than a polynomial fit vectorized by hand? That is my understanding, at least in terms of clock-cycle counts.
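To illustrate the alternative raised here: a polynomial kernel in Horner form is only multiplies and adds, which compilers can vectorize over an array. This sketch uses the degree-7 Taylor polynomial on $|x| \le \pi/4$ as a stand-in for a properly fitted (minimax) polynomial and omits argument reduction entirely; the name `sin_poly` is hypothetical. Whether such a kernel beats the library `sin` depends on the target and the accuracy required.

```fortran
module poly_sin_mod
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  private
  public :: sin_poly
contains
  ! Illustrative approximation of sin(x), valid only for |x| <= pi/4
  ! (no argument reduction). Degree-7 Taylor polynomial in Horner form:
  ! x + c3*x**3 + c5*x**5 + c7*x**7. Elemental and branch-free, so it is
  ! straightforward for a compiler to vectorize over arrays.
  elemental function sin_poly(x) result(y)
    real(real32), intent(in) :: x
    real(real32) :: y, x2
    real(real32), parameter :: c3 = -1.0_real32 / 6.0_real32
    real(real32), parameter :: c5 =  1.0_real32 / 120.0_real32
    real(real32), parameter :: c7 = -1.0_real32 / 5040.0_real32
    x2 = x * x
    y = x * (1.0_real32 + x2 * (c3 + x2 * (c5 + x2 * c7)))
  end function sin_poly
end module poly_sin_mod
```

On that interval the truncation error is below about $3.2 \times 10^{-7}$ (the first omitted term, $x^9/9!$ at $x = \pi/4$), roughly single-precision level.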
I'm not aware of any compiler that bothers to sort the array entries. You can determine `minloc`, `minval`, etc. just by scanning the array. That's just $O(n)$, and you don't need to create any temporaries along the way.
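The scan described here can be written as a single loop that tracks the value and its location together. This is a sketch with a hypothetical name, `scan_min`, assuming a non-empty rank-1 array with no NaNs; it is not meant to show how any particular compiler implements the intrinsics.

```fortran
module scan_min_mod
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: scan_min
contains
  ! One O(n) pass yielding both the minimum value and its first location,
  ! with no temporary arrays. Mirrors minval/minloc on a non-empty 1-D array.
  subroutine scan_min(a, vmin, imin)
    real(real64), intent(in)  :: a(:)
    real(real64), intent(out) :: vmin
    integer,      intent(out) :: imin
    integer :: i
    imin = 1
    vmin = a(1)
    do i = 2, size(a)
      if (a(i) < vmin) then   ! strict < keeps the first occurrence, like minloc
        vmin = a(i)
        imin = i
      end if
    end do
  end subroutine scan_min
end module scan_min_mod
```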
@certik I don't know. I rely on intrinsic implementations.
> I'm not aware of any compiler that bothers to sort the array entries. You can determine `minloc`, `minval`, etc. just by scanning the array. That's just $O(n)$, and you don't need to create any temporaries along the way.
You're right, that's the case for gfortran, for example (see this godbolt link). That also illustrates my point: `minloc` and `maxloc` are two essentially identical functions that perform the same scan, one for the minimum and one for the maximum value. A combined `minmax` (or similar) intrinsic could do the same work in a single scan instead of two.
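A single-scan version of such a combined intrinsic could look like the following. `minmax_loc` and its argument order are hypothetical, not part of any standard; the sketch assumes a non-empty rank-1 array without NaNs and returns the first occurrence of each extreme, matching `minloc` and `maxloc`.

```fortran
module minmax_mod
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: minmax_loc
contains
  ! Hypothetical combined routine: minimum and maximum values and their
  ! first locations, found in a single pass over a non-empty rank-1 array.
  subroutine minmax_loc(a, vmin, imin, vmax, imax)
    real(real64), intent(in)  :: a(:)
    real(real64), intent(out) :: vmin, vmax
    integer,      intent(out) :: imin, imax
    integer :: i
    imin = 1; imax = 1
    vmin = a(1); vmax = a(1)
    do i = 2, size(a)
      if (a(i) < vmin) then
        vmin = a(i); imin = i
      else if (a(i) > vmax) then   ! a new minimum can never also be a new maximum
        vmax = a(i); imax = i
      end if
    end do
  end subroutine minmax_loc
end module minmax_mod
```

Compared with separate `minval`, `minloc`, `maxval` and `maxloc` calls, each element is read once.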
@perazz Just like with `sin` and `cos`, compilers seem to be able to identify this and "merge" the calls; I wonder if it is better left to compilers to identify `minval` and `maxval` and combine them as needed.
The `argsort` example above is slightly different; I think we need it.
I often refrain from using Fortran's array-searching and math intrinsics because they introduce redundant overhead in performance-critical applications. For example, imagine finding the locations of the minimum and maximum values in an array:
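A minimal sketch of that pattern, assuming a `real64` rank-1 array; the module and routine names are hypothetical. Each intrinsic call is a separate traversal of the array unless the compiler merges them.

```fortran
module two_pass_mod
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  private
  public :: extremes_two_pass
contains
  ! Straightforward intrinsic version: minloc and maxloc are separate
  ! calls, so (absent compiler fusion) the array is scanned twice.
  subroutine extremes_two_pass(a, imin, imax)
    real(real64), intent(in) :: a(:)
    integer, intent(out) :: imin, imax
    imin = minloc(a, dim=1)   ! first pass over a
    imax = maxloc(a, dim=1)   ! second pass over a
  end subroutine extremes_two_pass
end module two_pass_mod
```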
These are very common cases, and they incur considerable overhead compared with methods that produce both results in a single iteration. In this case, a quick sort would be enough:
I'm not sure whether the better way to go would be to standardize more flexible versions of these intrinsics or to implement them in a library; I'm eager to know what the Fortranners out there think!
Here are some functions I'd like to see:
1) Min/max bounds and locations, like
2) Combined sin/cos functions