Closed: jobh closed this issue 1 year ago
Hi @jobh .
It might come from the `else` branch here, which does not use `numpy.array`.
How many data points do you have, though? This method is O(n), so it should not be that slow (even if we can do better).
Can you provide a reproducible case?
Thanks
Sure!
$ pip install --upgrade haversine==2.5.1 && \
python -m timeit \
-s 'import numpy, haversine; a=numpy.zeros((1000,2))' \
'haversine.haversine_vector(a, a)'
[...]
Successfully installed haversine-2.5.1
10000 loops, best of 5: 26.3 usec per loop
If I change to 2.7.0, I get
[...]
Successfully installed haversine-2.7.0
500 loops, best of 5: 474 usec per loop
My typical input size is 100-1000 locations, where I want the distance between neighbours: haversine_vector(a[1:], a[:-1]).
In the code, both branches are slow since they make a function call per location. Vectorized versions might be something like the following (untested), which can be called once per array instead of once per location. Alternatively, a separate flag (e.g. check=True) that can be turned off may be an option for those who know their data is good and need speed.
def _ensure_lat_lon_vector(lats, lons):
    if numpy.any(numpy.abs(lats) > 90):
        raise ValueError("Latitude is out of range [-90, 90]")
    if numpy.any(numpy.abs(lons) > 180):
        raise ValueError("Longitude is out of range [-180, 180]")
and
def _normalize_vector(lats, lons):
    lats = (lats + 90) % 360 - 90
    idx = lats > 90
    lats[idx] = 180 - lats[idx]
    lons[idx] += 180  # is this right?
    lons = (lons + 180) % 360 - 180
    return lats, lons
...and (to explain my motivation): I do this for thousands of tracks, to filter out those with too large a distance between neighbouring locations. That half millisecond quickly adds up to several seconds.
Following commit b949e32246976f14f192c97c9bb0b82b8e1ee451, which introduced an element-wise limits check, the vectorized version is 10 times or more slower than it used to be.
There should probably be a vectorized version of the check, or a flag to indicate that the limits check is not needed.