urschrei / simplification

Very fast Python line simplification using either the RDP or Visvalingam-Whyatt algorithm implemented in Rust

simplification using a %-like value #11

Open mattijn opened 4 years ago

mattijn commented 4 years ago

Thanks for your work on these simplification algorithms!

Line simplification can be done using the Douglas-Peucker or Visvalingam-Whyatt algorithm. Both operate via an epsilon/tolerance parameter, which represents a distance (DP) or an area (VW) depending on the algorithm. The required tolerance in any given situation is therefore hard to decide, since it also depends implicitly on the projection (e.g. metres or degrees).

Mapshaper provides the option to reduce a linestring to a percentage of its points. It would be nice to have this possibility for these algorithms as well.

(You will probably also like this blog post, which incorporates your package: http://martinfleischmann.net/line-simplification-algorithms/)

mariuswallraff commented 5 months ago

I second this; for our use case of simplifying time-series data down to a given number of points, this would be an extremely useful option (either as a percentage or as an absolute number of points). Currently we call simplify_coords_vw_idx repeatedly with increasing tolerance, and either need very many steps or end up oversimplifying considerably.

I had a quick glance at the underlying code and believe it might be possible to "just" add an additional break criterion based on the length of the remaining heap, but to be fair I have no idea about Rust.

If it helps, I wouldn't mind if the implementation is approximate in either number of points or order of the removed points.
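The "break on remaining heap length" idea can be sketched in pure Python. This is an illustration of the stop criterion only, not the package's Rust implementation: a min-heap ordered by triangle area, with lazy deletion of stale entries, that stops as soon as the requested number of points remains.

```python
import heapq

def triangle_area(a, b, c):
    # Magnitude of the cross product, halved: area of triangle abc.
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def vw_keep_n(points, max_points):
    """Visvalingam-Whyatt that stops once `max_points` points remain.

    Pure-Python sketch of a point-count break criterion; the package
    keys on an area tolerance instead.
    """
    n = len(points)
    if n <= max_points:
        return list(range(n))
    prev = list(range(-1, n - 1))   # doubly linked list over surviving points
    nxt = list(range(1, n + 1))
    alive = [True] * n
    version = [0] * n               # lazy-deletion counter for stale heap entries
    heap = []
    for i in range(1, n - 1):       # endpoints are never candidates
        heapq.heappush(heap, (triangle_area(points[i - 1], points[i], points[i + 1]), i, 0))
    remaining = n
    while remaining > max_points and heap:
        area, i, ver = heapq.heappop(heap)
        if not alive[i] or ver != version[i]:
            continue                # stale entry; skip
        alive[i] = False
        remaining -= 1
        p, q = prev[i], nxt[i]
        nxt[p], prev[q] = q, p
        # The two neighbours' triangles changed; re-queue with fresh areas.
        for j in (p, q):
            if 0 < j < n - 1 and alive[j]:
                version[j] += 1
                heapq.heappush(
                    heap,
                    (triangle_area(points[prev[j]], points[j], points[nxt[j]]), j, version[j]),
                )
    return [i for i in range(n) if alive[i]]
```

A `remaining > max_points` check like the one above is exactly the kind of extra break condition that could sit alongside the tolerance check in the Rust heap loop.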

Either way, thank you for your work! :)

mayuri-kulkarni commented 4 months ago

I too have a similar requirement!

mariuswallraff commented 4 months ago

For those landing here for a similar reason, it might interest you that while this feature would still be very useful, in our use case, running simplify_coords_vw_idx consecutively with increasing tolerance (and applying the index to filter out unneeded data points before the next call) is actually faster than calling the function just once with the final tolerance.

I assume this is because finding the triangles that need to be recomputed in the stack becomes easier when the stack is smaller, but that's just a guess.

urschrei commented 4 months ago

> For those landing here for a similar reason, it might interest you that while this feature would still be very useful, in our use case, running simplify_coords_vw_idx consecutively with increasing tolerance (and applying the index to filter out unneeded data points before the next call) is actually faster than calling the function just once with the final tolerance.
>
> I assume this is because finding the triangles that need to be recomputed in the stack becomes easier when the stack is smaller, but that's just a guess.

That's very interesting. I'll have to dig into it in geo and see whether we can verify that guess. Do you have some example parameters?

mariuswallraff commented 4 months ago

Sure, here's a small example, although it's not as good as with our real-life data; below I attached a slide illustrating my previous results.

from timeit import Timer

import numpy as np
from simplification.cutil import simplify_coords_vw_idx

np.random.seed(1337)

a = np.ascontiguousarray(
    np.vstack(
        [
            np.arange(10000),
            np.concatenate([np.random.normal(0.0, 0.1, 5000), np.random.normal(1.0, 0.1, 5000)]),
        ]
    ).T
)

# target: less than or equal 100 points
max_points = 100

epsilon = 20.48
out = simplify_coords_vw_idx(a, epsilon)  # => 91 points

print(Timer("simplify_coords_vw_idx(a, epsilon)", globals=locals()).repeat(3, number=1000))
# [4.4360130999994, 4.323799300007522, 4.425885300006485]

def iteratively(input_array, epsilon, max_points):
    # Track indices into the original array across passes.
    selected_idx = np.arange(0, len(input_array))
    while True:
        current_idx = simplify_coords_vw_idx(input_array, epsilon)
        selected_idx = selected_idx[current_idx]

        if not max_points or len(selected_idx) <= max_points:
            break

        # Drop already-discarded points, then retry with a larger tolerance.
        input_array = input_array[current_idx]
        epsilon *= 2.0
    return selected_idx

print(Timer("iteratively(a, 20.48, max_points)", globals=locals()).repeat(3, number=1000))
# [4.509853900002781, 4.511391199979698, 4.359233400027733]
print(Timer("iteratively(a, 10.24, max_points)", globals=locals()).repeat(3, number=1000))
# [4.469542200007709, 4.420092500018654, 4.369932599976892]
print(Timer("iteratively(a, 5.12, max_points)", globals=locals()).repeat(3, number=1000))
# [4.414737000013702, 4.3983672999893315, 4.3968801999872085]
print(Timer("iteratively(a, 2.56, max_points)", globals=locals()).repeat(3, number=1000))
# [4.21051350000198, 4.002777400019113, 3.9561154000111856]
print(Timer("iteratively(a, 1.28, max_points)", globals=locals()).repeat(3, number=1000))
# [3.8792780000076164, 3.972585600015009, 4.081758899992565]
print(Timer("iteratively(a, 0.64, max_points)", globals=locals()).repeat(3, number=1000))
# [3.955509699997492, 3.804218100005528, 3.783603399991989]
print(Timer("iteratively(a, 0.32, max_points)", globals=locals()).repeat(3, number=1000))
# [3.7186095000070054, 4.0281457999954, 3.7554183999891393]
print(Timer("iteratively(a, 0.16, max_points)", globals=locals()).repeat(3, number=1000))
# [3.755888199986657, 3.7749703000008594, 3.7538082000100985]
print(Timer("iteratively(a, 0.08, max_points)", globals=locals()).repeat(3, number=1000))
# [3.7066072000015993, 3.6726426999957766, 3.708958800008986]
print(Timer("iteratively(a, 0.04, max_points)", globals=locals()).repeat(3, number=1000))
# [3.9454101999872364, 3.809387200017227, 3.8627665000094566]

print(len(out))
print(len(iteratively(a, 0.04, max_points)))
print()

(attached image: slide illustrating the benchmark results mentioned above)