noinia / hgeometry

HGeometry is a library for computing with geometric objects in Haskell. It defines basic geometric types and primitives, and it implements some geometric data structures and algorithms.
121 stars 40 forks source link

Unboxed polygons. #88

Open lemmih opened 3 years ago

lemmih commented 3 years ago

The current representation of polygons as boxed vectors of points with extra data is very inefficient. A SimplePolygon () (Point 2 Double) requires 11 words of memory per element. That's 11*8=88 bytes per vertex. For such a polygon, storing the vertices as two unboxed arrays would reduce the memory requirements to 2 words per vertex. That's an improvement of 5.5x.

Matters get worse when Rational is used instead of Double. Looking at SimplePolygon () (Point 2 Rational), it requires 61 words per vertex. Assuming no sharing, a polygon with 1000 vertices would require ~0.5MiB of memory. If we stored Rationals as vectors of numerators and denominators, the memory overhead is reduced to 8 words per vertex; an improvement of ~7.5x.

Ok, so boxing is definitely a problem but what can we do about it? One option is to use unboxed arrays and require that all points and associated data must be an instance of Data.Vector.Unboxed.Unbox. Even if we wrote an instance for Unbox Rational, this would be painful. It would put a large burden on the user to only use unboxable numbers and associated data.

There's a second option that offers a middle ground. It allows us to mix unboxed and boxed types, picking optimized representations when possible and defaulting to boxed vectors when there are no better options. The idea is very similar to how VectorFamilyF picks an optimal representation for d < 5 and then defaults to vector. Going down this path would add a constraint (eg Unpack r =>) to fromPoints and to the PointFunctor class. We would also have to delete the Functor, Bifunctor, Bifoldable, and Bitraversable instances. We'll still export the functions used by those instances but they'll have the constraint Unpack r.

You might ask, how is the Unpack r constraint any better than the Unbox r? Well, Unpack r is defined for all types. It'll use an unboxed representation for types like Double, Rational, (a,b), Point 2 r, etc. And it'll use a boxed representation for anything else like a -> b, [a], etc. The user will never have to write an instance of Unpack. In fact, it's not possible to add an instance to the type-class.

So, to recap: Boxed polygons use between 5 to 7 times more memory than unboxed polygons. We can force the use of unboxed types via Data.Vector.Unboxed. This prevents users from using boxed types in polygons. My proposed solution: Use a closed-world type family to select unboxed vectors for primitive types and gracefully default to boxed vectors for all other types.

Benefits:

Drawbacks:

What are your thoughts, @noinia? I'll write up a draft PR so we'll have something concrete to look at.

lemmih commented 3 years ago

My measurement for (Point 2 Rational :+ ()) was off. It uses 21 words. The optimal representation uses 12 words.

lemmih commented 3 years ago

Will need benchmarks to see if it is worth doing or not.

noinia commented 3 years ago

Hmm, interesting idea. Reducing space usage, and therefore presumably also computation time, is a worthwhile goal. Some concerns/remarks though: