Unboxed polygons. - Githubissues

lemmih commented 3 years ago

The current representation of polygons as boxed vectors of points with extra data is very inefficient. A SimplePolygon () (Point 2 Double) requires 11 words of memory per element. That's 11*8=88 bytes per vertex. For such a polygon, storing the vertices as two unboxed arrays would reduce the memory requirements to 2 words per vertex. That's an improvement of 5.5x.

Matters get worse when Rational is used instead of Double. Looking at SimplePolygon () (Point 2 Rational), it requires 61 words per vertex. Assuming no sharing, a polygon with 1000 vertices would require ~0.5MiB of memory. If we stored Rationals as vectors of numerators and denominators, the memory overhead is reduced to 8 words per vertex; an improvement of ~7.5x.

Ok, so boxing is definitely a problem but what can we do about it? One option is to use unboxed arrays and require that all points and associated data must be an instance of Data.Vector.Unboxed.Unbox. Even if we wrote an instance for Unbox Rational, this would be painful. It would put a large burden on the user to only use unboxable numbers and associated data.

There's a second option that offers a middle ground. It allows us to mix unboxed and boxed types, picking optimized representations when possible and defaulting to boxed vectors when there are no better options. The idea is very similar to how VectorFamilyF picks an optimal representation for d < 5 and then defaults to vector. Going down this path would add a constraint (eg Unpack r =>) to fromPoints and to the PointFunctor class. We would also have to delete the Functor, Bifunctor, Bifoldable, and Bitraversable instances. We'll still export the functions used by those instances but they'll have the constraint Unpack r.

You might ask, how is the Unpack r constraint any better than the Unbox r? Well, Unpack r is defined for all types. It'll use an unboxed representation for types like Double, Rational, (a,b), Point 2 r, etc. And it'll use a boxed representation for anything else like a -> b, [a], etc. The user will never have to write an instance of Unpack. In fact, it's not possible to add an instance to the type-class.

So, to recap: Boxed polygons use between 5 to 7 times more memory than unboxed polygons. We can force the use of unboxed types via Data.Vector.Unboxed. This prevents users from using boxed types in polygons. My proposed solution: Use a closed-world type family to select unboxed vectors for primitive types and gracefully default to boxed vectors for all other types.

Benefits:

Dramatically lower memory overhead.
Runtimes of all algorithms should improve by a constant factor.
Data.Ext adds zero memory overhead.
Since points and meta data are stored in separate vectors, they can be modified separately. For example, dropMetadata :: Polygon t p r -> Polygon t () r will become O(1). Similarly, numberVertices won't have to read&write all the vertices, it just adds a new vector with vertex numbers.

Drawbacks:

Adds Unpack constraints to a lot of functions (but not all functions).
Functor, Bifunctor, Bifoldable, and Bitraversable will no longer be available.
Functions like convexPolygon will no longer benefit from sharing points.

What are your thoughts, @noinia? I'll write up a draft PR so we'll have something concrete to look at.

lemmih commented 3 years ago

My measurement for (Point 2 Rational :+ ()) was off. It uses 21 words. The optimal representation uses 12 words.

lemmih commented 3 years ago

Will need benchmarks to see if it is worth doing or not.

noinia commented 3 years ago

Hmm, interesting idea. Reducing space usage, and therefore presumably also computation time, is a worthwhile goal. Some concerns/remarks though:

Isn't it impossible to write an Unbox instance for Rational? I.e. rational uses 'Integer' to represent its coordinates. Since Integers may have arbitrary size they cannot be unboxed either.
while our current preferred type for most things is Rational, I think we will want/(need) more advanced numeric types not too far in the future, so we shouldn't focus too much on Rational. (In the SoS stuff I already use some "weird" symbolic data type that (also) stores sets of integers, and if we want something like the shortest path in a polygon with holes algorithm we will need a type that supports square roots exactly).
I experimented a bit with different a different implementation of Ext that would treat 'c :+ ()' as a newtype around c. I haven't found a way yet to do that transparently though :(. Benchmarks do show that that would give us a perf. gain.
I wonder if we can use backpack to avoid the 'Unbox/Unpack' constraint.

noinia / hgeometry

Unboxed polygons. #88