version bump + offset capabilities

vistuleB commented 6 years ago

Problems

This pull request contains a number of breaking changes to the svgpathtools module that would constitute a "major version bump". (I guess.) In order to argue why these changes are good I will start by outlining some limitations in the current structure of svgpathtools.

(By the way I'm sorry if the following seems somewhat laconic but I lost my first finished draft describing the pull request, and right now I am just tired and fed up.)

1. mishandling of paths containing more than one 'Z'

Consider the following path:

M 0,0 0,1 1,1 1,0 Z M 2,0 2,1 3,1 3,0 Z

This path consists of two closed subpaths, each subpath being a unit square. Currently, svgpathtools parses the first 'Z' by adding a segment to geometrically close that square, then forgets the middle 'Z' ever existed; after parsing, svgpathtools will reprint the same path like so (maybe without the final 'Z', depending on options, but in any case without the middle Z):

M 0,0 0,1 1,1 1,0 0,0 M 2,0 2,1 3,1 3,0 2,0 Z

The second path may seem equivalent to the first, but it is not. The first square in the second path is a "topologically open square": its first and final segments do not meet with a line join; an SVG editor will show two separate path nodes to edit at the point (0,0), but only node path node at (0,0) for the first path.

This generalizes to all paths containing middle Z's: svgpathtools forgets about these Z's, losing important semantic information about the path that even results in incorrect stroke rendering.

2. mishandling of paths containing more than one 'M'

In this case, consider the following path:

M 0,0 1,0 M 1,0 1,1

This path consists of two distinct subpaths (more on this in a bit), but because of svgpathtools internal representation of paths, it forgets about the middle 'M' and remembers/renders the path as

M 0,0 1,0 1,1

which is not the same: the first path has no line join at 1,0 whereas the second path does; an editor will provide two different control nodes for the first path at 1,0 but only one for the second path; and issuing a "break apart" command on the first path in an editor results in the creation of two separate paths (one for each segment), whereas issuing a "break apart" command on the second path has no effect.

So here too important semantic information is lost that results in incorrect stroke rendering.

A note on paths with multiple subpaths

Both of the above issues concern paths that have multiple subpaths, also known as compound paths. (As per the spec.) (A new SVG "subpath" is defined as commencing after any 'Z' command or right before any 'M' command.)

One might think that paths containing multiple subpaths are of somewhat esoteric interest, but in fact they are the main mechanism for producing doughnut-hole effects in shapes (e.g., the letter "o") via even-odd filling, among others. Hence, I would argue compound paths are really a mainstream device in SVG.

Aborted Fixes

When I first realized these problems I thought of backwards-compatible fixes along the lines of either:

1) adding '.z' and '.m' boolean fields to the individual segments, so that parsed Z's and M's can be reproduced at print time

2) adding 'Z' and 'M' tokens interleaved among the segments of a Path (i.e., have the Path class maintain an ordered list that contains not just segments, but additional "token types" of type 'Z' and 'M')

A little thought reveals that both these solutions have serious (maybe insurmountable) drawbacks.

In 1), the segment "primitive" is made to carry information that is outside its domain of concern. This naturally leads to many unpleasant issues, such as: Should the containing Path maintain final "closing" line segments of subpaths? If so, does the final closing segment carry the 'Z' field, or does the next-to-last segment that originally "issued" the Z command carry the "Z" field? If the user untoggles a 'Z' field on a segment, is the containing Path notified? If not, should the Path re-check this information every time at runtime for relevant computations such as length, intersect, etc? And if it is notified, what is the mechanism for knowing the various Path owners of a given segment? And if the interface for toggling a 'Z' property lies with the Path and not with the segment, how to prevent a segment from being used in several different paths? Should this really be forbidden? Etc.

Clearly, segments should not have to worry about 'Z' and 'M' properties---these are "inter-segment" properties that the container, not the primitive, should be coordinating, lest issues such as the above crop up.

For 2), one is faced with minor uglinesses, such as maintaining two different iterators for the Path object (one for the segments only, one for the segments and the Z/M tokens), along potentially with two different types of indices for pointing into these lists. Insertions by index made into the list of segments become potentially ambiguous (how does the inserted segment fit into the "all tokens" list?), etc.

A more systemic problem with 2), however---and which is also a problem with 1), in fact---is that this solution doesn't offer good subpath-level support. For example, say we want the end and start of the subpath containing such-and-such a segment; are we forced to use methods such as Path.end_of_subpath_containing(segment)? Or, how do we iterate through subpaths in a Path? Via a method such as Path.subpath_iterator()? What does such a method even return, the first segment of the subpath? The first and last segments? Or rather their indices? What if we want to compute the length of this subpath? Do we need to create a fresh Path object containing only this subpath, and then compute its length? Or should we be nice and install a method such as Path.length_of_subpath_containing(segment) into Path? Similarly, what do we do if we want to compute the intersections between two subpaths in a path? Do we need to first cast them to Paths(), or should we be nice and provide a custom method?

The point is: The more "nice" we are to the user by providing subpath-specific methods inside of Path, the more Path will become bloated by virtue of doing double duty as a segment container and as a subpath container. I.e., we seemingly have to choose between having an overstuffed Path object on the one hand, and, on the other hand, making the user cast every subpath to a Path object anytime he/she wants to do anything moderately interesting with a subpath.

Actual Fix

Of course there is a trivial "correct" fix to these design issues, which is to create a new Subpath object, and to have Path be a container of Subpaths, while a Subpath is an ordered list of contiguous segments. This is the approach taken in this pull request, but it entails breaking/rewriting many aspects of the Path API, as I will proceed to explain.

Actually, when I started out refactoring along these lines I had some hopes of keeping the Path object backwards compatible by still having for x in path be an iterator over the segments in path, by still offering a constructor that would take a simple list of segments (as opposed to list of subpaths), etc.

The problem with this approach is (a) it's pretty ugly, (b) Path is once again doing double duty as both a Segment and Subpath container, creating confusion and wasting code---we have gone to the trouble of achieving separation of concerns (subpaths worry about segments, paths worry about subpaths) but we are not reaping the benefits of this separation. So I ultimately opted for a "clean beautiful refactor + scorched earth" approach, whereby Path purely behaves (in terms of constructors and iterators, e.g.) as a Subpath container.

This new class structure addresses all the above problems and makes everything clean and easy to grok. Some new potential concerns arise though (besides the broken API), addressed next.

Potential concerns

1. Keeping subpaths continuous

One concern is: how to make sure subpaths really consist of contiguous segments? At construction / segment insertion time, a Subpath can check the continuity requirements, but what if the user edits a segment later?

The answer is simply to make segments immutable, which is harmless. (Up to the fact that it might break existing code, haha.) Specifically, the .start, .end, .control1, .radius, etc, fields of a segment are now internally implemented as ._start, ._end, ._control1, ._radius, etc, which are made read-only via property getters. A new method called tweaked has been added to all segments (sample usage: segment.tweaked(start=some_new_start_value, control1=some_new_control1_value)) to allow for easy modification-by-cloning of segments. Moreover, power users can still go ahead and directly edit the ._ fields if they want, following python's "we're all grown ups" philosophy.

2. Too many parameters

Note that we now have three basic levels of objects: Segments (the newly minted parent class of Line, QuadraticBezier, CubicBezier and Arc, more on this below), Subpaths and Paths. Each is a container of the previous. And as the containee is always unaware of the container (which could be zero or many, e.g., a subpath could appear in zero, one or more than one path), each type of object comes with its own parameterization.

In the new source code, the time parameter of a segment is denoted t, the time parameter of a subpath is denoted T, and finally the time parameter of a path is denoted W, where each variable takes values between 0 and 1.

In order to ease the burden of handling all these different parameters and converting between them, a new Address class has been created. An address holds five fields (some of which may be None, depending on the circumstances): .t, .T, .W, .subpath_index and .segment_index.

User-facing methods (such as .intersect) that used to return complicated sets of indices to encode positions on segments/subpaths/paths now simply return an address (or a pair of addresses, in the case of intersect). Moreover, any place a local parameter is expected (such as t for a segment, T for a subpath, etc), an address that contains a non-None value of the parameter will be silently accepted in place of the original parameter. I have found that with these features in place, one hardly ever needs to worry about the presence of different parameters, even when coding new internal features.

(Basically: everything works great and flawlessly/brainlessly with addresses. The code is even easier to work with than before, since we don't have to worry about which element of a returned tuple denotes T and which denotes t, etc.)

More details on addresses can be found in the 'Main classes' README of path.py, starting on line ~817.

Other New Features

I originally set out to add "native" support for path offsetting and strokes in svgpathtools. This has been achieved. Both Path and Subpath now offer simple to use .offset and .stroke methods. These support all the standard SVG line joins ('miter', 'round' and 'bevel') and, in the case of strokes, all the standard SVG line caps (i.e. 'square', 'butt' and 'round').

(Note that it is particularly important, when stroking a subpath, to know if it is topologically closed or just "geometrically closed". Whence my eventual refactoring of the whole library, haha.)

These methods are robust to cusps, large offsets, etc. Accuracy is (optionally) controlled by the user. (Note that the offset of a Bezier segment is not in general a Bezier segment, so any offset of any nonlinear curve is always an approximation.) Offsets are pieced together from the "theoretically correct" Bezier curves (same endpoints as the true offset curve, same endpoint derivatives as the true offset curves) and hence take up much less space than, e.g., a piecewise linear approach. The subdivision depth of the curves is adaptively controlled, i.e., the density of subdivisions varies as needed along a path / subpath / segment, which also saves space. Some basic demos can be found in jpsteinb_offset_demo.svg, including the original offset paths from the current svgpathtools README.

In so doing, my method of computing offsets required converting elliptical arcs to cubic bezier paths, so support for such conversions has also been implemented. In fact, two different arc-to-bezier conversions have been implemented, one following Maisonobe's curvature-based approach (derivative and curvature respected at the ends of the curve bezier curve), one following the "midpoint rule" that seems to be more popuplar online (derivatives respected at the endpoints of the curve, and middle point of the bezier curve fixed to the middle point of the elliptical arc). One can call Path.converted_to_bezier() or Subpath.converted_to_bezier() or Arc.converted_to_bezier_subpath() to convert a path/subpath/arc using the standard midpoint rule, or use the same methods with Maisonobe=True to use Maisonobe's rule instead. (All such methods also accept quality and safety parameters, that I shall not describe here, but they are described in the doc strings.)

The area of a closed curve containing Arc segments can now be computed (and is now automatically computed) by converting the Arc segments to bezier segments. In this case, the default conversion used is Maisonobe's, because it seems to give quicker convergence to the area. As above, the user can control the accuracy of the conversion with optional safety and quality parameters.

Other Other New Features

Another major change (at least, major cosmetic change) is the insertion of some new intermediate classes to help clean up code that was repeated in many places. The new classes are BezierSegment, Segment, ContinuousCurve and Curve. I think these names may be self-explanatory, but if in doubt, they are documented in the README that starts at line 817 of path.py.

In particular, instead of calling the clunky

is_path_segment(thing) is_bezier_segment(thing)

to test the type of thing, the user can now use

isinstance(thing, Segment) isinstance(thing, BezierSegment)

for the same purpose, which seems more elegant and idiomatic.

Misc:

A bug relating to double-counting of intersections in path_contains_pt has been removed.
An unnecessary call to radialrange was removed from crop_bezier (in fact, t1_adj can simply be computed as (t1 - t0)/(1 - t0) (per theory and experimental check both)
There is now a multisplit method for segments/subpaths/paths on top of the old split method
Paths, Subpaths and Segments will "pretty print" for better readability
Another bug was removed from bezier_unit_tangent, that was using csqrt without regards to figuring out the correct branch of the root; as finding the correct branch didn't seem obvious, I replaced the limit-based approach by a more direct control-point-based approach, that should theoretically be at least as complete (and certainly faster)

More generally, as I was in the process of breaking everybody's code anyway, I tried to make the best of it and to give the library a clean start where I could. For a small example, the standalone path_contains_pt has been moved to Subpath and Path methods named even_odd_encloses(), which seemed more elegant to me. (When in doubt I left things alone, though.)

I also tried to keep things generally modular and unopionated. For a small example of this, paths and subpaths may be empty, and a path will not automatically throw away empty subpaths in its list.

Possible further steps

I did some cursory testing but I did not look at and/or rewrite any of the existing tests for the new API. (See test_offset.py for my own meagre set of "tests".) If this pull request is accepted, I think that one of the first (and most productive things) one could do is to rewrite the existing tests for the new API, while at the same time debugging the new material. (I have basically only done 7 days of on and off debugging.)

Thanks for reading. I hope people will enjoy using my modifications.

mathandy commented 6 years ago

Hey @jpsteinb, wow that's a lot to read. Looking through the diffs, it looks like

There are a lot of stylistic/whitespace changes, some of which bring PEP 8 compliant code into non-compliance. It's no big deal if new code isn't PEP 8 compliant, but why change my old code to fit your style?
Looks like you didn't pull recent changes before submitting your pull-request. Accepting this would bring back old bugs -- maybe this one's my fault for not pointing out this could happen in the instructions I gave you. Currently, this pull-request is hard to even look at as half the changes are related to 1 or 2. Also, please keep in mind, I'm in grad school currently and can't devote much time to svgpathtools these days -- you'll have to be patient if you want to work with me.

On a totally unrelated note, I looked you up -- we know some of the same people, e.g. Greg Kuperberg.

vistuleB commented 5 years ago

Hi Andy,

Yes I guess I didn't update the repo before making all those changes.

Do you have some instructions for how I (semi-) efficiently do that, at this point? You seem to be able to look at some kind of diff thing... can you teach me? (It will be easier for me to distinguish diffs that are related to my new code, and those that are not, than for you.)

As for the PEP thing, yes, I've modified my own no-longer-PEP8 linter to allow such syntax as

ABSALOM   = 32
GOLGOMETH = 64
GOMORRAH  = 128

without complaints. Sorry about that, the places where I changed things should be quick to change back, though, relative to all other stuff.

Thanks!

mathandy commented 5 years ago

Sure thing -- at the top of this page, right below "jpsteinb wants to merge 13 commits into mathandy:master from jpsteinb:master" you should see a button/tab labeled "Files changed".
That's what I was looking at when I said "... looking through the diffs ...".

mathandy / svgpathtools