Closed: abey79 closed this 3 years ago
Some quick and dirty benchmarks:
from svgelements import *
import numpy as np
t10 = np.linspace(0, 1, 10)
t100 = np.linspace(0, 1, 100)
t1000 = np.linspace(0, 1, 1000)
arc = Arc(Point(13.152548373912, 38.873772319489), Point(14.324014604836, 24.436855715076), Point(-14.750000067599, 25.169681093411), Point(-43.558410063178, 28.706909065029), Point(-19.42967575562, -12.943218880396), 5.89788464227)
%timeit for t in t10: arc.point(t)
180 µs ± 4.79 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit for t in t100: arc.point(t)
2.1 ms ± 47.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit for t in t1000: arc.point(t)
21.3 ms ± 522 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit arc._point_numpy(t10)
38.7 µs ± 857 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit arc._point_numpy(t100)
40.2 µs ± 589 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit arc._point_numpy(t1000)
66.1 µs ± 295 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
q = QuadraticBezier(Point(123, 234), Point(923, 123), Point(543, 123))
%timeit for t in t10: q.point(t)
69.6 µs ± 1.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit for t in t100: q.point(t)
705 µs ± 12.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit for t in t1000: q.point(t)
7.02 ms ± 97.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit q._point_numpy(t10)
20.2 µs ± 459 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit q._point_numpy(t100)
21.4 µs ± 914 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit q._point_numpy(t1000)
32.9 µs ± 492 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
c = CubicBezier(Point(234, 755), Point(123, 234), Point(923, 123), Point(543, 123))
%timeit for t in t10: c.point(t)
112 µs ± 958 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit for t in t100: c.point(t)
1.13 ms ± 17.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit for t in t1000: c.point(t)
11.5 ms ± 80.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit c._point_numpy(t10)
34.8 µs ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit c._point_numpy(t100)
35.5 µs ± 347 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit c._point_numpy(t1000)
57.5 µs ± 754 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
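For context on what these _point_numpy benchmarks measure, a vectorised cubic Bézier evaluation can be sketched as follows (a standalone illustration using numpy broadcasting, not the library's actual implementation):

```python
import numpy as np

def cubic_bezier_points(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier at every t in one shot.

    p0..p3 are (x, y) control points; t is a 1-D array of parameters.
    Returns an (N, 2) array of points.
    """
    t = np.asarray(t)[:, np.newaxis]                # shape (N, 1) for broadcasting
    u = 1.0 - t
    ctrl = np.array([p0, p1, p2, p3], dtype=float)  # shape (4, 2)
    # Bernstein basis: B(t) = u^3 P0 + 3 u^2 t P1 + 3 u t^2 P2 + t^3 P3
    return (u**3 * ctrl[0] + 3 * u**2 * t * ctrl[1]
            + 3 * u * t**2 * ctrl[2] + t**3 * ctrl[3])

pts = cubic_bezier_points((234, 755), (123, 234), (923, 123), (543, 123),
                          np.linspace(0, 1, 1000))
```

A single broadcasted expression like this is why the numpy timings above barely grow with the number of samples.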
I would rename LinearPathSegment to Linear.
While this doesn't seem to be a very useful change, it makes this base class for Line and Close parallel the modern version in the regebro/svg.path project. I have several times suggested changes to his project and made changes to mine to try to keep some interoperability. I didn't think it was worth de-duplicating, but if you feel that it is, paralleling the class name might be helpful going forward.
as_complex() should obviously be a __complex__ dunder method. I added it already in 1.2.9's current master branch, since it was clearly an oversight on my part.
The tests already tended to need scipy to pass, since I test the hypergeometric function for path length etc. You'd edit the .travis.yml to make sure it could pass with test code exercising the soft requirements.
https://github.com/meerk40t/svgelements/blob/master/.travis.yml
Noted. Tomorrow I will change the class name and rebase to use complex dunder instead of as_complex().
Shapes don't strictly have a decomposition; they have one only because SVG 2.0 defined it. And I use those definitions pretty exclusively, but saying something is 0.4 of the way through a rectangle is a bit of a weird thing. Still, it would make sense to do this.
I would, however, not suggest using the code you have there, except maybe as break-glass fallback code, as those are all fairly giant operations:
def point(self, pos):
    p = Path(self)
    p.reify()
    return p.point(pos)
You create a Path out of the decomposed elements, reify that Path, get the point, and then throw everything away. For fallback code this obviously works. But it's going to be better to actually do the same sort of calculations done on the Path() segments on the Shape segments here, caching the same values there, then find the non-reified Point and multiply just that point by the .transform.
Though I see why in your use case this would be quite nice: you could just apply a one-size-fits-all .point() to anything that is of a Shape type, including Path.
Really some of the Path code here might be better off moved into the Shape code. But, clearly you're right and all of these should do their own .point() functions for any shape.
This is likely past the intent of the present PR, but one way to go about it would be to split each Path/Shape class, which models the content of an SVG in all its diversity, from a geometrical representation class (ArcGeom, PolyLineGeom, etc.) which implements the geometry calculation and transformation. Each Path/Shape class would own a geometrical representation, to which it would delegate all geometrical computation. This way, Circle, Ellipse, and Arc would all use an ArcGeom representation.
This could also be a nice step towards a fuller Numpy optimisation as we could have two versions for the *Geom classes.
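A minimal sketch of that delegation idea (the ArcGeom name comes from the suggestion above; the interface and the [0, 1] full-circle parametrisation are assumptions for illustration):

```python
import math

class ArcGeom:
    """Geometry-only representation of a full circular arc."""
    def __init__(self, cx, cy, r):
        self.cx, self.cy, self.r = cx, cy, r

    def point(self, t):
        # t in [0, 1] sweeps the full circle once
        a = 2 * math.pi * t
        return (self.cx + self.r * math.cos(a),
                self.cy + self.r * math.sin(a))

class Circle:
    """SVG-facing shape that delegates all geometry to its *Geom object."""
    def __init__(self, cx, cy, r):
        self._geom = ArcGeom(cx, cy, r)

    def point(self, t):
        return self._geom.point(t)
```

An Ellipse or Arc class would construct a differently parametrised ArcGeom, and a numpy-backed ArcGeom variant could be swapped in without touching the SVG-facing classes.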
The real fundamental difference between Arc and Circle is that ultimately Arcs are PathSegments and Circles are Shapes. These are defined in two radically different sections of the SVG spec. The even worse offenders here are Line and SimpleLine. Literally they are both lines; they define a line. Technically they should both be called Line, but I changed the stupid Shape line to be called SimpleLine. Also, regebro/svg.path#64 has a kind of elegant idea with the PathSegment of basing the linear segments on Linear and the curve segments on Curve, which has a kind of nice parallelism to it. It might also give some opportunities to make solid fallback code depending on the type of segment.
Okay, I've done some review and some of these functions aren't really needed. The issue with point is that it suffers a type error when trying to print the point. Since the data type is unknown by Length.str(), the code goes through a couple of tests and then tries to cast it as a float, but it can't do that, so it crashes. The rest of the code works for this.
s = '%.12f' % (s)
Gets replaced with:
try:
    s = '%.12f' % (s)
except TypeError:
    return str(s)
When that happens we get Line/Close, CubicBezier, and QuadraticBezier all working out of the gate, since Point gets returned but the .x and .y are arrays. It does its job correctly and is basically just a class that stores a pair of values. Arcs move differently depending on both variables: unlike the other segments, you can't solve them independently in each dimension. This would absolutely require np.cos() and np.sin(); I can't see a really reasonable way around this.
Now, I'm fine with this np-compatible code for the segments. Arc.point() would need a soft requirement: specifically, if it suffers a type error at position=0, we'd try sending it to _point_numpy(), and likely move _point_at_t_numpy() inline with that bit of code.
I'm not seeing anything here for Path.point(), this is certainly the end goal right?
Now, when these functions return Point classes with arrays for .x and .y, these can be run through transform_point without issue.
>>> p = np.linspace(0,1,10)
>>> c = CubicBezier((0,0), (10,10), (20,-10), (30,0))
>>> q = c.point(p)
>>> q
Point([ 0. 3.33333333 6.66666667 10. 13.33333333 16.66666667
20. 23.33333333 26.66666667 30. ],[ 0. 2.30452675 2.88065844 2.22222222 0.82304527 -0.82304527
-2.22222222 -2.88065844 -2.30452675 0. ])
>>> Matrix("scale(5)").transform_point(q)
Point([ 0. 16.66666667 33.33333333 50. 66.66666667
83.33333333 100. 116.66666667 133.33333333 150. ],[ 0. 11.52263374 14.40329218 11.11111111 4.11522634
-4.11522634 -11.11111111 -14.40329218 -11.52263374 0. ])
The np code to do .point() on any shape would be to call .segments(transform=False), apply the _calc_lengths() code or some equivalent (since this would be part of Shape), then do the np code to find the indexes fitting the criteria of being within those length parameters, scaling each up to [0, 1] for that segment and replacing each chunk with the given points.
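That length-bucketing step could be sketched with np.searchsorted (a hypothetical standalone helper; the real version would operate on the segment objects and their cached lengths):

```python
import numpy as np

def positions_to_segments(seg_lengths, positions):
    """Map global positions in [0, 1] to (segment index, local t).

    seg_lengths: lengths of each segment in path order.
    positions: 1-D array of positions along the whole path.
    """
    seg_lengths = np.asarray(seg_lengths, dtype=float)
    cum = np.cumsum(seg_lengths)
    s = np.asarray(positions) * cum[-1]          # arc-length position
    idx = np.searchsorted(cum, s, side="right")  # which segment each s falls in
    idx = np.clip(idx, 0, len(seg_lengths) - 1)  # position == 1.0 -> last segment
    starts = cum - seg_lengths                   # start offset of each segment
    local_t = (s - starts[idx]) / seg_lengths[idx]
    return idx, local_t

idx, t = positions_to_segments([2.0, 1.0, 1.0], np.linspace(0, 1, 5))
```

Each segment's vectorised point code would then be called once with the local_t values selected by its index, filling the matching slots of the output.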
So it looks like the correct path forward here is:
1) Correct Length.str() so it stops objecting to Point() with .x and .y arrays.
2) Add a soft requirement at Arc.point() to defer to the _point_numpy() code on error.
3) Write .point() for all Shape classes and defer to ._point_numpy() code on objection.
4) Shape._point_numpy() should utilize .segments(), then loop through all the segment lengths and fill in the export values in the ranges that match each segment's range.
There's no need for both Linear and LinearPathSegment since the code is the same, nor is there a need for the Curve class that would complement it. This could be added anyway because svg.path has such a thing, but it's not required for this task. Arc._point_at_t_numpy should probably just be in-lined with the Arc._point_numpy() code. The reification failed to clear ._length and .lengths. Also, I thought you could get away with not reifying the shapes for a bit, but really the shapes only work reified: you could have a square that is scaled non-uniformly, and that would make evenly spaced points not evenly spaced if you applied the matrix afterwards, and the .point() code does not respect the transformation.
This should mean importing the Arc._point_numpy() code, and I assume you either have or need numpy-accelerated Shape.point() code. If not, I'll get to it tomorrow.
Most of the trying to use complex numbers isn't really needed. A 2xN array is fine for this stuff to move them around.
A quick couple of comments:
I'm not seeing anything here for Path.point(), this is certainly the end goal right?
Yeah, I hadn't done that yet in order to discuss the best way to do it, but yes, the soft requirement should be added to .point().
Edit: I initially misinterpreted your comment. So yeah, I guess Path.point() is still on the TODO list, although I'm unsure how to deal with embedded Move segments (i.e. compound paths).
Most of the trying to use complex numbers isn't really needed. A 2xN array is fine for this stuff to move them around.
That's true. As a matter of fact, if the memory order is right, you can get an Nx1 complex view of an Nx2 float array without incurring any copy. I've got a function for the opposite direction in my project:
def as_vector(a: np.ndarray):
    """Return a view of a complex line array that behaves as an Nx2 real array"""
    return a.view(dtype=float).reshape(len(a), 2)
The real advantage of complex is that it often simplifies the syntax because you don't have to worry about NP's broadcasting rules, etc.
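For completeness, the opposite, zero-copy direction mentioned above could look like this (as_complex_view is a hypothetical name; it assumes a C-contiguous Nx2 float64 array):

```python
import numpy as np

def as_complex_view(a):
    """View a C-contiguous Nx2 float64 array as an N-element complex array.

    No data is copied: x becomes the real part, y the imaginary part,
    and writes through the view modify the original array.
    """
    return a.view(dtype=complex).reshape(len(a))

pts = np.array([[0.0, 1.0], [2.0, 3.0]])
c = as_complex_view(pts)   # complex view sharing memory with pts
c += 1j                    # shifts every y by 1 in pts as well
```

Because the view shares memory, translations and rotations written in complex arithmetic show up directly in the Nx2 real array.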
I'll merge and adjust the PR to use Point with NP arrays!
I did the following in 2f3724d2b9db918eba082ed694aabd1e13fd6771:
- Modified Point.__init__() to accept numpy arrays of complex or Nx2 numpy arrays of float. Both result in two Nx1 numpy arrays of float in .x and .y.
- Modified Point.__eq__() to account for the possibility of numpy .x/.y. I've removed most of the boilerplate code, delegating back to Point.__init__(). I'm hoping I haven't broken anything; a good review is certainly warranted.
- Removed Point.as_complex().
- Renamed LinearPathSegment to _Linear, to reflect the fact that there is little benefit to exposing this class.
- Modified point() to return a vectorised Point() for vector inputs. In the process I did some manual common sub-expression reduction that should have some impact on (large) vector input with minimal readability impact.
- Inlined Arc._point_at_t_numpy().
- Kept the ._implementation_match tests, although their value decreased somewhat.

Here is the remaining TODO list for this PR (feedback obviously welcome!):
- Path.point()
- a Shape .point() function as well

I added a pull request on your pull request; this should add a Shape.point() and have the soft reference to np when the position fails. The code there should work, but might be inefficient. Seems like there should have been a way to get the index values, but I just rechecked the boolean.
There is some benefit to exposing Linear; if it's going to exist, it should likely be exposed. Mostly the class would very clearly parallel the svg.path classes ( https://github.com/regebro/svg.path/blob/5b80c0f98b7be637fc1d201b75d2ce99b2cfabc2/src/svg/path/path.py#L38 ), which has some benefit itself. And it would maybe be better even with a dummy Curve for all non-linear PathSegments.
In theory all the parts are there; it's down to making it pretty, and in the case of Shape._point_numpy() it maybe needs a bit more TLC from somebody better at numpy than I currently am.
And I also think there's something to the idea of having point() actually allow lists even without np. There'd be some considerable benefit to some of this stuff anyway, since even without np there are some major speedups that could certainly be done. It's a bit outside of the current question at hand, but I could absolutely see a .points() function that returns a PointList type (https://www.w3.org/TR/2015/WD-SVG2-20150709/shapes.html#InterfaceSVGPointList). It's not completely outside the svg spec.
I merged your PR and the last changes on master. There were some conflicts; I hope I didn't mess them up. Tests appear to pass, though.
I renamed _Linear to Linear. Tomorrow morning I will give a good look at Shape.point() and run some testing.
And would maybe be better even with a dummy Curve for all non-linear PathSegments.
I can definitely do that too if you feel it'd be better.
And I also think there's something to idea of having point() actually allow lists even without np. There'd be some considerable benefit to some of this stuff anyway. Since even without np there's some major speedups that could certainly be done. It's a bit outside of the current question at hand but I could absolutely see a .points() function that returns a PointList type (https://www.w3.org/TR/2015/WD-SVG2-20150709/shapes.html#InterfaceSVGPointList) It's not completely outside the svg spec.
Although I see the point of this (pun not intended), I'm really wondering why someone interested in fast array operations would be reluctant to depend on NumPy. Given its incredible quality and widespread support, NumPy is probably the safest dependency that you could possibly add to a project. (If it weren't for the constraints on the release process, I'm pretty sure it would be part of the standard library by now.)
Loosely related note: if you haven't seen it already, the official numpy paper published 2 months ago is a good read.
There's some philosophy of not including any dependencies for stand-alone libraries. The best libraries stand alone. If somebody were interested in fast array operations, they wouldn't be reluctant to depend on NumPy. But somebody concerned with just robustly and correctly parsing SVG might be reluctant to depend on NumPy for that.
The use of a PointList on my end would allow a consistent methodology for getting a bunch of results at once, and also for things like underpinning the _PolyShape class. I could likely rig it up so the soft requirements have less impact. There wouldn't be a hybrid array-like Point class where x is either a float or a large array of numbers; instead, PointList would have xs that are either a list or an array of numbers. The class itself could either say whether it's numpy-array backed or list backed, or simply emulate a numpy array for most of the operations performed.
Mostly this gets away from the kind of weird result of a Point object containing more than one point. But it could also provide speed boosts to those who don't care to include NumPy, or who query a points() routine with a proper array. Also, svgelements tends to want useful things, and a point list is a kind of useful structure. I could also use it to back the line segments, since they are all shapes of point-lists, which has the useful effect of letting them conceal their backend structure and use numpy seamlessly if needed, by simply having the numpy include create a different hidden backing data structure behind the implementation if numpy exists. Though I still don't see any easy way around the Arc point() code without a weird sort of delegation.
Either way, you should be good enough along to have really robust svg rendering and really fast linear interpolations. Happy publishing.
I see your point. And yes, there is something to be said about our current hybrid Point thing. Returning a PointList instead of a Point for a numpy input to .point() (or, ideally, any sequence input) would be a bit more elegant.
Yeah, I'm thinking of making a Pointlist class: when you call points() rather than point(), it basically does the np.array code to calculate the typical answer. Pointlist would cheat and work a bit like an array with regard to overloading those main routines. The arc code would need to cheat a little and do a for-loop over the data if it's a list and not an np.array. But the points() code would assume you used a Pointlist or an np.array.
Basically, rather than an import-fail alternative-method routine, it would assume everything is a normal-ish np.array, run the normal code, and return the answer. Pointlist would just be the assumed np.array-less way to call that code if you wanted to. This would give us a clean bit of code without wrapping the answer in a Point at all; it would just return the answer. Pointlist wouldn't even need to be added right away; we could just have parallel code for numpy under points() rather than point().
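A rough sketch of that "assume an np.array, loop otherwise" approach (all names here are hypothetical stand-ins, with a toy linear segment in place of a real PathSegment):

```python
try:
    import numpy as np
except ImportError:
    np = None

class Segment:
    """Toy linear segment, standing in for a real PathSegment."""
    def __init__(self, start, end):
        self.start, self.end = start, end

    def point(self, t):
        # scalar evaluation at one parameter value
        return (self.start[0] + t * (self.end[0] - self.start[0]),
                self.start[1] + t * (self.end[1] - self.start[1]))

    def points(self, positions):
        # If numpy is around and we got an array, run the vectorised code;
        # otherwise fall back to a plain loop over the scalar point().
        if np is not None and isinstance(positions, np.ndarray):
            x = self.start[0] + positions * (self.end[0] - self.start[0])
            y = self.start[1] + positions * (self.end[1] - self.start[1])
            return x, y
        return [self.point(t) for t in positions]

seg = Segment((0, 0), (10, 20))
pts = seg.points([0.0, 0.5, 1.0])   # plain list in, plain list out
```

A Pointlist class could then wrap either return shape so callers see one consistent interface regardless of whether numpy is installed.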
I'm not sure if I should let this code settle out, or accept it as is and just fix the bits that I view as rougher edges, before updating svgelements on pypi.
The points data member of Polygon and Polyline name-clashes with Shape.points(). One or the other has to be renamed. What would you suggest?
In the last couple of commits I did the following:
- Added .points() with a soft dependency on Numpy for all PathSegments and for Shape.
- Renamed _point_numpy to points in Shape.point, but I fail to comprehend what was its purpose -- to be reviewed carefully.

Also, as noted above, Shape.points() is hidden by the Polygon/Polyline points data member.
Attribute definitions:

    Name: points
    Initial value: (none)
    Animatable: yes

    The points that make up the polygon. All coordinate values are in the user coordinate system.
Yeah, points is going to interfere. Lemme check the spec, or around there, and see if there's a name that'll work. There are a couple of functions that are almost similar, SVGPathElement.getPointAtLength() and getPathSegAtLength() with .pathLength, that would be more in line with the spec found elsewhere. But that's too far from the beaten path.
Matplotlib has a function:

    interpolated(self, steps)
        Returns a new path resampled to length N x steps. Does not currently handle interpolating curves.

But it's not really doing the same thing.
I think the best name would be npoint(). It conveys the sort of "you want N points along this path" understanding, and it includes np in the function name. npoints() would also seem logical, but the references out there tend to imply that that name is very often used to mean len(polygon.points), i.e. the number of points. So I'm thinking the best name, since points() is off the table, is npoint().
Yeah, my worst criticism is basically old docstrings. That's some damn fine code. I'll finish up a bit more testing and release the 1.3.0 version shortly. I think the utility outweighs the slight inconsistency of returning a different list of points depending on whether numpy is or isn't installed. In theory, if you didn't have numpy installed, got that list of points, and called some Point functions, the code would work fine on your machine but would crash on somebody else's machine because they had numpy installed. The alternative is to check the import and tell them to get bent without numpy. But so long as they access the stuff like [[],[]] it will all be consistent, and I'll just note it in the documentation. Worst case, I'd remove the preamble include and just add it to the npoint() code directly; it wouldn't change any of the API stuff, so it's not much of an issue.
As discussed in #52, here is a first draft of the parallelisation of point() for Line, Close, Arc, QuadraticBezier and CubicBezier. The parallel version is currently named _point_numpy() and is not automatically called by point().

This draft PR also cleans up code duplication between Line and Close, through a new common super-class LinearPathSegment.

Fixes #52

Discussion on several points follows.
Path

This PR draft does not address Path.point(). In my project, I iterate over segments in order to avoid unnecessarily splitting straight elements (Line, Close). A properly linearised version of Path.point() would be somewhat complicated and would have limited usefulness (because of the possible presence of linear segments).

A more useful addition could be a Path.linearize(max_len), where max_len is the maximum length of line sub-segments used to approximate curved elements. That's basically what I am looking for, and I essentially have this function in my project.

Shapes
Some shapes do have a point() member, although nothing about it has currently been done in this PR. In particular, _RoundShape has a lot of code duplication with Arc that would likely benefit from deduplication. Would an approach such as the following be feasible?

The parallelisation of this does bring back the Path.point() issue, although in this particular case paths would in principle contain a single segment (Arc, for _RoundShape).

Tests
I'm not sure what the previous situation was, but with this PR (and #58), tests now require scipy and numpy to be installed in order to pass. This is hardly an issue for me, but it's probably worth noting.
Final API
Pending feedback and required iteration on this PR, one way of having the parallel version of point() in the API could be the following (Arc.point() as example):

_has_numpy would be globally declared if the global NumPy import succeeded.