meerk40t / svgelements

SVG Parsing for Elements, Paths, and other SVG Objects.
MIT License
132 stars 29 forks source link

Possible incorrect interpretation of Path close spec #137

Closed Sophist-UK closed 1 year ago

Sophist-UK commented 2 years ago

Path::z-point determines which point the path close z/Z uses.

If I have read the code correctly, the current implementation looks for the previous Move and uses that point as the close point, however this appears to be the opposite of the specification referred to.

If I have interpreted the spec correctly, it seems to me that we should be searching for the previous Move or Close and using the end point of that object as the z-point.

Sophist-UK commented 2 years ago

P.S. Use of the for ... else construct could simplify this code slightly.

Sophist-UK commented 2 years ago

I decided to fix this as part of #138 since it was a similar issue to other two.

tatarize commented 2 years ago
If a "closepath" is followed immediately by a "moveto", then the "moveto" identifies the start point of the next subpath. If a "closepath" is followed immediately by any other command, then the next subpath starts at the same initial point as the current subpath.

https://www.w3.org/TR/SVG/paths.html#PathDataClosePathCommand

If the closepath is followed by a moveto then the moveto is the start of a new subpath. If the closepath is followed by anything else then the next subpath starts at the same point as the initial subpath.

M0,0L1,1ZL2,2Z is one path, The Z in both cases returns to 0,0 based on the location of the move.

Subsequent "moveto" commands (i.e., when the "moveto" is not the first command) represent the start of a new subpath:

The Z command goes to the initial position of the last subpath which is always defined by the M command and not by the Z command.

tatarize commented 2 years ago

Actually rereading the spec I am not sure if that is 1 or 2 subpaths. But, in both cases the Z returns to 0,0 based on the initial move location. The second Z is not based on the position of the previous Z. Do note, however, there is no functional difference here since the first Z is by definition based on the location of the M so the second Z being based on the first Z or original M can have no functional difference. However, I'm rather certain that the it is entirely based on the M.

I'll check implementations of markers at subpath ends locations in some of the robust implementations and run it by the SVG Working group if there's no agreement there.

tatarize commented 2 years ago

No software put marker-start or marker-end at subpaths of markers.

<svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:ev="http://www.w3.org/2001/xml-events" width="310.000000mm" height="210.000000mm" viewBox="0 0 1172 794">
  <defs>
    <marker id="triangler" viewBox="0 0 10 10"
          refX="1" refY="5"
          markerUnits="strokeWidth"
          markerWidth="10" markerHeight="10"
          orient="auto">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#f00"/>
    </marker>
    <marker id="triangleg" viewBox="0 0 10 10"
          refX="1" refY="5"
          markerUnits="strokeWidth"
          markerWidth="10" markerHeight="10"
          orient="auto">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#0f0"/>
    </marker>
    <marker id="triangleb" viewBox="0 0 10 10"
          refX="1" refY="5"
          markerUnits="strokeWidth"
          markerWidth="5" markerHeight="5"
          orient="auto">
      <path d="M 0 0 L 10 5 L 0 10 z" fill="#00f"/>
    </marker>
  </defs>
    <path d="M 0,0 L 100,0 Z L 100,50 L 200,200 Z L 0 200" stroke="#000000" transform="translate(10,10)" stroke-width="1.0" marker-start="url(#triangleg)" marker-mid="url(#triangleb)" marker-end="url(#triangler)" fill="none" />
</svg>
tatarize commented 2 years ago

It's clear from some procedures that Inkscape decides that the existence of a Z means there are more than one subpath there.

I think it's somewhat ambiguous but I also can't think of any way it could make a difference. The additional subpath can because it multiple things break subpaths and if Z marks a subpath end and implies a move for the next subpath it's an interesting value bit. I might have to ask W3C on that, but I am not sure of a place where the distinction should actively matter.

Sophist-UK commented 2 years ago

Hmmm ... I read the spec again, and whilst it is potentially open to interpretation, on balance I think you are right.

Sophist-UK commented 2 years ago

Hmmm ... I thought about this again whilst I was having a swim in the pool - and decided that there is no real difference between your interpretation (new subpath starts after a Close even if no Move - using the previous Move end point as the initial point of this new subpath) and mine (new subpath starts after a Close even if no Move - using the previous Close end point subpath) because they are actually the same point.

However, neither as_subpaths nor subpath(index) currently use Close without a Move as a subpath end when they should.

tatarize commented 2 years ago

Yeah, that seems to be the distinction I see there. I checked the various implementations and Inkscape agrees with that. Since breaking the path breaks at the Z values. Checking in with the SVG Working group. Because that does seem to change things for the implementation there. Inkscape divides the paths into 3 rather than 1, treating close paths and forced path ends with implied start at the previous start.

w3c/svgwg#866

I think the as_subpath() code is in error. It should divide the subpath at a M or at a Z. And generally violate the typical wysiwug path segments and insert the moveto. Such that M0,0ZL500,500 is two subpaths M0,0Z and L500,500 with the L having the known start location of 0,0. which may actually relate to #130 and the concept there of an implied moveto. My explanation and understanding there was that Line(start=(0,0), end=(500,500)) was a path fragment and couldn't actually exist, but this seems to be the explicit second subpath of M0,0,ZL500,500.

Now Subpath() are not Path() and could potentially point to path fragments, but it could be that path-fragments are just weird but somewhat valid paths. "A path data segment (if there is one) must begin with a "moveto" command." seems to suggest that fragments aren't expected. And that you can have implicit and explicit path segments.

If we can very much have L500,500 exist as a subpath we should be able to have it exist as a path, but it quite explicitly implies M0,0 And the L command knows it has a start position.

It would be a breaking change though not a serious one to retcon this choice, and display implied initial moves like real moves as was requested in #130, especially here where you really could end up with a subpath of L500,500 as a real value. If the lineto has a starting position it would display as "M<>L<>" for the .d() value but actually consist of the path fragment. If the L does not possess a value for start it would display L<> for just the destination value.

That, or, if we have a Z followed by a non-move command we go ahead and just invent and shove the moveto in there so that we satisfy the "A path data segment (if there is one) must begin with a "moveto" command." for compound as well as simple paths. In strict subpaths this might be a problem since they don't alter the underlying structure. But, in the constructor for the Path() one could force them to be valid just as other routines force paths to be valid. This means len(Path(Line((0,0), (500,500))) == 2 since putting the Line there while fragmented into a path would require the path to be made valid which requires putting an explicit move in there. The Line((0,0), (500,500)).d() would still equal L500,500 but the Path() of that value would require validity and thus force invent the explicit move from the implicit move. This would in theory allow Subpath() to reference a window of the Path but when put into a Path() would end up reifying the implied moveto.

That, or, we don't do shit or change shit, and just leave everything the way it currently works.

tatarize commented 2 years ago

Reading between the lines the spec generally suggests:

If a "closepath" is followed by any command it begins a new subpath. If it begins with a moveto, this new subpath begins at the point specified by the moveto. If it begins with any other command, the new subpath begins at the same initial point as the previous subpath. The first subpath (if there is one) must begin with a moveto.

I think implementing this correctly requires that as_subpaths() returns closed paths at the point of closure, that these subpaths can begin with non-moveto commands. But there is, at most, 1 moveto in a subpath and it must occur as the first element and there is, at most, 1 closepath and it must occur as the last element. And if these elements are wrapped in a Path() they should be made to start with an explicit or implicit moveto.

Sophist-UK commented 2 years ago

Yes. I agree.

"If it begins with any other command" means that it is either the very first command or it follows a "Close" command. If it follows a Close then that Close will have gone back to the original Move (possibly recursively).

tatarize commented 2 years ago

zpoint() would find the correct point, it doesn't need to do anything more than a search of previous elements to find the previous moveto.

Sophist-UK commented 2 years ago

zpoint should be able to look for either the previous Move or the previous Close.

tatarize commented 2 years ago

zpoint should look back at the previous Move. That's technically the correct method. I don't see any difference if the close is defined correctly the points should be coincident. But, moveto is technically the correct thing to check.

Sophist-UK commented 2 years ago

Yes - it would depend on the Close end-point having been set correctly (which of course it should be - but the coding world is full of should-ofs).

We should probably code the default of (0,0) if the path doesn't have any Move objects.

tatarize commented 2 years ago

0,0 isn't necessarily correct, the correct answer is that the path is invalid. Now admittedly the starting with a lowercase m line makes it seem fairly origin based but it's really not, and all of that would be breaking changes. The as_subpaths() should likely get changed but that's also a breaking change and requires a major version switch but, is still clearly correct in ways the current methods aren't. I dunno about the move from 0,0 for clearly invalid paths.

Sophist-UK commented 2 years ago

I don't have time to check right now, but I think I read in the spec that if there is no initial Move, you should assume origin.

tatarize commented 2 years ago

I've read the spec a bunch, if you do an m rather than M it should be relative to 0,0 in that it treats the m as if it means the same thing as M in that context. -- I've read the spec a number of times.

Sophist-UK commented 2 years ago

Ah yes - that was it. So what happens if you don't have an initial Move then?

Sophist-UK commented 2 years ago

The spec actually says: "A path data segment (if there is one) must begin with a "moveto" command." So a path d string without an initial move is actually invalid - and elsewhere it says A) that you should only render the parts of the path that are valid up to the first invalid point and B) that you should treat a path with no valid commands as None.

Sophist-UK commented 2 years ago

So what should happen in svgelements when a path doesn't have an initial Move (or indeed any other invalid data) - should we raise an exception, simply ignore adding objects to a path which would make it invalid, add such objects but delete the invalid one and all following objects (if it is an insert)?

Whatever we do needs to be consistent. To be backwards compatible and not break existing calling code which creates broken paths, we probably need to assume that paths that do not start with a Move actually start at the origin and when looking back in the path for the previous move if we get to the beginning without finding one we need to assume the origin.

tatarize commented 2 years ago

It says you must have an initial move. There are some subpath things that matter.

Subpath is off-spec, at least to the extent that the subpaths are not actually defined within the spec explicitly but there are implicit subpaths and these clearly do start at a move and end at a close. And does so for either so M0,0ZZZZZZZZ has a bunch of subpaths and these need to count and be respected. This is a breaking change and requires a version bump but this is closer to what the spec says.

Sophist-UK commented 2 years ago

Yes - it does. So what happens in svgelements when you don't? OK - you are off-spec, so do we raise some sort of Exception or assume an origin start point?

I have no real view either way - I just believe that svgelements should be predictable.

tatarize commented 2 years ago

Well, you get a fragment. The lines might not have a start point and there's an open question as to whether we should invent move point or not. svg.path 2.0 and svgpathtools and a few other related path-like libraries didn't have a Move command. They just had the start point of the first pathelement be the initial Move. This actually works fairly well if you don't want to do full compound paths with multiple subpaths. If you want to match the spec correctly you need these and that requires making Move a real object. But, there could be something to if you start with the Line element then you actually invent the Move() based on the initial point.

This gets more weird obvious when you consider subpaths which do violate this requirement. You must have one for the original path, but these aren't needed for the subpaths. So it seems like it needs a robust and predictable solution.

I am leaning toward #130 which suggested invented it. Since fragments were expected to not occur in reality but actually do appear as a real thing within subpaths. And I'm thinking the best solution would be to make a new Move() as part of the validation. If you do M0,0ZZZZ this is 5 subpaths, if you query them they would be Move(0,0)-Close, Close, Close, Close, Close. But, in theory if you took the subpath Close and you sent it to a Path it would actually be validated on creation and made Move(0,0)-Close since we must according to the spec start our paths with a Move. So subpaths, since they are just windows into a Path, could be fragments in that rare edge case. Paths themselves could not be fragments, they would be required to obey the must in the spec and force adding the Move.

So Path(Line(start=(0,0), end=(1,1)) would give you M0,0 L 1,1 and would consist of 2 path segments. This would also require making sure some combinations didn't wrongly validate in a M0,0 in error though. So Path("M0,0L2,2") + "L3,3" must equal M0,0L2,2L3,3 and must not equal M0,0L2,2M2,2L3,3 but Path("M0,0L2,2") + Path("L3,3") very clearly could equal M0,0L2,2M2,2L3,3. Though this might not be the case since Path("L3,3") would likely equal Line(end=(3,3)) with no stated or expected original move position.

Sadly I'd likely need to think through most of these edge conditions and make sure it's consistent.

Etc. I can probably work out a few tests for these things that at are generally consistent with this stuff, but it does need to make sure all the test cases are taken care of, bumping the major version (since it's a breaking change), implementing the altered subpath() commands, and making sure we're consistent. And making sure previous code that might have worked before still works since Path() might actually introduce a Move() that wasn't there before so the equals code and add code etc all might call this code and it might after the change goes in cause a failure.

Sophist-UK commented 2 years ago

I would counsel against inserting an actual Move object as a new first object in a path that does not start with an "M". (There are multiple ways of creating the chain of PathSegments in a Path, and creating a new object not explicitly provided might create issues, and will almost certainly create confusion.)

I still believe that the choices are:

a. Assume an implicit origin start point (non-breaking); or b. Raise an exception at any point that a path does not start with a Move (possibly breaking).

If we go with a. then we also need to decide whether Path("L10,10") == Path("M0,0L10,10") (or indeed whether Path("m0,0l10,10") == Path("M0,0L10,10"))...

I'd likely need to think through most of these edge conditions and make sure it's consistent. Path(Line(start=(0,0), end(1,1))).d() == "M 0,0 L 1,1"

agree - when you convert a Path back to d you should include the implicit M.

Path(Move(-1,-1), Line(start=(0,0), end(1,1))).d() == "M -1,-1 L 1,1", with the validation process assigning Line.start to -1,-1.

agree - as previously stated in #130, IMO the start point of PathSegment objects should not exist or at least not exposed.

Path("L1,1") should equal Line(end=(1,1)) and not assume 0,0 as the first position. If we set the start of that line. We have no way to know we should revalidate things and therefore insert the Move() to the start (this is somewhat common that you could set an endpoint without revalidation).

complicated - IMO Path("L1,1") == Path(Line(end=(1,1)) == Path("M0,0L1,1") == Path("m0,0l1,1") - indeed Line(start=(0,0), end=(1,1)) == Line(end=(1,1)).

Path("M0,0") + "z" has 1 subpath and not two.

agreed

Path("M0,0z") + "z" has 2 subpaths.

agreed

(Path("M0,0z") + "z").subpath(1) is equal to "z"

agreed

Path(Path("M0,0z") + "z").subpath(1)) is equal to "M0,0z"

disagree Path(Path("M0,0z") + "z") == Path("M0,0z") + "z" == Path("M0,0zz")

Sophist-UK commented 2 years ago

We probably can't change this now, but for the future it might actually be beneficial to implement Subpaths as a subclass of slice object (because effectively that is what it is). You could then do:

subpath = path.subpath(2)
subpath_objects = path[subpath]
Sophist-UK commented 2 years ago

Also, we can probably get around the whole is vs. == as follows:

  1. PathSegment maintains a weak-reference list of path objects they are part of.
  2. When you refer to start, then if there is exactly one non-None object ref() in the weak-reference list, then we can determine the start as the end point of path[path.index(ref()) - 1]. If there are two non-None objects in the weak-reference list, then the start point is indeterminate because the PathSegment is used multiple times and the start point cannot be determined without knowing the path you are referring to.
tatarize commented 2 years ago

Path(Path("M0,0z") + "z").subpath(1)) is equal to "M0,0z" disagree Path(Path("M0,0z") + "z") == Path("M0,0z") + "z" == Path("M0,0zz")

Yes, but if we take subpath(1) and then wrap that in a Path we took "z" and made it a Path. And paths must begin with an moveto which is where the comes from since the second "z" there points to the last "M" which was 0,0.

The subpath is absolutely a slice of the path. That's how it's implemented. The subpath objects can just be accessed like path objects but they reflect the underlying path list of the original.

PathSegments can't maintain weak-references to the paths they are a part of, that's a lot of things and should maybe be a total of one thing, if they are already a member of a path, they would be better off cloning themselves. Also, pathsegments are supposed to be dumb by design, and I'm not sure garbage collection in python has weak references. And certainly can't and totally should not serve as links in multiple paths.

Sophist-UK commented 2 years ago

Hmm - I may have misunderstood the example because I mis-read the brackets - or perhaps because the brackets are actually unmatched. 😃

But I still disagree. Because I do not believe we should add a missing move, but instead believe we should assume it exists if it is missing - except when creating d when we should add it. Here is an example why (based on a user - for some reason known only to themselves - creating it backwards):

p = Path("z")
p.insert(0,Line(20,20))
p.insert(0,Move(10,10))
p.insert(0,Line(2,2))
p.insert(0,Move(1,1))

Assuming that you create a Move whenever the first PathSegment is not a Move the "d" result would be M1,1 M0,0 L2,2 M10,10 M0,0 L20,20 M0,0 z.

Another counter example (Path("L10,10") + Path("L20,20")).d() should == "M0,0 L10,10 L 20,20" and != "M0,0 L10,10 M0,0 L20,20".

So IMO in your example Path((Path("M0,0z") + "z").subpath(1)) == Path("z") = Path(Close()) != Path(Move(0,0), Close()) but Path((Path("M0,0z") + "z").subpath(1)).d() == "M0,0z".

tatarize commented 1 year ago

The subpath code was changed to agree with the spec as the w3c issue confirmed is the expectation.