w3c / svgwg

SVG Working Group specifications

Relax SVG Path grammar with an implicit first MoveTo command #780

Open JeremiePat opened 4 years ago

JeremiePat commented 4 years ago

Hello SVGWG

According to the current SVG2 Path grammar, the first command in a path must be a moveTo command.

I would advocate for relaxing that constraint and considering that every path starts with an implicit M0,0 command.

That simple change would have two benefits:

  1. This would make parsing SVG paths much easier without introducing any breaking change (except that authors currently providing invalid paths would get a path drawn instead of no rendering).

  2. This would make some prose a lot less convoluted. For example, in the definition of the MoveTo command, it is stated that:

If a relative moveto (m) appears as the first element of the path, then it is treated as a pair of absolute coordinates. In this case, subsequent pairs of coordinates are treated as relative even though the initial moveto is interpreted as an absolute moveto.

Which means that a path like m1,1 2,2 must be interpreted as M1,1 l2,2. With an implicit M0,0 command at the beginning, that whole text can be removed and the management of path data will be a lot easier for implementers.
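A minimal sketch of the rule quoted above, to make the special case concrete. The helper names (`tokenize`, `normalize_leading_moveto`) are hypothetical, and the tokenizer is deliberately simplified: it does not handle the compact arc-flag notation or validate argument counts.

```python
import re

# Hypothetical helpers; only the "leading m is treated as M" rule comes from the spec text.
_TOKEN = re.compile(
    r"([MmLlHhVvCcSsQqTtAaZz])|([-+]?(?:\d*\.\d+|\d+\.?)(?:[eE][-+]?\d+)?)"
)

def tokenize(d):
    """Split path data into (command, [numbers]) groups."""
    groups = []
    for cmd, num in _TOKEN.findall(d):
        if cmd:
            groups.append((cmd, []))
        elif groups:
            groups[-1][1].append(float(num))
        else:
            raise ValueError("path data must start with a command letter")
    return groups

def normalize_leading_moveto(d):
    """Rewrite a leading relative moveto as an absolute moveto,
    keeping any extra coordinate pairs as an implicit relative lineto."""
    groups = tokenize(d)
    if not groups or groups[0][0] != "m":
        return groups
    _, nums = groups[0]
    head = [("M", nums[:2])]
    if len(nums) > 2:
        head.append(("l", nums[2:]))
    return head + groups[1:]

# "m1,1 2,2" ends up with the same groups as "M1,1 l2,2".
print(normalize_leading_moveto("m1,1 2,2"))
```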

dalboris commented 4 years ago

I think I would disagree with this one, as it introduces other ambiguities, for example regarding marker placements. If m1,1 2,2 implicitly becomes M0,0 m1,1 l2,2, then this would mean 3 markers instead of two.
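The marker concern can be illustrated with a rough vertex count. This is a sketch under simplifying assumptions: it only understands M/m/L/l, treats every coordinate pair as one vertex where a marker could be placed, and `count_marker_vertices` is a hypothetical name.

```python
import re

def count_marker_vertices(d):
    """Count coordinate pairs (one potential marker position each)
    in path data that uses only M/m/L/l commands."""
    vertices = 0
    for _cmd, args in re.findall(r"([MmLl])([^MmLl]*)", d):
        numbers = re.findall(r"[-+]?\d*\.?\d+", args)
        vertices += len(numbers) // 2
    return vertices

print(count_marker_vertices("m1,1 2,2"))        # 2
print(count_marker_vertices("M0,0 m1,1 l2,2"))  # 3
```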

fsoder commented 4 years ago

I'd tend to agree with @dalboris here - to achieve this you'd likely end up trading one special-case for another. Assume for example, that to avoid the situation described above, you say to only emit the implicit moveto command when the first command isn't m or M, then you'd still need to have a special-case for the first command. It may well be that having an implicit moveto happens to line up well with some other API that you intend on feeding the commands to and thus it appears to be "really simple", but then something else is (implicitly) handling that special-case (as an example the <canvas> API does this).

tatarize commented 4 years ago

Explicit is better than implicit. Also, this would require my paths to start with M0,0. I do not want that. If I then do a M20,20 because my start position is 20,20 for my shape, I've actually created two shapes. One that consists of an unrendered move point and another one that I wanted. Any proper parsing of my path would say M0,0M20,20z is 2 different shapes.

JeremiePat commented 4 years ago

Thanks for the feedback. I actually didn't consider markers that much, so thanks for the heads-up on this point.

That said, it appears that I haven't been clear on my point here. The idea is to relax the syntax, not to change any behavior beyond that. The main point here is to provide a standard way to resolve the first command even if it isn't a moveto command.

Resolving first segment, take 1:

So for example, a path like l1,1 could be "resolved" (and drawn) as if it were M0,0 l1,1 but still count as only one segment. The same goes for a relative moveto command: m1,1 2,2 would be "resolved" as M0,0 m1,1 l2,2 and still count as 2 segments. Note that this last example is what is implied by the current spec prose, and I only suggest extending it to the whole range of path commands.

Of course, if someone wants l1,1 to be counted as 2 segments, it is still possible for an author to add an explicit moveto command to their path definition: M0,0 l1,1 is resolved as M0,0 M0,0 l1,1 and counts as 2 segments.

As an extra bonus point for markers, for a fully drawn one-segment path such as l1,1 I would not be shocked to have the start marker, end marker and mid marker all positioned at the end point of the segment (this actually opens opportunities for creative use of marker edge cases).

Interestingly enough, that also opens opportunities regarding the definition of some shapes (you know, the square that is made of 5 segments instead of 4).

Resolving first segment, take 2:

However, all things considered, a way to ease everything is to consider two cases:

  1. A path starts with a moveto command (absolute or not): keep the existing behaviors exactly as they are.
  2. A path starts with another command type: consider an implicit moveto segment of value M0,0 and add it to the total count of segments in the path.

With that approach, the only change we introduce is to actually draw a path (with a known behavior regarding segment count) where we weren't previously; everything else stays unchanged and we end up with an easier path syntax.

Does that make more sense?

tatarize commented 4 years ago

I don't see any real advantages to this. It seems like it would be better to not resolve that stuff. Explicit commands are better than implicit ones. I already have some code in a Python module that assigns a meaning to incomplete paths like that. For my purposes such a path means a line from None to EndPoint, which isn't something that gets drawn, but it is something you can add to another shape or a position. So M0,0 + L1,1 means append Move(0,0) to Line(1,1), and without a starting position the Lx,y command means a line from some position to the explicit position given by the command. L1,1, L2,2 and L3,3 can already work independently as path fragments: together they mean None->(1,1) -> (2,2) -> (3,3), and they can be appended to the end of a curve or shape or arc just as easily as anything else.

Your suggestion here is that a path fragment like L1,1 should have an implied pre-pended M0,0 but consider the alternative of having half a dozen path fragments and just putting them together with having a null-start position implied. You could see how the L1,1 fragment is easily represented by that and that you could put that fragment as part of a string of path fragments and it would properly have a 1:1 implied representation between path_d fragment and path fragment.

The proper implied starting position should be null or none. This allows a clear functional distinction between the meaning of M0,0L1,1 and L1,1, and all the little fragment pieces fit together better. My code in svgelements allows you to do things like Path('M0,0L20,20, 0,20') + 'L30,30' without any issue. But if L30,30 implied M0,0L30,30, adding it to a path would start a new subpath rather than appending, from the last point (whatever that is), a line to the given destination.
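The "null start" idea can be sketched in a few lines. This is not the svgelements implementation; the `Line` class and `concatenate` function below are hypothetical stand-ins that only show how an unknown start point gets resolved at concatenation time.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class Line:
    end: Point
    start: Optional[Point] = None  # None: "from wherever the previous segment ended"

def concatenate(path: List[Line], fragment: List[Line]) -> List[Line]:
    """Append a fragment, filling unknown start points from the preceding end point."""
    result = list(path)
    for seg in fragment:
        previous_end = result[-1].end if result else None
        start = seg.start if seg.start is not None else previous_end
        result.append(Line(end=seg.end, start=start))
    return result

# Roughly Path('M0,0L20,20, 0,20') + 'L30,30' from the comment above.
path = [Line(start=(0, 0), end=(20, 20)), Line(start=(20, 20), end=(0, 20))]
joined = concatenate(path, [Line(end=(30, 30))])
print(joined[-1])  # the appended line starts at (0, 20), not at (0, 0)
```

With an implied M0,0 the appended line would instead start at the origin, which is the behavior the comment argues against.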

JeremiePat commented 4 years ago

Explicit commands are better than implicit ones.

I slightly disagree on this. Implicit can be helpful for both authors and implementers if the behavior is well defined. Defining such behavior is the whole point of that conversation 😉

Your suggestion here is that a path fragment like L1,1 should have an implied pre-pended M0,0 but consider the alternative of having half a dozen path fragments and just putting them together with having a null-start position implied. You could see how the L1,1 fragment is easily represented by that and that you could put that fragment as part of a string of path fragments and it would properly have a 1:1 implied representation between path_d fragment and path fragment.

That's a very good point. AFAIK, SVG does not define sub-paths or path fragments formally. Such a path fragment could be any list of valid path commands.

The current path grammar only defines a full valid path that will be render-able (hence, it must start with an M|m command). In essence, it formally prevents the ability to parse something like a standalone l1,1 m2,2 l3,3 path fragment.

So maybe, rather than just relaxing the current path grammar, we need to formally define what a path fragment is and how to use it.

I would tend to ask for the following:

  1. Relax the path grammar by removing that mandatory first M command and state that this is the definition of a valid path fragment
  2. State that a complete render-able path is a path fragment starting with an M command followed by 0 to N valid path fragments.

Some may argue that this is exactly what the current spec allows. IMO this is different in the sense that it moves the error management regarding render-able paths out of the parser. Checking whether the path starts with an M command shouldn't be done at the parser level if we want to allow path fragments to formally exist.

As soon as we have formally defined render-able paths versus path fragments, it opens the gate to reusable/shareable path fragments in SVG, which would be an awesome feature (especially for cartography). I would love to be able to do something like this:
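The two-level split can be sketched as two separate checks. Both function names are hypothetical, and the "grammar" here is only a rough regular expression (command letters followed by numbers and separators); it does not validate argument counts the way the real SVG path grammar does.

```python
import re

_COMMAND = re.compile(r"[MmLlHhVvCcSsQqTtAaZz]")
_FRAGMENT = re.compile(r"\s*(?:[MmLlHhVvCcSsQqTtAaZz][\s,\d.eE+-]*)*")

def is_path_fragment(d):
    """Parser-level check: any sequence of commands, no moveto required."""
    return _FRAGMENT.fullmatch(d) is not None

def is_renderable_path(d):
    """Render-level check: a path fragment whose first command is M or m."""
    first = _COMMAND.search(d)
    return is_path_fragment(d) and first is not None and first.group(0) in "Mm"

print(is_path_fragment("l1,1 m2,2 l3,3"))    # True: parses as a fragment
print(is_renderable_path("l1,1 m2,2 l3,3"))  # False: no leading moveto
print(is_renderable_path("M0,0 l1,1"))       # True
```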

<svg viewBox="0,0,100,100">
  <path id="myFragment" d="C100,100 0,0 0,50" /><!-- Not rendered -->

  <!-- Two paths rendered with the path fragment defined earlier -->
  <path d="M0,50 V0 H100 V50 #myFragment z" />
  <path d="M0,50 V100 H100 V50 #myFragment z" />
</svg>

The condition for this is to make sure that a path fragment isn't an error at the parsing level.

So to conclude: The fact that a path needs to start with an M command to be rendered should not trigger a parsing error.

tatarize commented 4 years ago

I actually do this in the Python svgelements module, which follows a fairly reasonable implementation. Each command is given an object associated with it. Generally Move, Line, CubicBezier, QuadBezier, Arc and Close are the elements needed to properly implement Path objects, which are collections of such objects. It's entirely reasonable to have a QuadBezier as an object with or without an initial move. And the logic is quite consistent. Every object has a starting position and every object has an ending position. All objects have an implied starting position, including Move.

Move itself can also be moved from any position to another position. And when specifying a fragment they will often have an implicit starting position. An object defining a line from 7,7 to 10,0 would actually print as M7,7L10,0 even though it's actually just a single path fragment.

Consequently I can pretty easily do that combination stuff: Path("M0,50 V0 H100 V50") + "C100,100 0,0 0,50" + "z" is perfectly logical and parses correctly.

But I would strongly object to shadow-DOM-like implied use objects like that. First, it'll wreak havoc on parsers. I mean, your tag there has at least 3 commands in it (and strictly speaking everything is supposed to stop trying once it sees that # there, which is invalid, and the z would never be seen), so this is actually a breaking change.

Part of the charm of the path stuff within SVG is that it's generally highly useful on its own. Even without all the svg stuff around it, plenty of things just parse paths and save them out. Messing with that seems like a pain. Do I really need to implement a whole shadow dom path fragment thing if I'm only parsing things to find objects and then extract their strings which a lot of naive parsers will totally do?

And I can't really conclude there's going to be some path segments that are wildly reusable at the subpath level. If you have a subpath, you're better off just having that subpath element stand alone and reusing that.

verdy-p commented 4 years ago

Maybe we can say that a "moveto-less" drawing command would be interpreted by dropping the drawing of the first fragment. But there's a good reason for saying that all commands have an implicit origin: consider relative commands, which are normally based on a relative origin at 0,0. Such a path can still be positioned correctly from outside the path itself with the x="" and y="" values of objects: these paths are relative to the position of the containing element. So the same path, for a ring for example, could be designed without any explicit origin, with just an implicit origin: an initial relative moveto could still be applied and become valid (relative to the implicit relative origin at 0,0, the whole path would be relative); it would be drawable at any other position by <use x="*" y="*" xlink:href="*"/> referencing the relative path.

In fact, relative or absolute "moveto" do not make any difference at the start of the path: they would just be relative to the implicit origin (0,0). They are only differentiated at further positions in the path, where a relative "moveto" is relative to the last current point of the previous fragment, while an absolute "moveto" is relative to the implicit origin at (0,0). The same applies to other movements with "pen up" like "advance horizontally" (which is not only relative to the previous fragment or implicit origin, but also relative to the previous bearing which oriented the direction), or the "short-hand" arcs that also take an implicit position for the last control point, applying symmetries to position the first control point of the next arc.

Note that markers still cause a difficulty: there's no way in existing paths to specify a marker-less moveto. It would require a "newpath" command, relative (for "n", relative to the previous fragment or to the implicit relative origin at 0,0) or absolute (for "N", relative to the implicit relative origin at 0,0).

dalboris commented 4 years ago

After re-reading this discussion, I actually changed my mind and like many of the ideas proposed here (just not worded like in the original post).

@JeremiePat makes really good points about formally defining the concept of path fragments, and in fact @tatarize also gives examples of the usefulness of such path fragments. Once we have such a concept, we can indeed use it to concatenate/reuse path fragments, either programmatically or as allowed path data syntax, which is powerful. There is in fact a precedent: the SuperPath proposal, where Jean-Claude Moissinac proposes the concept of a "chunk", which is exactly what we're naming "path fragment" here. See:

definition of a chunk:

in a data-sp-d attribute of a path, where you want in the path (after the beginning m or M) starts with ( and ends with ); following the starting ( is an id for the chunk followed by | and then followed by some commands which define the chunk, then the terminating ); a chunk definition must always be followed by an explicit command; the sequence of commands of the chunk is always translated in relative mode

<path d="" data-sp-d="M425,25L225,25 225,225 425,225(p1|L425,225L425,175Q525,125 425,75L425,25)" style="fill:#0000FF" />

direct usage of a chunk:

In data-sp-d, a reference to a chunk starts with "#" and ends with "|". Following the starting "#" is the id of the referenced chunk terminated by "|". A chunk reference is replaced by the sequence of commands associated with the id.

<path d="" data-sp-d="M425,225#p1|" style="stroke:#FF0000;fill:none;stroke-width:3" />
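To make the definition and direct-reference syntax concrete, here is a sketch of a purely textual expansion. It is an assumption-laden simplification: the function names are hypothetical, the chunk's commands are inlined verbatim rather than "translated in relative mode" as the proposal requires, and the reverse `!id|` form is not handled.

```python
import re

_CHUNK_DEF = re.compile(r"\((\w+)\|([^)]*)\)")  # (id|commands)
_CHUNK_REF = re.compile(r"#(\w+)\|")            # #id|

def collect_chunks(sp_d):
    """Return (path data with definitions inlined, {id: commands})."""
    chunks = {}

    def inline(match):
        chunks[match.group(1)] = match.group(2)
        return match.group(2)

    return _CHUNK_DEF.sub(inline, sp_d), chunks

def expand_refs(sp_d, chunks):
    """Replace each #id| reference with the commands recorded for that id."""
    return _CHUNK_REF.sub(lambda m: chunks[m.group(1)], sp_d)

defining = "M425,25L225,25 225,225 425,225(p1|L425,225L425,175Q525,125 425,75L425,25)"
plain, chunks = collect_chunks(defining)
print(plain)
print(expand_refs("M425,225#p1|", chunks))
```

Verbatim inlining only gives the intended geometry here because the referencing path happens to start at the chunk's own start point, 425,225; placing the chunk anywhere else needs the relative-mode translation.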

reverse usage of a chunk:

In a path, a reference to a chunk used in reverse order (from the end to the beginning) starts with "!" and ends with "|". Following the starting "!" is the id of the referenced chunk terminated by "|". The chunk reference is replaced by a sequence of commands that produce the same geometry as the sequence of commands associated with the id, but drawn from the end to the beginning.

<path d="" data-sp-d="M425,25!p1|L425,225 625,225 625,25 425,25" style="fill:#00FFFF" />

This whole SuperPath proposal is obviously a big change. But as a first step, I agree with @JeremiePat that we could already define the concept of valid path fragment (= same as the current valid path but without the required initial moveto), and then either:

  1. Define a valid path as a path fragment starting with a MoveTo, or
  2. Define a valid path as any path fragment. If the path has at least one command and the first command isn't a MoveTo, then an implicit M 0,0 is added.

Note that the implicit initial MoveTo would only apply when rendering full paths or their markers. It does not apply to individual path fragments. This change is backward compatible, since it only affects paths which were invalid before the change.

Note the importance of the wording "if the path has at least one command and the first command isn't a MoveTo". Indeed, we don't want to implicitly convert d="" to d="M 0,0" (an empty path should stay empty, otherwise a marker would appear), and we don't want to implicitly convert d="M 1,1 L 2,2" to d="M 0,0 M 1,1 L 2,2" (again, this would add a marker). This wording is what ensures that only paths which were invalid syntax before are now rendered with well-defined behavior, and therefore ensures backward compatibility.
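The three cases in that wording can be sketched as a small string-level function. The name `apply_implicit_moveto` is hypothetical, and the sketch works on the raw string rather than on a parsed command list.

```python
import re

_COMMAND = re.compile(r"[MmLlHhVvCcSsQqTtAaZz]")

def apply_implicit_moveto(d):
    """Prepend 'M 0,0' only if the path has a first command that is not a moveto."""
    first = _COMMAND.search(d)
    if first is None:            # no command at all: an empty path stays empty
        return d
    if first.group(0) in "Mm":   # already starts with a moveto: unchanged
        return d
    return "M 0,0 " + d.lstrip()

print(repr(apply_implicit_moveto("")))             # ''
print(repr(apply_implicit_moveto("M 1,1 L 2,2")))  # 'M 1,1 L 2,2'
print(repr(apply_implicit_moveto("L 1,1")))        # 'M 0,0 L 1,1'
```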

tatarize commented 4 years ago

Yeah, I'm kind of leaning towards not dismissing it out of hand.

My primary objection was the idea of implicit moves, since all my objects can well lack a move and have a completely different meaning. If I have a Line object between (Null->1,1), I can affix this to the end of any path and it makes perfectly logical sense: after you do this shape, you line to 1,1. But if my Null there is required to be 0,0, then affixing that to the end of an object also makes perfectly logical sense, but a different sense: it means I appended a subpath to this path that goes from 0,0 to 1,1. And the original meaning is not possible to express. My line path fragments can have a start point or lack a start point. If they have one, they are a drawable line, with an implicit move to the start point. Concatenating them tends to require validation, though, since the end point of the current path is not necessarily the start point of the appended fragment. I side with the original endpoint rather than the fragment start point, unless the original endpoint is Null.

Moissinac's (@moissinac) proposal provides a few concrete examples of the utility of this that were not previously offered or discussed. Namely, that all holes in objects will always duplicate the path data. And if the programs knew these path fragments were the same thing in both, you could edit one while also editing the other.

Now, clearly there are issues with the proposal, which they likely figured out, hence the new implementation they offer up. The textual code will often not include the start position. L1,1 alone is ambiguous. This is why I print that as M0,0L1,1 if a start position is known, or L1,1 if it is not (this too is kinda ambiguous, since M0,0L1,1 could mean Move(0,0) Line(0,0->1,1) with an explicit rather than implicit move). You are simply given the line-to and the destination. Where you came from is not given.

In the analysis they correctly gave the graph explanations that you could absolutely take sections of graph, give them a name and play them back in either forwards or backwards order. Since this section would be named and we'd know it's the same section, we could absolutely edit it as a single unit. The problem here is that reversing a fragment is not always a stroll in the park. Take the fragment L1,1: what is that reversed? Well, it's actually M1,1L<Null>, sort of. If we only know where that line is going to and the operation, and we reverse it, we now only know where it came from. We do not know where it's going.
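The reversal problem shows up even for straight lines. Below is a sketch restricted to absolute line-tos, with a hypothetical `reverse_polyline` helper: the fragment can only be reversed once its start point is known.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]

def reverse_polyline(start: Optional[Point],
                     line_tos: List[Point]) -> Tuple[Point, List[Point]]:
    """Reverse a fragment of absolute line-tos.

    Returns (new_start, new_line_tos). Fails when the start is unknown,
    because the reversed fragment would have no destination to end on.
    """
    if start is None:
        raise ValueError("cannot reverse: the fragment's start point is unknown")
    points = [start] + list(line_tos)
    points.reverse()
    return points[0], points[1:]

# A line from a known start (0,0) to (1,1) reverses to a line from (1,1) to (0,0).
print(reverse_polyline((0, 0), [(1, 1)]))  # ((1, 1), [(0, 0)])
```

Calling `reverse_polyline(None, [(1, 1)])` raises, which is the "M1,1L<Null>" situation described above.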

Kind of strange things also happen when you reverse a path segment that is closed with a non-zero path close. You sort of have to draw that implied line segment, and float the Z back to the end in the reversed, or you get a weird shape which is cycled wrong (the line part of the Z goes at the front, but the actual Z command with no length remains at the end).

superpath

You'll notice in the discussion there that this isn't really an SVG path, or really a set of them. It's actually a sort of graph, in the graph theory sense, of three loops. This would be really kind of useful, but kind of impossible to represent, since you're actually taking the destination of the previous segment.

If we have the path, M0,0 10,10 20,0 z. If your shape was reusing the segment from z to M, you are clearly shooting yourself in the foot. But, also, if your path is reusing the segment from 20,0 to z, you're again shooting yourself in the foot. In the simple triangle, you can basically only use maybe 0,0 to 10,10. However in that case the command there is an M with the 0,0. So you'd either be forcing the new subpath, by demanding the M there, or interpreting it as the method of connection between the two points. And if you're copying the 10,10 to 20,0, it's not specified but it's an L command there. But, you can only kinda do this if the operation is between the two different points.

The second implementation of superpath has a nicer and easier syntax. Basically you id a bit of path, and you can reuse it. It's basically just a string-based compression thing, except that you can modify the object and it would modify both copies. Though there's likely some ambiguity there too.


While I absolutely think discussion of path fragments is worthwhile, and they are nice enough things at the low programming level, some of this stuff seems like it's kinda trying to build graph theory into SVG. I'd love a good SVG-esque way to represent graphs, but there's the need for some path fragments to go backwards and the oddities that come with reversing them. I'm not sure there'd be any way to easily fragmentize a path on the fly, since the thing we really want is starting and ending with nodes (with segments having the implied operations). SVG offers us . And it's a little hard to cut and paste that into being a proper graph. It takes a lot of work. And you'd kinda have to modify points into nodes, since you can kinda reuse them. And it seems to spiral into something useful but ultimately not SVG.

tatarize commented 4 years ago

Ultimately the problem remains that to clip a fragment you still need a point that occurs before it. But you need that point to not have any implied meaning. The meaning of Move is well established. It means you do not draw that length and you go to that point without a drawn operation. If I stick a move together with other path objects or fragments, I get subpaths. If I completely omit it, then I do not know where I came from, and if I am drawing a line from a point, say (0,0), to a point (1,1), where I started is absolutely essential to knowing what I mean by that fragment.

Lemme offer up a little bit of potential syntax. Let's say we add a command F to SVG which means Fragment, Floating, From, First. That expresses the starting point for the following operations, but implies absolutely nothing about the operation itself. It functions like a move, since an operation like line-to following it would be a complete fragment of just a line operation that isn't moved to, and thus kinda floating.

Now if we revisit the initial segment problem:

If we reverse these fragments:

We would then simplify F1,1L0,0M0,0 to basically M1,1L0,0

But typically if I concatenate something with an M to my path, that's a subpath. But, if I concatenated something with an F to my path, that would mean that's the explicit start point to that fragment. If it does not agree with the path I'm merging with, it should be omitted (and use the current endpoint). But, this does not have a segment-meaning and does not imply that I arrived there without drawing any path segment. Just that the next command in the fragment started from that location.