xsl:variable/@as - simplifying the language - attempt 2

MarkNicholls commented 7 months ago

I've thought about it.

The key issue I had, which genuinely caused me years of confusion (I didn't understand it so I ignored it, and dealt with it by typing random XSLT code), is this:

            <xsl:variable name="presentationMediaElement" as="element(urn:presentationMedia)">
                <presentationMedia/>
            </xsl:variable>

If I don't declare the "as" then it does something different and confusing (it assumes it's a document element I think, though I NEVER want it to do this).

So for stylesheets declared as version "4.0"+, can we make the default interpretation that it's an element?

Does this break backwards compatibility with v1? Tbh, the code is already incompatible, because the equivalent 1.0 code requires node-set; it's already broken, so I suggest making the fix simple to understand.

Why is this so irksome to me? Because for me it's incredibly confusing.

It's confusing because (and I didn't express this well the last time) it makes a type declaration have inconsistent behaviours.

In languages with OO (is it Reynolds?) type systems this also happens, BUT in an OO type system an expression has a type that can be cast to a subtype, and a subtype is very special because everything that is true of the supertype (in the constrained type logic) is true of the subtype (you can express this in terms of set/class membership in a universe if that's how you think about these things).

But here that isn't the case: the two interpretations are disjoint, so this isn't a cast.

So the concrete proposal is to uniquely define the semantics of

             <xsl:variable name="presentationMediaElement">
                <presentationMedia/>
            </xsl:variable>

to be

            <xsl:variable name="presentationMediaElement" as="element(urn:presentationMedia)">
                <presentationMedia/>
            </xsl:variable>

from 4.0 onwards.

(Ironically, I will personally probably still put the "as" clause in, but if I were trying to learn the language today I'd understand this on day 1, not day 1000.)

P.S.

I have a suspicion I still don't fully understand it, but I'm sure someone will point that out in due course.

michaelhkay commented 7 months ago

Yes, it's a very confusing design, but I believe that changing it will cause even more confusion.

Firstly, having a rule that says "you get an implicit document node if there is no 'as' attribute" is simpler than having a rule that says "you get an implicit document node if there is no 'as' attribute and the effective version is less than 4.0".

Secondly, "mode bits" (behaviour dependent on the version attribute) are a last resort when it comes to solving compatibility problems. For users who aren't avid readers of the spec (like you, it seems), they are just another elephant trap to fall into. Users who know and understand the spec intimately are left with the problem of deciding whether to set the version attribute one way or the other, when they probably have a code base that is far too large to examine in detail.

gimsieke commented 7 months ago

Firstly, having a rule that says "you get an implicit document node if there is no 'as' attribute" is simpler than having a rule that says "you get an implicit document node if there is no 'as' attribute and the effective version is less than 4.0".

A wonderful thing about XSLT is that you can just increase the version number in your stylesheet and it will still be executed in the same way. This has made upgrades from 1.0 to 2.0 and from 2.0 to 3.0 the easiest thing. I’d like to keep it like that, in particular for implicit document node generation in the absence of an @as attribute.

Another aspect is that a 2.0 or 3.0 processor that doesn’t know about such a new 4.0 rule will still attempt to execute the 4.0 stylesheet, but it will execute it as if it were written in the most recent XSLT version that it natively understands [1,2]. This may likely lead to different outputs than the 4.0 processor produces even if no other newly introduced 4.0 constructs are used.

[1] https://www.w3.org/TR/xslt20/#forwards [2] https://www.w3.org/TR/xslt-30/#forwards

MarkNicholls commented 7 months ago

I never read specs....well almost never.

ok...so what about a directive? like

<xsl:default-literal type="element"/>

What I'm trying to do is remove the wrinkle without chaos ensuing.

MarkNicholls commented 7 months ago

I noticed this by accident and got VERY confused....maybe a stylesheet directive is a better option?

MarkNicholls commented 7 months ago

I'm not sure I understand.

So you want all subsequent versions of XSLT to preserve the exact specified semantics of all previous versions?

I find this a bit strange. Yes, it makes upgrades easy, but it dooms the language to preserve all past mistakes.

OK, some are easy (the odd function here or there that gets replaced), but core issues are then preserved forever.

Wouldn't it make more sense for the processor to obey the stylesheet version? Otherwise, what's the point of having a version number at all?

michaelhkay commented 7 months ago

Absolutely. I remember being taught that by David Wheeler 50 years ago as an undergraduate: backwards compatibility means deliberately repeating other people's mistakes.

Basically, if you break compatibility with a new version of a language, you can find yourself dead in the water. Particularly with a language that's 25 years old, where people have a vast amount of legacy code that was written by people who have long since departed, for which there are probably no adequate tests. People won't upgrade unless it's seamless.

So indeed, what's the point of having a stylesheet version? Not all that much, actually. It is used to switch on 1.0 compatibility mode, but 1.0 to 2.0 is a significant disruption, and that's one of the reasons that so many people are still stuck on 1.0. The version number didn't really solve the problem.

MarkNicholls commented 7 months ago

@michaelhkay

I find this an odd state of affairs.

People won't upgrade unless it's seamless

If I "upgrade" the XSLT engine and it obeys the version number in the stylesheet, nothing is broken? I need change no code as long as the upgrade obeys the version number (i.e. the same way as library components work in JVM/dotnet/Python/C/C++ etc.).

What you seem to be saying is that people want all the old code to work and still use new features seamlessly?

But wouldn't you say that this dooms the language? It can never evolve to fix problems; it can only get fatter, it can never get leaner.

(Or am I misreading you? Or misrepresenting it?)

P.S. Our reticence to move from 1.0 to 3.0 wasn't because of the cost of upgrading legacy code: since the engine obeys the version number, we mostly didn't migrate that code. The cost to us is the extra skills required. XSLT 2+ is much less forgiving, and the wrinkles I am suggesting should be removed are a significant part of that cost.

P.P.S.

The version number DOES work for us: it means we can run old v1.0 XSLT stylesheets in the same environment as new ones.

davidcarlisle commented 7 months ago

@MarkNicholls It is not at all clear what you are suggesting the default "as" should be. It obviously can't be element(), so presumably as="node()*"? That would be a big change and mostly not for the best. Parentless elements are rather weird, and making that the default would add more confusion than "simplification", I suspect.

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:variable name="foo">
  <first/>
  <second/>
 </xsl:variable>

 <xsl:template match="first[following-sibling::second]">here</xsl:template>

 <xsl:template name="main">
  <xsl:apply-templates select="$foo"/>
 </xsl:template>

</xsl:stylesheet>

produces "here" in XSLT 2 or 3 (or the ill-fated 1.1). You are suggesting making it act like

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:variable name="foo" as="node()*">
  <first/>
  <second/>
 </xsl:variable>

 <xsl:template match="first[following-sibling::second]">here</xsl:template>

 <xsl:template name="main">
  <xsl:apply-templates select="$foo"/>
 </xsl:template>

</xsl:stylesheet>

which produces an empty document as output, as <first/> and <second/> are no longer siblings. Having the variable contain a sequence of elements which are not siblings is consistent and usable if you know that's what you have, but it seems better to explicitly ask for that. The existing default of supplying a / parent makes a lot of things work more naturally.

michaelhkay commented 7 months ago

But wouldnt you say that this dooms the language? it can never evolve to fix problems, it can only get fatter, it can never get leaner.

Indeed, that is the fate of all successful programming languages.

If you're going to break compatibility then you might as well invent a new language (Perl 6 became Raku). If you can't offer a very high level of confidence that existing programs will run unchanged, then for a great many applications the cost of transition will exceed the benefit, so people won't do it, which means you have essentially created a fork in the development of the language and in the user community.

In addition it's very likely that implementations will fork too. When you introduce subtle changes in semantics (like the changes in numeric comparisons that we're trying to introduce in 4.0), it becomes a significant challenge for implementations to support both simultaneously. If you've split the user community into those who want the old semantics and those who want the new, then you're going to end up having to serve the different user groups with different products, creating the risk that neither group will be well served.

MarkNicholls commented 7 months ago

I don't understand.

If you can't offer a very high level of confidence that existing programs will run unchanged, then for a great many applications the cost of transition will exceed the benefit

This is successfully handled currently in saxon from v1.0 and 2.0?

Was that a mistake? What would have been the consequences of that not happening? Would we be having this conversation about a language that was consistent with 1.0?

P.S.

Your statement is only true if the implementation ignores the version number. All the software we use relies on "legacy" libraries, many of which are technically out of date, and they work successfully because the version numbers are obeyed; breaking changes happen, are absorbed into the client code, and the dependencies are updated. It is the inevitable cost of progress.

You can claim (and I will accept) that from the language implementor's perspective, supporting multiple versions of a language (or library) is problematic (and uneconomic). Absolutely, that's your trump card. But then the issue isn't the cost to the user community, it's the cost to the implementor; there is no logically inherent need for there to be any cost to the user community, unless the user community chooses there to be.

If you are saying that this is the only economically viable strategy from your perspective, then I will accept that.

MarkNicholls commented 7 months ago

@davidcarlisle

You'll have to excuse my relative ignorance. The suggestion is driven by the apparent inconsistency (the types are disjoint) and the resulting confusion, but frankly I don't understand your example (why is this so hard?).

Let me take your example and try to work out what's going on in it.

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:variable name="foo" as="node()*">
  <first/>
  <second/>
 </xsl:variable>

 <xsl:template match="first[following-sibling::second]">here</xsl:template>

 <xsl:template name="main">
  <xsl:apply-templates select="$foo"/>
 </xsl:template>

</xsl:stylesheet>

which produces an empty document as output as <first/> and <second/> are no longer siblings. Having the variable contain a sequence of elements which are not siblings is consistent and usable if you know that's what you have, but seems better to explicitly ask for that. The existing default of defaulting a / parent makes a lot of things work more naturally.

I find it very interesting that it doesn't "work", but let me have a go at understanding it.

i.e. I believe you, but clearly I don't understand this as much as I thought, and in a sense, that makes the motivation for simplification greater, not less.

I'll respond soon.

davidcarlisle commented 7 months ago

I don't understand this as much as I thought, and in a sense, that makes the motivation for simplification greater, not less.

It would be better not to describe either of the proposed changes with loaded descriptions such as "simplification". The change here is an incompatible change that may have merit but certainly isn't a simplification.

given

<xsl:variable name="foo" >
  <first/>
  <second/>
 </xsl:variable>

Currently, to select the <second/> element you can use select="$foo/second". You are proposing that you have to use select="$foo[self::second]". In what way is that simpler?

Even if it was simpler or better it would make changing version=3.0 to version=4.0 on a stylesheet break a large percentage of expressions accessing variables. They would give different results with no warning or error as the expressions would be valid but with different meaning. I can't see that as an option at all (not that it's my choice). I have been using xslt for a long time, since before xslt 1. Changing from version 2 to version 3 was essentially a worry free operation. A change as proposed here would mean essentially I would have to stay at version 3, there would be no cost effective way to check the tens of thousands of lines of xslt I have in use to check how they would be affected by such an incompatible change in the behaviour.
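The two select forms can be compared side by side. The following is only a minimal sketch, assuming an XSLT 2.0 or 3.0 processor started at the named template main; the variable names doc and seq are illustrative and not taken from the thread.

```xml
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:output method="text"/>

 <!-- No "as": first and second become children of an implicit document node. -->
 <xsl:variable name="doc">
  <first/>
  <second/>
 </xsl:variable>

 <!-- With "as": the value is a sequence of two parentless elements. -->
 <xsl:variable name="seq" as="element()*">
  <first/>
  <second/>
 </xsl:variable>

 <xsl:template name="main">
  <!-- Child step from the document node finds the second element: count is 1. -->
  <xsl:value-of select="count($doc/second)"/>
  <xsl:text>&#10;</xsl:text>
  <!-- The same step on the sequence searches the children of each item: count is 0. -->
  <xsl:value-of select="count($seq/second)"/>
  <xsl:text>&#10;</xsl:text>
  <!-- Filtering the items of the sequence themselves: count is 1. -->
  <xsl:value-of select="count($seq[self::second])"/>
 </xsl:template>

</xsl:stylesheet>
```

The middle expression is the one that would silently change meaning if the default were switched: it stays valid, but selects nothing.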

michaelhkay commented 7 months ago

I dont understand

If you can't offer a very high level of confidence that existing programs will run unchanged, then for a great many applications the cost of transition will exceed the benefit

This is successfully handled currently in saxon from v1.0 and 2.0?

As I'm sure you are aware, there are a great many users still using 1.0, and there are a great many processors that still only support 1.0. From that perspective, the introduction of XSLT 2.0 can hardly be seen as a success story. To what extent backwards compatibility played a role in this is something we can only speculate on; but whatever the reasons, leaving half the users behind is certainly something we don't want to repeat.

Your statement is only true if the implementation ignores the version number. All the software we use relies on "legacy" libraries, many of which are technically out of date, and they work successfully because the version numbers are obeyed; breaking changes happen, are absorbed into the client code, and the dependencies are updated. It is the inevitable cost of progress.

You're looking at this with far too narrow a perspective. Look at the reasons XML 1.1 failed: it simply didn't offer enough benefit to users to make them accept the transition costs, and the fact that users weren't attracted to it meant that vendors weren't attracted to it either. As a standard, it therefore failed. The version number mechanism was technically adequate to ensure interoperability, but in terms of the economics of the ecosystem, it wasn't viable: if only 5% of the users benefit from features in the new version, the other 95% will stay on the old version unless migration is zero-cost, which it wasn't.

You can claim (and I will accept) that from the language implementor's perspective, supporting multiple versions of a language (or library) is problematic (and uneconomic). Absolutely, that's your trump card. But then the issue isn't the cost to the user community, it's the cost to the implementor; there is no logically inherent need for there to be any cost to the user community, unless the user community chooses there to be.

Costs to users and costs to implementors are inextricably related; if a standard is expensive to implement then there won't be many implementors offering implementations in free software.

MarkNicholls commented 7 months ago

@davidcarlisle

Let's split this into 2 points.

To be honest, I think it is a simplification. The fact that some expressions appear to become more complex isn't surprising: in the same way, the { Succ, +, * } theory of arithmetic is much simpler than our usual decimal or binary representations, but it unfortunately makes expressions longer.

But the simplicity I claim is really a logical one: having 2 interpretations of 1 expression is not at all simple, and for me highly undesirable.

Your example was good though, I enjoyed it, but I think what it underlined to me was this: if I want a document I should create an expression that uniquely maps to a document, and if I want a sequence I should create an expression that uniquely maps to a sequence. Any subsequent expressions that depend on these expressions then have to use the appropriate syntax.

Explicit, transparent, uniquely defined (which I claim is a simpler theory that may require more explicit expressions).

I'm now of the opinion that maybe the XSLT 1.0 situation of having to use "node-set" was in a way preferable.

I don't think you would write

select="$foo[self::second]"

I think you would write

select="document($foo)/second"

where document is the obvious function that takes nodes and embeds them (explicitly) in a new doc.

I.e. you take a sequence, use it to explicitly create a document, and then select the elements called second.

Simple, explicit, transparent and uniquely defined.

The second issue is the upgrade cost. Maybe I've been worn down, but I concede that a breaking change that alienates an existing cohort of developers who DO understand this, for little immediate return, is at best risky and at worst foolish.

But I don't want to put the cart before the horse. I think you should openly discuss issues and problems, propose solutions, then discuss the costs of those solutions... but I do accept this is a problem.

MarkNicholls commented 7 months ago

@michaelhkay

I think you've worn me down here; this is your trump card. I'm not massively familiar with the story of XML 1.1. I can tell you what the barriers were to my own move from 1.0 to 3.0, and I can explain the difficulties I have had (and still have), and others have, in understanding the language. I stand by these observations, and by where I perceive the wrinkles to be.

It irritates me that what I perceive as wrinkles seem to be baked into the language, but I accept your reasoning, which renders my argument "naive"; my irritation is with the reality.

It actually comes back to my suggestion of a "strict" mode. Personally I'd turn this "strict" mode on, and the wrinkles would disappear: expressions that look like sequences of nodes would be sequences of nodes and not be coerced into documents. <xsl:document> would make (for me) a welcome appearance in my code when I wanted a document explicitly generated, and the delineation of documents and sequences in my code would be explicit.

I'm genuinely tempted to write a linter to enforce these rules in my code; for me at least it would improve the clarity and quality of my code.

davidcarlisle commented 7 months ago

I think you would write

select="document($foo)/second"

where document is the obvious function that takes nodes and embeds them (explicitly) in a new doc.

eek no!, apart from the fact that document is a pre-existing function, if you had a new function doing this, it would be returning copies of the nodes in the variable when you want (almost all the time) to select the actual nodes that are in the variable. You are suggesting copying the entire node tree in the variable to re-parent it. The xslt processor may be able to optimise away some of the copying but conceptually that just seems wrong.

expressions that look like sequences of nodes, would be sequences of nodes and not be coerced into documents.

I guess that's the fundamental difference, if I look at

<xsl:variable name="foo" >
  <first/>
  <second/>
 </xsl:variable>

I'd say <first/> and <second/> look like (in fact, are) siblings in a node tree. It's OK that as="node()*" can be used to force them to be a sequence of nodes in unrelated trees, but that is not something that is very often useful, compared to the very useful as="xs:string", which gets you a string rather than a document node containing a text node from content that is just the text hello.

MarkNicholls commented 7 months ago

eek no!

I'm only saying: you make explicit in the expressions what is actually happening. If you want a document you use <xsl:document/>. That is implicitly what is currently happening; my only problem is that it's invisible.

I'd say <first/> and <second/> look like (in fact, are) siblings in a node tree

How would you write an expression of 2 nodes in a sequence?

I think the issue here is that you are a "native" speaker. I natively speak English, and people who don't tell me it's a horribly irregular and difficult language to learn. I believe them, but to me it's obvious (mostly!).

If you put as="document-node()" in explicitly, my Saxon complains that that's an error, because <first> is an element.

I agree with Saxon.

I can get that to go away by explicitly changing the expression and wrapping it in <xsl:document>; then Saxon will accept it.

If you do this explicitly, Saxon seems to tell you what's going on and forces you to declare the expression in the unique manner. I'm simply saying there must be some logic in there to say "oh... this looks like a sequence of elements, but there is no 'as' clause, so we'll just secretly wrap the expression in <xsl:document>", because we're going to assume that's what they really want.

If this logic is removed (made simpler), then I think the situation (in a green-field utopia) is preferable: nothing secret happens, expressions have unique interpretations, and newcomers are forced to accept that documents and sequences are different.

MarkNicholls commented 7 months ago

Miraculously, I've read the spec:

9.4 Creating Implicit Document Nodes

A document node is created implicitly when evaluating an xsl:variable, xsl:param, or xsl:with-param element that has non-empty content and that has no as attribute. The value of the variable is this newly constructed document node. The content of the document node is formed from the result of evaluating the sequence constructor contained within the variable-binding element, as described in 5.7.1 Constructing Complex Content.

This is what I'm suggesting removing.

I.e. it takes the sequence constructor (i.e. the expression) and secretly embeds it in a document... so you have an expression that is a sequence magically interpreted as a document.
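To make the rule concrete, here is a small sketch placing the implicit form next to a spelled-out equivalent that uses xsl:document with a declared type. It assumes an XSLT 2.0 or 3.0 processor started at the named template main; the variable names implicit and explicit are illustrative only.

```xml
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:output method="text"/>

 <!-- Implicit: non-empty content and no "as", so a document node is created. -->
 <xsl:variable name="implicit">
  <first/>
  <second/>
 </xsl:variable>

 <!-- Explicit: the document node is constructed visibly and its type is declared. -->
 <xsl:variable name="explicit" as="document-node()">
  <xsl:document>
   <first/>
   <second/>
  </xsl:document>
 </xsl:variable>

 <xsl:template name="main">
  <!-- Both variables hold a document node, so each test is true. -->
  <xsl:value-of select="$implicit instance of document-node()"/>
  <xsl:text>&#10;</xsl:text>
  <xsl:value-of select="$explicit instance of document-node()"/>
  <xsl:text>&#10;</xsl:text>
  <!-- Each document node has two element children. -->
  <xsl:value-of select="count($implicit/*), count($explicit/*)"/>
 </xsl:template>

</xsl:stylesheet>
```

Removing the xsl:document wrapper from the second variable while keeping as="document-node()" is the case that a processor is expected to reject with a type error, since the content would then be a sequence of elements.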

it could be a setting on the stylesheet.

<xsl:implicit-document select="false"/>

The default could be true, but for pedants like me (and in training scenarios) you'd set it to false.

michaelhkay commented 7 months ago

If you want to provide a switch to allow people to voluntarily disable this 1.0 behaviour, then a switch to make the "as" attribute mandatory would be better than one that changes the behaviour when the attribute is absent. You don't want people reading commonly-encountered code like:

<xsl:variable name="x">banana</xsl:variable>

and having to go to the top of the stylesheet module to know what it means.

MarkNicholls commented 7 months ago

Is this 1.0 behaviour? (Not that it matters.) In 1.0 I'd have to use node-set to construct a node-set; in some ways that seems more intuitive and closer to my personal utopia.

I can ask for strict, but it was rejected, and in a sense that works for me personally but doesn't really address the underlying harm... the underlying harm, I think, is the mental confusion introduced by the logical inconsistency.

I actually don't know what

<xsl:variable name="x">banana</xsl:variable>

does mean. It's not a construct I would use... which is scary. Is it a document node with a text node in it? Or is it a text node? Or an xs:string? I'd have to start up oXygen and look; I'll take a look tomorrow when I start doing some coding.

I think I've flogged this one to death now. I think you at least understand the motivation, even if you don't like the proposals.

michaelhkay commented 7 months ago

I actually don't know what <xsl:variable name="x">banana</xsl:variable> means

Most users who write it don't know what it means either. And they might equally write

<xsl:variable name="x"><xsl:text>banana</xsl:text></xsl:variable>

or

<xsl:variable name="x"><xsl:value-of select="'banana'"/></xsl:variable>

All of which do the same thing: they bind the variable to a document node that has a single text node child whose string value is "banana".

Users who don't read the spec often imagine that it means the same thing as

<xsl:variable name="x" select="'banana'"/>

and 98% of the time it behaves exactly as if that's what they had written (though in Saxon it can be three times slower). But strings are not the same thing as document nodes owning text nodes, and that eventually trips users up.
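The distinction can be checked directly. This is a minimal sketch, assuming an XSLT 2.0 or 3.0 processor started at the named template main; the variable names implicit, selected and typed are illustrative.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

 <xsl:output method="text"/>

 <!-- A document node owning a single text node. -->
 <xsl:variable name="implicit">banana</xsl:variable>

 <!-- A plain xs:string. -->
 <xsl:variable name="selected" select="'banana'"/>

 <!-- Same content as the first variable, but converted to xs:string by the "as" attribute. -->
 <xsl:variable name="typed" as="xs:string">banana</xsl:variable>

 <xsl:template name="main">
  <!-- true: the variable without "as" holds a document node. -->
  <xsl:value-of select="$implicit instance of document-node()"/>
  <xsl:text>&#10;</xsl:text>
  <!-- true: select with a string literal gives a string. -->
  <xsl:value-of select="$selected instance of xs:string"/>
  <xsl:text>&#10;</xsl:text>
  <!-- true: the declared type forces atomization and conversion to a string. -->
  <xsl:value-of select="$typed instance of xs:string"/>
  <xsl:text>&#10;</xsl:text>
  <!-- true: atomization hides the difference in a comparison. -->
  <xsl:value-of select="$implicit = $selected"/>
 </xsl:template>

</xsl:stylesheet>
```

The last test illustrates why the two forms usually appear interchangeable. A path step is one place where they diverge: $implicit/text() selects the text node, whereas $selected/text() is a type error because a string is not a node.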

I think I understand very well where you are coming from: there are many subtleties in the spec here that typical users are not aware of, where constructs do what users expect most of the time despite the fact that they have an incorrect mental model of the semantics. These features therefore cause surprises when users do something that exposes their misunderstanding. It is almost impossible to solve these usability problems while remaining backwards compatible at the same time.

MarkNicholls commented 7 months ago

I know this is a recurring theme, but developers don't learn languages from specs; in fact some languages don't have specs!

(Though actually in this case the spec was very enlightening.)

OK, so in summary:

a) I'm not imagining this... someone accused me of making an issue like this more complex than it was... I don't think I am.
b) The enthusiasm for removing this wrinkle in version 4 is low... almost 0.
c) The enthusiasm for making this a directive is low (I don't massively have a problem with directives, and I do think they are the only way to resolve this sort of thing given the conversation above). I reluctantly accept that default behaviour needs to be backwards compatible unless there is a strong benefit to the user community.
d) The symptom would be alleviated by a "strict" mode that insisted that all vars etc. were declared; this has been raised before, and enthusiasm was pretty low.
e) If I want to resolve this just for me... I can write a linter.

If the above is true, we just close this and get on with our lives.

Does that sum it up?

michaelhkay commented 6 months ago

How about

<xsl:package declared-types="yes">

making "as" attributes mandatory on all xsl:param and xsl:function elements, and on xsl:variable and xsl:with-param elements if there is no select attribute?

I'm not sure how many users would choose to use it though.

MarkNicholls commented 6 months ago

Being selfish, then yes. I'm not familiar with package; my motivation is to ensure my declarations are all typed (something I routinely do) AND in particular that the element/document nuance is explicit.

So that would seem to work for me.

I share your lack of confidence in other people using it, but I would.

MarkNicholls commented 6 months ago

@michaelhkay

Thinking about this, yes, I personally routinely litter my code with "as"... I'm statically minded, so it's sort of natural for me, but I suspect most people aren't.

The key motivation here is the wrinkle around element/document, so a more amenable change is to flag only the situations where either element or document would be a valid type declaration; that has a much lighter footprint.

In utopia I would expect that ambiguity to be a warning that I could configure to be an error (or better, an error people could configure to be a warning).

That, I think, is more likely to be used (or simply a default config). I'll still go on littering code with "as", and others can omit them except for this one case.

(It's similar to warnings about type coercion that loses precision; this isn't about precision, it's about sanity.)

ndw commented 3 months ago

The CG agreed to close this issue without further action at meeting 081

qt4cg / qtspecs

xsl:variable/@as - simplifying the language - attempt 2 #1055