qt4cg / qtspecs

QT4 specifications
https://qt4cg.org/

Discussion: include/import of files. #1449

Open MarkNicholls opened 1 week ago

MarkNicholls commented 1 week ago

I don't especially do this.

a) because the environment that we run XSLT in (95% of the time) doesn't support it
b) because it's too rigid to use (maybe I'm doing it wrong).

Motivation:

I have a 'module' List.xsl, let's say, that models lists and has a function (in pseudo XPath so you can see the types). (A module contains constructors and ideally all functions related to, here, a list.)

function tryHead($xs as list()) as maybe()

I have a 'module' Maybe.xsl, let's say, that models Maybe

function toList($xs as maybe()) as list()

So Maybe.xsl needs to know about List.xsl and List.xsl needs to know about Maybe.xsl

If I use xsl:include or xsl:import (in Saxon) I get:

The stylesheet module includes/imports itself directly or indirectly

(which I'm happy reflects the correct behaviour given the spec - i.e. I don't think this is a bug in the implementation)
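For concreteness, here is a minimal sketch of the cycle described above. The namespace URIs, the helper functions (`maybe:nothing`, `maybe:just`, `list:fromArray`) and the decision to model a list as a sequence and a maybe as an array of zero or one members are all hypothetical stand-ins for the pseudo-types `list()` and `maybe()`. It assumes a processor with XPath 3.1 array support.

```xml
<!-- ===== List.xsl (hypothetical content) ===== -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:list="http://example.com/list"
    xmlns:maybe="http://example.com/maybe">

  <!-- List.xsl declares that it depends on Maybe.xsl -->
  <xsl:include href="Maybe.xsl"/>

  <!-- Head of the list wrapped in a maybe, or "nothing" for an empty list -->
  <xsl:function name="list:tryHead" as="array(*)">
    <xsl:param name="xs" as="item()*"/>
    <xsl:sequence select="if (empty($xs)) then maybe:nothing() else maybe:just(head($xs))"/>
  </xsl:function>

  <!-- Flatten an array into a list (here: a sequence) -->
  <xsl:function name="list:fromArray" as="item()*">
    <xsl:param name="a" as="array(*)"/>
    <xsl:sequence select="$a?*"/>
  </xsl:function>
</xsl:stylesheet>

<!-- ===== Maybe.xsl (hypothetical content) ===== -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:list="http://example.com/list"
    xmlns:maybe="http://example.com/maybe">

  <!-- Maybe.xsl declares that it depends on List.xsl: this closes the cycle -->
  <xsl:include href="List.xsl"/>

  <!-- "Nothing" is modelled as an empty array -->
  <xsl:function name="maybe:nothing" as="array(*)">
    <xsl:sequence select="[]"/>
  </xsl:function>

  <!-- "Just x" is modelled as a one-member array -->
  <xsl:function name="maybe:just" as="array(*)">
    <xsl:param name="x" as="item()"/>
    <xsl:sequence select="[$x]"/>
  </xsl:function>

  <!-- Convert a maybe to a list by delegating to the list module -->
  <xsl:function name="maybe:toList" as="item()*">
    <xsl:param name="m" as="array(*)"/>
    <xsl:sequence select="list:fromArray($m)"/>
  </xsl:function>
</xsl:stylesheet>
```

Using either file as the entry point is the situation that produces the "includes/imports itself directly or indirectly" error quoted above, whereas the same five functions pasted into a single file would be accepted.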

Given that I CAN write this cycle in a single file, without restrictions on the order of the constructs (i.e. this isn't a restriction of the language itself, which isn't always true in other languages), it seems less than ideal that I can't freely compose files in order to replicate the situation and allow decomposition into logical files.

(I expect this is a consequence of the rules around priority of templates, which I am broadly ignorant of, and to be honest, largely not directly concerned with - I don't use this to 'compose' templates, but to write 'function' libraries)

(MK has answered a question on Stack Overflow related to this, which solved the issue by reorganising the files, which is what I currently do, but it's not ideal that I cannot decompose and test code in isolation)

Is this worth resolving?

Many years ago I was a C programmer, and I remember a similar issue that was resolved with #define, whereby header files were imported something like this (excuse my made up C syntax)

#ifndef _Maybe_
#include "Maybe.h"
#endif

and then Maybe.h would define _Maybe_, this would mean, if you followed the idiom, that the file was included by the preprocessor, at most once (I'm not suggesting this mechanism is directly applicable, just that this is a common issue elsewhere).

michaelhkay commented 1 week ago

Immediate thoughts:

(a) for xsl:import, you get an override of everything in the imported module, so it's hard to define semantics for recursive import because you would get an infinite number of overrides at different import precedence; there is no longer an unambiguous "highest precedence" version of any template rule.

(b) for xsl:include, I think people often overlook that you don't need to directly include everything that you reference; a single set of xsl:includes in the top level module allows everything to reference everything else. However, there are advantages in a module declaring its immediate dependencies, so I can see why people want to do it: it provides documentation, it helps syntax-directed editors, and it makes the module easier to reuse in a different stylesheet.

Potentially we could say that redundant xsl:includes are ignored, which would mean that cycles are no longer an error, and including the same module more than once no longer causes an error due to duplicate function/template names. The main difficulty in doing this is that the order of declarations is (sometimes) significant. However, the fact that "declaration order" is defined also provides a possible solution: we could say that if a stylesheet package contains multiple xsl:include declarations pointing to the same stylesheet module, then all but the first in "declaration order" are ignored.

In extreme cases saying that duplicate xsl:includes are ignored could represent a backwards incompatibility (if the included stylesheet only contains template rules, then under the current spec the template rules from a later xsl:include take precedence). It's hard to imagine anyone is relying on that by design, but it's possible that legacy code is relying on it unintentionally. A safer but less simple rule would be to say that only cyclic xsl:includes are ignored.

MarkNicholls commented 1 week ago

I did suspect the precedences were the underlying issue, I think the problem here is (I suspect) that import/include were intended to do something stronger than simply a copy paste like include (as in the C .h file situation), i.e. I think the intention is analogous to inheriting (in the visitor pattern) or extending open match statements in other languages.

The simplest (and safest) solution is another instruction (or some option on the existing one) that says...JUST 'copy/paste' the contents of the referenced stylesheet, and only do that the first time you see it, but I accept never ending language extension has its drawbacks.

I agree (if the current constructs are adjusted) there is a slight risk of unintended regression problems.

I accept that the cyclic suggestion addresses the original specified motivation, but I think repeated references is the underlying issue, solving cyclic ones may not be sufficient to enable this style of development, in which case your SO answer may well still be the preferred solution.

I can try to construct a more general use case that isn't cyclic if that helps?

off the top of my head, I have an array module and a list module that both reference a maybe module, if I include array and list, maybe is included twice and this causes an error.
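A sketch of that non-cyclic case, with hypothetical file names (`Array.xsl`, `List.xsl`, `Maybe.xsl`, `main.xsl`) and a hypothetical `maybe:nothing` function. There is no cycle here, only a module that is reached by two different include paths.

```xml
<!-- ===== Maybe.xsl (hypothetical) ===== -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:maybe="http://example.com/maybe">
  <!-- The single declaration that ends up in the stylesheet twice -->
  <xsl:function name="maybe:nothing" as="item()?">
    <xsl:sequence select="()"/>
  </xsl:function>
</xsl:stylesheet>

<!-- ===== Array.xsl (hypothetical) ===== -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:include href="Maybe.xsl"/>
  <!-- array functions that use maybe:* would go here -->
</xsl:stylesheet>

<!-- ===== List.xsl (hypothetical) ===== -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:include href="Maybe.xsl"/>
  <!-- list functions that use maybe:* would go here -->
</xsl:stylesheet>

<!-- ===== main.xsl (hypothetical entry point) ===== -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Maybe.xsl is pulled in once via Array.xsl ... -->
  <xsl:include href="Array.xsl"/>
  <!-- ... and a second time via List.xsl, at the same import precedence -->
  <xsl:include href="List.xsl"/>
</xsl:stylesheet>
```

Under the current rules the two copies of `maybe:nothing` have the same import precedence, so this is expected to fail with a duplicate-declaration error, even though `Array.xsl` and `List.xsl` are each valid on their own.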

Is the old C trick directly applicable? As this isn't a compiled language, the notion of preprocessor vs executable instruction isn't clear to me, but if we could #define variables and #if them, and then conditionally include files, that solves the problem, with no regression issues. I don't know enough about this to know if this is ridiculous, but it MAY be a more generally useful construct.

Arithmeticus commented 1 week ago

I am unclear what changes to the specs are being proposed.

Include is roughly a hard copy. Import is roughly the same as include but there are provisions for overriding duplicate structures. https://www.w3.org/TR/xslt-30/#element-import

Here is a little trick I use when I know that I need to have two stylesheets refer to each other, and I am not prepared to think about a common stylesheet that includes/imports both.

In stylesheet a:

<xsl:param name="a-is-master" as="xs:boolean" static="true" select="true()"/> 
<xsl:import href="b.xsl" use-when="$a-is-master"/> 

Mutatis mutandis for b.xsl.

On occasion, structures reveal themselves and I'm able to rearrange includes/imports and take that scaffolding down. Other times it needs to be a permanent arrangement, and it works well as long as developers are aware of what's going on. So be sure to document your practice.

michaelhkay commented 1 week ago

I've drafted a PR whose effect is that duplicate xsl:include declarations within a stylesheet level (ie. at the same import precedence) are ignored, rather than causing an error. Please review it and comment; I need further thought before I'm convinced that it works.

MarkNicholls commented 1 week ago

@michaelhkay

Do you want comments here?

I've read the 3.0 spec and I don't understand the notion of "the same import precedence", import precedence seems to be by definition a total order, so they would not be the same precedence unless they were effectively the same statement....this isn't something I'm familiar with.

From my perspective I'm simply doing set unions on sets of functions, but the unions can happen in any order.

e.g. this would not be uncommon.

A includes B
B includes C
A includes C

tbh, it would have to potentially support any set of binary includes, since the dependency tree is an unconstrained graph. Given what I think I understand of import precedence (which I suspect I don't), I would really want it to ignore all imports where there exists the same import at a higher (or equivalently lower) precedence. I think, for my use case, you just want a single rule.

As I say, mentally I'm using the C #include idiom.

For me xsl:include and xsl:import are too powerful, I really desire something quite weak and simple.

MarkNicholls commented 1 week ago

@Arithmeticus

I don't understand your trick.

<xsl:param name="a-is-master" as="xs:boolean" static="true" select="true()"/> 
<xsl:import href="b.xsl" use-when="$a-is-master"/> 

how does this help? (it looks like the C idiom but I can't comprehend the mechanics), when would b.xsl not be imported?

Arithmeticus commented 1 week ago

Static parameters (static="true") allow you to exclude any XSLT component when the stylesheet is compiled, before it runs. You drop the components by putting use-when on them. Because of this combo of static parameters and use-when, if a.xsl is your primary point of entry, it will have the xsl:import statement but b.xsl will not. If b.xsl is your primary point of entry, it will have the xsl:import statement but a.xsl will not.

One caveat. If you have stylesheet c that needs both a and b it should do so by importing/including one or the other, but not both.

Below is the full set up.

a.xsl:

    <xsl:param name="a-is-master" as="xs:boolean" static="true" select="true()"/>
    <xsl:import href="b.xsl" use-when="$a-is-master"/> 

b.xsl:

    <xsl:param name="a-is-master" as="xs:boolean" static="true" select="false()"/> 
    <xsl:import href="a.xsl" use-when="not($a-is-master)"/> 

MarkNicholls commented 1 week ago

@Arithmeticus

I'll have to think about this one. The C idiom is obvious: you define something in the file, and then if it's defined you don't include it.

Given that you have this trick, I presume that you also struggle with the original issue? and if a simple 'only include the 1st include' was implemented this would solve your issue and render the trick unnecessary?

dnovatchev commented 1 week ago

@MarkNicholls I have never had this problem, because I use xsl:import exclusively and avoid using xsl:include.

Not sure if I have ever had circular imports, but in my code (even with XSLT 1.0 - where there is no use-when-attribute capability) often different stylesheet modules, some of which import some of the rest, import the same additional stylesheet module.

For "circular imports", indeed a good solution is to place the repeated imports only in a separate, 3rd (top-)stylesheet module. So, if module A needs to import Module B and Module B needs to import Module A, then remove the imports from these two modules, and import both modules A, and B, in the top-module.

"import precedence", as any kind of "precedence" is used for resolving potential conflicts. If the code is written in a way such that there are no conflicts, import precedence will not be needed.

I think this could be a solution to the "circular imports" problem.

michaelhkay commented 1 week ago

At one level it's great to get feedback from users who are using XSLT successfully even though they don't understand basic concepts like "import precedence" - most of our users are in that camp and we need to understand them better. On the other hand, it's hard to have a conversation about changing the details of the language semantics with people who don't have a solid grasp of the current specification.

michaelhkay commented 1 week ago

@dnovatchev I think one of the problems is that the intuitive understanding of xsl:import and xsl:include is to simply declare the dependency of one module on components defined in another. But that's too simplistic a model.

@dnovatchev If you represent all your cross-module dependencies with xsl:include then you get errors due to duplicate declarations and circular dependencies. If you use xsl:import instead, you'll get rid of the duplicate component errors, but you'll still get circularities, and you can end up with a lot of problems with unintended overrides, eg. if the same global variable is declared in multiple modules.

Personally I use xsl:include by default, only using xsl:import if I know I want to override some of the imported components. And I will tend to put all the includes in the top-level module.

dnovatchev commented 1 week ago

> @dnovatchev If you represent all your cross-module dependencies with xsl:include then you get errors due to duplicate declarations and circular dependencies. If you use xsl:import instead, you'll get rid of the duplicate component errors, but you'll still get circularities, and you can end up with a lot of problems with unintended overrides, eg. if the same global variable is declared in multiple modules.

@michaelhkay,

Yes, by "writing code without conflicts" I meant that there shouldn't be naming conflicts between any two modules.

As an example, the complete FXSL library (almost 200 modules) consists of modules that only import (do not include) one-another. As can be seen in the screenshot of the "Template Navigator" for testFunc-curry.xsl, some stylesheet modules, such as func-type.xsl, func-curry.xsl, func-apply.xsl - are imported more than once by different other stylesheet modules.

[image: screenshot of the "Template Navigator" for testFunc-curry.xsl]

Arithmeticus commented 1 week ago

@MarkNicholls No, I don't struggle with XSLT's include/import functionality. It's fantastic, and any warning or error messages constantly prod me to rethink file structures. They invariably nudge me into better ways to organize or refactor my files.

For example, I might realize that maybe.xsl and list.xsl are so inextricably linked, they belong together in the same file. Or they belong together but within a function library, in which case I unite them with a top level XSLT file that simply includes the two of them (let's call that a function library binder). I then have a setup where I can add other functions as needed, and they have access to each other in the library. Then for any later XSLT applications that need maybe, list, or both, you just xsl:include the function library binder, and you get access to everything.
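A sketch of the "function library binder" arrangement just described. The file names `library.xsl` and `app.xsl` and the `xsl:initial-template` body are hypothetical, and it assumes that `maybe.xsl` and `list.xsl` contain only their own functions, with no xsl:include of each other.

```xml
<!-- ===== library.xsl: the binder (hypothetical name) ===== -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Each library module is included exactly once, and only here.
       Functions in maybe.xsl and list.xsl can still call each other,
       because references are resolved across the whole stylesheet. -->
  <xsl:include href="maybe.xsl"/>
  <xsl:include href="list.xsl"/>
</xsl:stylesheet>

<!-- ===== app.xsl: an application using the library (hypothetical) ===== -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- One include gives access to every function in the library -->
  <xsl:include href="library.xsl"/>

  <xsl:template name="xsl:initial-template">
    <!-- calls to the maybe and list functions would go here -->
    <result/>
  </xsl:template>
</xsl:stylesheet>
```

The trade-off raised elsewhere in the thread still applies: `maybe.xsl` and `list.xsl` no longer state their own dependencies, so neither can be compiled or tested on its own without a binder.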

Arithmeticus commented 1 week ago

Also, you're right to call my approach a trick -- it's very subtle. I usually look for a way to shift such structures to a more traditional model. I can think of only one serious application where I let the trick stand, and there were special reasons to do so (it involved Schematron, a large XSLT application invoked by the Schematron file, and SchXslt output that needed to be incorporated back into the XSLT application it had invoked).

michaelhkay commented 1 week ago

> some stylesheet modules, such as func-type.xsl, func-curry.xsl, func-apply.xsl - are imported more than once by different other stylesheet modules

This works, but it's not something I would generally recommend.

If the modules only contain named components such as functions, global variables, and named templates, then you're covered by the rule that all but the highest-precedence components are effectively ignored.

If the modules contain template rules, then your (assembled) stylesheet contains multiple copies of the same template rule at different import precedences, which can invoke each other using xsl:next-match - which is terribly confusing both for the stylesheet author and for the XSLT processor!

MarkNicholls commented 1 week ago

@Arithmeticus

I think the disconnect here is I'm trying to write libraries,

(putting precedence to one side)

It's quite common in programming languages to disallow cycles in compiled components, I suspect because the whole set of components needs to be loaded at the same time for it to be resolved, so the reasoning is that they should be defined in one place.

I can probably accept that removing circular references is not an unreasonable constraint.

But it's not common for programming languages to choke on repeated references to components, and this constraint pretty much precludes a developer from writing stand alone function libraries with shared references.

michaelhkay commented 1 week ago

> I've read the 3.0 spec and I don't understand the notion of "the same import precedence", import precedence seems to be by definition a total order

If module A imports module B, then A has higher import precedence than B, which means that any named component (such as a function, variable, or named template) in A overrides any component in B with the same name. If A and B include template rules, then the rules in A are chosen in preference to those in B.

If module A includes module B, then A and B have the same import precedence, which means that it's an error to have named components in A with the same name as a component in B. If A and B include template rules, then they have the same precedence, which means that ambiguities are resolved by the concept of "declaration order".
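A small sketch of the difference, using hypothetical files `base.xsl` and `main.xsl` and a hypothetical function `f:greet` declared in both.

```xml
<!-- ===== base.xsl (hypothetical) ===== -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/f">
  <xsl:function name="f:greet" as="xs:string">
    <xsl:sequence select="'hello from base'"/>
  </xsl:function>
</xsl:stylesheet>

<!-- ===== main.xsl (hypothetical) ===== -->
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.com/f"
    exclude-result-prefixes="#all">

  <!-- Import: base.xsl gets LOWER import precedence than main.xsl -->
  <xsl:import href="base.xsl"/>

  <!-- Same name and arity as in base.xsl: this declaration overrides it -->
  <xsl:function name="f:greet" as="xs:string">
    <xsl:sequence select="'hello from main'"/>
  </xsl:function>

  <xsl:template name="xsl:initial-template">
    <!-- Resolves to the f:greet declared in main.xsl -->
    <xsl:sequence select="f:greet()"/>
  </xsl:template>
</xsl:stylesheet>
```

If the `xsl:import` in `main.xsl` is changed to `xsl:include`, both `f:greet` declarations have the same import precedence and the stylesheet is rejected with a static error instead of one silently overriding the other.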

MarkNicholls commented 1 week ago

@michaelhkay

I apologise for my ignorance. I think it's a valid discussion, with valid use cases, but because these constructs don't fit my context, I don't use them, so I can't learn by using, and if the spec isn't especially clear, then it's analogous to trying to teach a 5 year old their multiplication tables by referring them to the axioms of ZFC, and (harshly I think) claiming they are basic.

I'm sure it's frustrating, and this shouldn't be a forum for training the ignorant or lazy, but in many of these conversations there is a bit of mutual enlightenment (there are some things I've referred to and thought quite foundational, and been surprised that others haven't been familiar with).

I understand the explanation above, thank you, revisiting the spec, there is a subtlety where the include section refers to the import precedence, and for the ignorant it is easy to conflate include with import.

OK, so I assume your PR is just (effectively) related to include? This seems to fix my scenarios I think, I would use include, I am not concerned with precedence, I'm obviously too ignorant to comment about the nuances of this though.

P.S. I'm not using 'ignorant' as an insult, simply as an adjective.

MarkNicholls commented 1 week ago

> @MarkNicholls I have never had this problem, because I use xsl:import exclusively and avoid using xsl:include.
>
> Not sure if I have ever had circular imports, but in my code (even with XSLT 1.0 - where there is no use-when-attribute capability) often different stylesheet modules, some of which import some of the rest, import the same additional stylesheet module.
>
> For "circular imports", indeed a good solution is to place the repeated imports only in a separate, 3rd (top-)stylesheet module. So, if module A needs to import Module B and Module B needs to import Module A, then remove the imports from these two modules, and import both modules A, and B, in the top-module.
>
> "import precedence", as any kind of "precedence" is used for resolving potential conflicts. If the code is written in a way such that there are no conflicts, import precedence will not be needed.
>
> I think this could be a solution to the "circular imports" problem.

@dnovatchev

I have tried import, but tripped up on precedence issues (I think explained by MK).

The separate xslt file with the dependencies is the solution given by MK to me on SO, and it does indeed work, but this file is a construct of the closure of all dependencies defined (implicitly) by some specific root stylesheet, so you can't write logically closed stand alone libraries and use them without then having to construct this file and an entry point stylesheet to include both the dependency file and then all the dependent xslt files.

It's the solution I currently use, but it's not 'modular', i.e. I can't use modules in isolation without bespoke 'coding'. The PR MK has suggested means (I THINK) I simply include the dependencies in each module, and I can use them as standalone modules or combine them in new modules as I choose.

dnovatchev commented 1 week ago

> > some stylesheet modules, such as func-type.xsl, func-curry.xsl, func-apply.xsl - are imported more than once by different other stylesheet modules
>
> This works, but it's not something I would generally recommend.
>
> If the modules only contain named components such as functions, global variables, and named templates, then you're covered by the rule that all but the highest-precedence components are effectively ignored.

Yes, this is exactly what happens, FXSL is a library of functions. I don't think there are any global-level variables at all.

When one writes a function library to be used by other people, it would be very often the case that users would need different combinations of stylesheet modules of the library. If these were using <xsl:include> directives, this would most often result in errors when some stylesheet module is included more than once.

> If the modules contain template rules, then your (assembled) stylesheet contains multiple copies of the same template rule at different import precedences, which can invoke each other using xsl:next-match - which is terribly confusing both for the stylesheet author and for the XSLT processor!

In FXSL the only templates that are used have the goal to match to a unique node that identifies a particular function, so that passing this node around as a parameter is equivalent to passing the function it corresponds to as a parameter.

Thus every template matches a unique node and there are no two different templates that match the same node. Obviously, no xsl:next-match is used anywhere.

dnovatchev commented 1 week ago

> @dnovatchev
>
> I have tried import, but tripped up on precedence issues (I think explained by MK).

@MarkNicholls,

Please see this comment where I explain how any such issues are completely avoided in the FXSL library: https://github.com/qt4cg/qtspecs/issues/1449#issuecomment-2358931557

One should not have any problems, by following the same design guidelines.

If you need more information on this, please, DM me in the xml.com chat, or just send me an email.