xproc / 3.0-steps

Repository for change requests to the standard step library and for official extension steps
10 stars 7 forks source link

Conflicting merge description for serialization? #640

Open xatapult opened 5 days ago

xatapult commented 5 days ago

The core spec says about serialization parameters here

If a step serializes a document whose document properties contain a serialization property, it must use these serialization properties. If the step itself allows specification of serialization properties (usually by a serialization option), both sets of serialization properties are merged. Serialization properties specified on the step itself have precedence over serialization properties specified with the serialization document property.

If I interpret this correctly: serialization properties set in a step's option (for instance p:store, option serialization) have precedence over a document-property serialization.

However, if I look at the p:store here, it says:

The serialization option is provided to control the serialization of content when it is stored. If the document to be stored has a “serialization” property, the serialization is controlled by the merger of the two maps where the entries in the “serialization” property take precedence.

That's the other way around... isn't it? Or am I reading things wrong?

Anyway: if we mixed this up, I think the way the core spec describes it makes more sense: when you explicitly set serialization on a step (or anywhere in the XProc code), that IMHO should take precedence.

ndw commented 5 days ago

Ah, that probably explains my confusion the other day.

It's a shame we were inconsistent about this. We're going to have to fix it, I fear, backwards incompatiblity or no.

I think an argument can be made for either set of properties having priority:

  1. I put the properties on this step, use them.
  2. I put the properties on this document because this document is special, use them

Which was does consensus blow?

xatapult commented 4 days ago

I'm writing code now that has to store documents produced by XSLTs that set <xsl:output> serialization properties. There is no guarantee that the stylesheet writers made the right decisions on this, usually it is just copied from somewhere without thinking. It is not of interest to them, except for when they are running the stylesheet outside the XProc context, for development or debugging. And those serialization settings do not have to be the same as the desired ones for production.

With this use case in mind, I would argue that if you explicitly set serialization properties in your XProc code, these should have precedence.

xatapult commented 4 days ago

I'm also pretty sure that the consensus was to give the step serialization properties precedence.

Because: I have now code that worked fine until a recent version change of Morgana, when Achim implemented the precedence of document-properties. I traced my problems back to this change. So somewhere along the way we decided to do things different than we used to with respect to the serialization. I don't know where it crept in, but I hope it's not too late to revert...

ndw commented 4 days ago

Well, if there's evidence that it was "steps get priority" probably until I kicked the hornet's nest poking around in the validate with DTD tests, I'm happy to go with that.

xatapult commented 4 days ago

Curious what the others think or remember about this... @gimsieke , @xml-project Any comments?

gimsieke commented 3 days ago

I neither have a recollection nor a preference. Also I’m between meetings at Frankfurt Book Fair.

xml-project commented 3 days ago

I remember we talked about this, but I cannot remember the result. I tend to agree to unify the behaviour, however this will be a pain to pipeline authors. So I would prefer the method with the smallest number of changes to XProc and therefore to existing pipelines.

Apart from this, I have exactly the opposite intuition as Erik. My reasoning goes this way: Having a serialization property on a document is something I almost never do in my pipelines. I rely on the step's option to be the general case. That said means, a serialization property on the document is the special case, indicating that this document has to be handled special. And then I would want the property on the document to take precedence over the general case rulings on the step.

I do not think this is a crucial argument, because I could get the described behavior with just a p:choose to handle the general and the special case. This might be the better technique anyways to make the pipeline more readable.

Does this make sense to anyone?

ndw commented 3 days ago

Yes, it makes sense to me. I can very definitely see both sides of this. From a practical perspective:

  1. There should be one answer: step wins or document wins.
  2. Since there doesn't seem to be a compelling, logical reason to choose one case over the other, I suggest we should make the change that has the potential to break the minimum number of existing pipelines.

What does your implementation do, @xml-project and in which cases?

xatapult commented 3 days ago

Morgana now implements the standard correctly (as it is described now): document properties first.

However, I still think this is a mistake, because of a very common use-case:

xml-project commented 3 days ago

@ndw Will investigate this.

xatapult commented 3 days ago

Anyway, if you do need specific serialization properties, you can always set them:

<p:set-properties properties="map{'serialization': map{...} }"/>
<p:store href="..."/>

This is what I do now in my code (had to, the stylesheets are not under my control, the serialization was prescribed).

You could of course also strip off the serialization document-property after an p:xslt step. However, that is not easy, there is no p:remove-properties step...