xproc / 3.0-steps

Repository for change requests to the standard step library and for official extension steps

What is the default base URI for documents created by the standard steps? #308

Closed ndw closed 4 years ago

ndw commented 4 years ago

In the course of examining the consequences of an archive (passed to p:archive) having no base URI, Achim and I have discovered an incompatibility in our implementations.

My implementation makes the base URI of documents created by steps the same as the step unless there's some overriding value. For example, the documents created by p:count, p:compare, and p:archive (in the case of creating a new archive) all use the base URI of the step as their base URI.

I think there are several reasons why this is a good idea:

  1. It’s inconvenient for pipeline authors (especially now that we’re more strict about it) for documents to have no base URI. I don’t think we should be generating them frequently.
  2. If we say that the base URI of a document generated by a step is the base URI of the step that generated it (unless there’s some overriding circumstance), then authors can use xml:base on the step to set the base URI. Mostly, of course, the base URI doesn’t matter so authors don’t have to do this.
  3. I think it’s consistent with what users expect. It’s the way XSLT works, for example. If you don’t specify an alternate base URI on a bit of generated content, it gets the base URI of the template that created it.
  4. That’s the way I implemented all the steps in my 1.0 implementation! :-)

Achim, quite reasonably I think, took the position that the documents produced by those steps have no base URI.
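
To make the difference between the two positions concrete, here is a minimal sketch in Python. It is only a model of the behaviour described above, not code from either implementation: the function name, the policy labels, and the example URI are all hypothetical.

```python
from typing import Optional


def created_document_base_uri(step_base_uri: Optional[str], policy: str) -> Optional[str]:
    """Base URI of a document that a step creates from scratch (e.g. a p:count result).

    Hypothetical model; neither the function nor the policy names come from the specs.
    """
    if policy == "inherit-from-step":
        # First position: the new document takes the base URI of the step,
        # so xml:base on the step would change it.
        return step_base_uri
    if policy == "no-base-uri":
        # Second position: the new document has no base URI at all.
        return None
    raise ValueError(f"unknown policy: {policy}")


step_uri = "file:///pipelines/main.xpl"  # made-up URI for illustration
inherited = created_document_base_uri(step_uri, "inherit-from-step")
absent = created_document_base_uri(step_uri, "no-base-uri")
```

With the same step, `inherited` carries the step's URI while `absent` is `None`, which is exactly the point where a pipeline could behave differently on the two processors.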

We must clarify this. I assert that this is a step spec issue (not a language spec issue) because I think it's a question about the behavior of the standard steps. Someone writing their own steps might choose to take a different approach.

We should try to resolve this quickly as it's a lot of work for one of us to change our implementation.

xml-project commented 4 years ago

I do not think this is a step issue; it must be stated in the core specs. If we want this behaviour, then it must be the same for EVERY atomic step a processor knows, i.e. not only those specified by the step specs, but also processor-defined steps and steps defined in third-party packages.

xml-project commented 4 years ago

My main argument against @ndw's proposal: It is too late now. It is described neither in the core specs nor in the step specs. I think we should try to come to an end and not invent new features all the time.

xatapult commented 4 years ago

As a user I think (hmm, not sure) I would expect that a document generated by some step where there is no clear ancestor before the step (like p:count etc.) has no base-uri. Because it appeared out of "nothing".

If we must give it a base-uri for some reason, then it should be the base-uri of the step. But again, I think it should have none. This preference is not very strong, and I have a feeling I don't have a full view of all the consequences, pros and cons. So, I would not stand in the way of a solution where the base-uri of the step was used.

About in the core spec or not: Why should it be in the core spec? We could record it as a preferable behavior, but I don't see any problem in custom steps taking different directions here.

gimsieke commented 4 years ago

Let’s define what documents created by steps are, and whether there is a difference between documents modified by steps and documents springing into existence from a step. Technically, there is no document identity, so even p:add-attribute creates new documents. My expectation is that the result document of p:add-attribute has the same base URI as the source document. I think this has to be the case because we say that all document properties are preserved and the base-uri property is the same as / synchronized with the base URI. Then there is p:xslt, for which I just discovered that no document properties are preserved. This is different from XProc 1.0, where the non-sequence result document inherited the base URI from the first source. So I think the question is limited to the steps that don’t preserve document properties, in particular the base-uri property. (I’m not arguing for or against anything up to here; I’m just trying to further contain the problem space.)

Then I wonder what you mean by “a step’s base URI”. I think you mean the base URI of the pipeline document that uses the step in question, as opposed to a document that the step was declared in. The latter is unavailable for processor-implemented steps, or there will only be p:library documents with placeholder declarations. Therefore I think you are talking of the pipeline document that happens to use p:count, p:archive, etc. This will most likely be this pipeline document’s static base URI.

As in the case of whether manipulating the base-uri property may be used to manipulate a document’s base URI: In my view it was not a new feature but a clarification of something that the spec left a bit unclear.

With the current question, we should at least provide a bit more explicit clarity. At least we should say that in cases where the document properties are preserved, so is the base URI. (I think you can only reasonably speak of property preservation if a step has a single input port that is primary and a single output port that is primary.) And in cases where the base-uri property isn’t preserved, we should at least say that it is implementation-defined whether the document has a base URI and which it is.

This is of course inconvenient for pipeline authors as it may limit their pipeline’s portability.

In practice, it probably won’t be much hassle since it will rarely be an issue that certain documents don’t have a base URI. You wouldn’t do anything with a p:count result’s base URI, and you would supply an explicit storage location URI if you want to store or unarchive an archive. If the archive had as base URI the static base URI of the pipeline that created it, you wouldn’t be able to do anything useful with that URI.

ndw commented 4 years ago

By my reading, that's two votes for "shouldn't have a base URI" and one observation that not having a base URI would rarely be a problem. I think the fact that we've come this far without noticing that our implementations differ on this point supports the assertion that it'll rarely be a problem.

I propose that my implementation is in error and steps that say "no properties are preserved" should have no base URI.

I'm not even sure that any spec changes are necessary.
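
Read as a rule, the proposal above could be sketched roughly as follows. This is a hedged illustration in Python, not processor code: the lookup table and function are hypothetical, and the two entries simply repeat what was said earlier in this thread about p:add-attribute and p:xslt rather than anything checked against the spec text.

```python
from typing import Dict, Optional

# Assumption: whether a step preserves document properties, as described
# in this thread (not verified against the step specifications).
PRESERVES_PROPERTIES: Dict[str, bool] = {
    "p:add-attribute": True,  # all document properties are preserved
    "p:xslt": False,          # no document properties are preserved
}


def result_base_uri(step: str, source_base_uri: Optional[str]) -> Optional[str]:
    """Base URI of a step's result under the proposed rule (hypothetical model)."""
    if PRESERVES_PROPERTIES[step]:
        # Preserved properties include base-uri, so the result keeps it.
        return source_base_uri
    # "No properties are preserved" means the result has no base URI.
    return None


source_uri = "file:///data/input.xml"  # made-up URI for illustration
kept = result_base_uri("p:add-attribute", source_uri)
dropped = result_base_uri("p:xslt", source_uri)
```

Here `kept` is the source document's URI and `dropped` is `None`; the base URI of the step itself never enters the calculation.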

ndw commented 4 years ago

Closed by #314