metanorma / metanorma-standoc

Metanorma for Standoc documents
BSD 2-Clause "Simplified" License

Reference entities with persistent URI instead of anchor #692

Closed opoudjis closed 1 year ago

opoudjis commented 2 years ago

To be used for requirements, but this is a generic Linked Data requirement.

ronaldtse commented 2 years ago

Here's the original ticket that was prematurely closed: https://github.com/metanorma/metanorma-iso/issues/793

opoudjis commented 2 years ago

requirement:uri:[] has been suggested. This will NEVER be the implementation. However, the chain

<<http://example.com>> => <<id,uri=http://example.com%>> => <xref target="id">http://example.com</xref>

might be better expressed as identifier:xref[http://www.example.com]. If we want to include hyperlinked text, however, we will end up back at <<http://example.com,text text text>> (assuming Asciidoctor does not mangle the URI in the anchor). So it's not clear that there is any real benefit.

ronaldtse commented 2 years ago

Might be easier to read if these are on separate lines?

This is fine:

<<uri=https://standards.isotc211.org/19115/-1/1/conf/metadata-xml/basic/character-encoding>>
// or
<<text,uri=https://standards.isotc211.org/19115/-1/1/conf/metadata-xml/basic/character-encoding>>

Some requirements will not have the full URI:

<<uri=/conf/metadata-xml/basic/character-encoding>>
// or
<<text,uri=/conf/metadata-xml/basic/character-encoding>>

Remember that AsciiDoc supports the link: macro for URIs/URLs, if you want to compare the syntax:

link:uri[https://standards.isotc211.org/19115/-1/1/conf/metadata-xml/basic/character-encoding]
// or
link:uri[https://standards.isotc211.org/19115/-1/1/conf/metadata-xml/basic/character-encoding]

opoudjis commented 2 years ago

The scenario I'm thinking about is when there is both a URI and a rendering text, but no anchor ID.

<<id,uri=URI%text>>

is how we would normally deal with both URI and text, following our convention of suffixing xref flags with % to delimit them from any rendering text. However, if we have the URI, we don't actually need the id; we will look that up internally.

I'm increasingly coming to the conclusion that, if Asciidoctor does not mangle it, <<URI,text>> is adequate. If Asciidoctor does mangle it, then I'll have to go through other options.

ronaldtse commented 2 years ago

This is really messed up:

<<id,uri=URI%text>>

Notice that this syntax takes the first argument as a "locator", and the latter as "display text". The purpose of the uri is to be the locator. When the URI is present, there is no need for id.

As for this:

<<URI,text>>

This may not work for these reasons:

  1. A requirement may be assigned a different anchor.
  2. XML will mess up the URI.
  3. Not all requirements will have an identifier as "URI". It may be some other "ID".

opoudjis commented 2 years ago

uri as an attribute is already defined for tables and admonitions, as the place where the table or admonition is available for download online.

Not all requirements will have an identifier as "URI". It may be some other "ID".

Well, that suits me: I had proposed globalid all along.

I am trying to resolve the syntactic problems with markup here, and your "This is really messed up" is counterproductive. I EXPLICITLY said I was trying to avoid that syntax.

opoudjis commented 2 years ago

As for this:

<<URI,text>>

This may not work for these reasons:

  1. A requirement may be assigned a different anchor.
  2. XML will mess up the URI.
  3. Not all requirements will have an identifier as "URI". It may be some other "ID".

First: stop talking about requirements. This is meant to be a global solution for all blocks, as I have said on multiple occasions.

Second: as originally intended, the globalid is not going to be restricted to URIs. That dispenses with (3).

Third: blocks will have both anchors and global identifiers. Therefore objection (1) is invalid. The intention is to search the document for globalid matches if no anchor matches the target supplied in the xref. In fact, I intended the format of the URI to let me switch to globalid; but we're dispensing with URI anyway.

Fourth: whatever appears as the identifier in the Asciidoctor markup, the xref is still going to end up pointing to an anchor, not a URI. The semantics of what the xref is doing do not change depending on how its referent is identified. Therefore objection (2) is invalid: the URI will never end up as an xref target in the XML, and in any case it would be presupposed to be URI-encoded in its attribute role.

I could default to using the global ID text to render xrefs pointing to a global ID. I won't, because that's bogus: it arbitrarily privileges one use case over another. If you know the global ID, you can certainly provide it as the display text as well, overriding the Metanorma default.

However, fifth: Asciidoctor does not permit <<http://www.example.com,text>>: it is over-eager to interpret URLs as links. The alternate syntax xref:http://www.example.com[text] does work.

So:

Requirements can be formatted as either:

[[id1]]
[globalid=http://www.example.com]
---
....
---

or as

[[id1]]
---
identifier:: http://www.example.com
---

both of which will be rendered as

<requirement id="id1" globalid="http%3A%2F%2Fwww.example.com">

In the latter case, requirement/identifier will be copied across to requirement/@globalid, and URI-encoded, in postprocessing.
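As a rough illustration of that encoding step, the sketch below uses Python's urllib.parse.quote with no safe characters, which yields the same value as the globalid example above. The helper names encode_globalid and decode_globalid are hypothetical; Standoc's actual encoder is not shown in this thread.

```python
from urllib.parse import quote, unquote


def encode_globalid(identifier: str) -> str:
    """Percent-encode an identifier for a (hypothetical) globalid attribute.

    safe="" means ":" and "/" are encoded as well; letters, digits and
    "_.-~" are always left as-is by quote().
    """
    return quote(identifier, safe="")


def decode_globalid(value: str) -> str:
    """Invert encode_globalid, recovering the original identifier."""
    return unquote(value)


if __name__ == "__main__":
    encoded = encode_globalid("http://www.example.com")
    print(encoded)                   # http%3A%2F%2Fwww.example.com
    print(decode_globalid(encoded))  # http://www.example.com
```

Percent-encoding round-trips exactly, so no information about the original identifier is lost.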

A cross-reference using global identifiers will employ a Metanorma strategy globally applicable to any type of block, and not just requirements. If the following works,

xref:http://www.example.com[]

then the supplied anchor will be matched against all globalids in the document, and resolved as

<xref target="id1">

(I'm already checking all xref targets against all bibliographic item anchors, to convert them into erefs; so this is just more of the same in postprocessing.)

I am not going to inject a globalidtarget into xref, because there is no reason to: that would be a major conceptual confusion, which I will not permit. And xref:http://www.example.com[], xref:id1[], and <<id1>> can and will render identically.

If you want it to render with the global ID as display, xref:http://www.example.com[http://www.example.com] is all you need do.

Kindly allow me to implement this before commenting further.

opoudjis commented 2 years ago

Currently, Standoc replaces unsafe characters in XML attributes with underscores: http://www.example.com => http___www.example.com
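The underscore substitution is lossy: distinct identifiers can collapse to the same attribute value. The sketch below approximates it with an ASCII-only regular expression; the exact set of characters Standoc treats as unsafe is an assumption here, and to_underscored is a hypothetical name.

```python
import re

# Assumption: anything other than ASCII letters, digits, "_", "-" and "."
# counts as unsafe. Standoc's real rule may differ (e.g. for non-ASCII).
UNSAFE = re.compile(r"[^A-Za-z0-9_.\-]")


def to_underscored(value: str) -> str:
    """Replace each unsafe character with a single underscore."""
    return UNSAFE.sub("_", value)


if __name__ == "__main__":
    print(to_underscored("http://www.example.com"))  # http___www.example.com
    # Two different identifiers collide, so the mapping cannot be reversed.
    print(to_underscored("a/b") == to_underscored("a:b"))  # True
```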

We will need to override that for any attribute encoding to XML of globalids, to a non-lossy representation that is compatible with xsd:IDREF (anchors).

xsd:IDREF is derived from xsd:NCName, which allows "letters, digits, ideographs, and the underscore, hyphen, and period".

The practical restrictions of NCName are that it cannot contain symbol characters such as :, @, $, %, &, /, +, the comma, or ;, nor whitespace characters or brackets of any kind. Furthermore, an NCName cannot begin with a digit, period, or hyphen, although these can appear later in the name.
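A quick way to test candidate anchors against these rules is a regular expression. This is only an ASCII approximation: the real NCName production also admits many non-ASCII letters and ideographs, which the pattern below ignores, and is_ncname_ascii is a hypothetical helper.

```python
import re

# ASCII-only approximation of xsd:NCName: first character is a letter or
# "_", followed by letters, digits, "_", "-" or ".".
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_ncname_ascii(value: str) -> bool:
    """Return True if value is a valid ASCII-only NCName."""
    return NCNAME_ASCII.fullmatch(value) is not None


if __name__ == "__main__":
    candidates = ["id1", "_req.1-a", "1abc", "-x",
                  "http://www.example.com", "http%3A%2F%2Fwww.example.com"]
    for candidate in candidates:
        print(candidate, is_ncname_ascii(candidate))
```

The last two candidates fail, which is the point made about raw and percent-encoded URIs: neither can serve directly as an anchor.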

So we can't use percent-encoding, which is the obvious safe encoding of URIs: anchors cannot contain %. The only solution I can see is Punycode: https://en.wikipedia.org/wiki/Punycode

... Fortunately, I don't think that will be necessary:

opoudjis commented 1 year ago

Updated:

The requirement is now that multiple anchor aliases for the same object be supported.

Putting multiple values inside an attribute is fraught, and putting them in an element is a non-starter. These anchor aliases are best handled outside the document model, rather than disrupting the document model to accommodate them.

So: there shall be no @globalid. Instead, there shall be a table, //misccontainer/table[@id = '_misccontainer_anchor_aliases'], whose row header cells (//misccontainer/table[@id = '_misccontainer_anchor_aliases']/tbody/tr/th) contain anchors within the document, and whose data cells in the same row (//misccontainer/table[@id = '_misccontainer_anchor_aliases']/tbody/tr/td) contain the set of identifiers equivalent to that anchor.

This table can be populated by the user, by using the reserved identifier on a table; it can also be populated by postprocessing of Modspec requirements, with the anchor assigned to each requirement, and the requirement identifier as an anchor alias.
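A sketch of how that alias table might be read back into a lookup map, using Python's xml.etree.ElementTree. It assumes a namespace-free document and one alias per td cell; real Metanorma XML is namespaced, load_anchor_aliases is a hypothetical name, and the sample alias REQ-1 is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free fragment; real Metanorma XML is namespaced.
SAMPLE = """
<standard-document>
  <misccontainer>
    <table id="_misccontainer_anchor_aliases">
      <tbody>
        <tr><th>id1</th><td>http://www.example.com</td><td>REQ-1</td></tr>
      </tbody>
    </table>
  </misccontainer>
</standard-document>
"""

ALIAS_ROWS = ".//misccontainer/table[@id='_misccontainer_anchor_aliases']/tbody/tr"


def load_anchor_aliases(root: ET.Element) -> dict[str, str]:
    """Map every alias (td) to the canonical anchor in its row (th)."""
    aliases: dict[str, str] = {}
    for row in root.findall(ALIAS_ROWS):
        anchor = row.find("th")
        if anchor is None or not anchor.text:
            continue  # skip malformed rows without an anchor cell
        for cell in row.findall("td"):
            if cell.text:
                aliases[cell.text.strip()] = anchor.text.strip()
    return aliases


if __name__ == "__main__":
    print(load_anchor_aliases(ET.fromstring(SAMPLE)))
```

Inverting the table into alias-to-anchor form makes each later xref lookup a single dictionary access.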

Any instance of xref pointing to an anchor alias has its target resolved to the equivalent anchor. However, under isodoc, the xref may be rendered as the identifier, instead of the canonical rendering. For that reason,

So in the case of a requirement with @id = 'id1', and id1 aliased to http://www.example.com,

There is a separate, Modspec-specific requirement for truncating identifiers to remove prefixes, using an inherited URI base. That is the subject of a separate ticket.

You will be tempted to ask me to make @style = 'id' the default for anchor aliasing. That would be ill-advised, because in any other instance that anchor aliases are used, you would not want the anchor alias displayed.

opoudjis commented 1 year ago

Added complication:

In xrefs expressed in Asciidoctor as <<...>>, any equals signs in the replacement text are ignored.

In xrefs expressed in Asciidoctor as xref:[], any equals signs in the replacement text are extracted and parsed as attributes.

So I need to pass the style=id% from the macro back into the replacement text of the xref for processing.
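Following the convention of suffixing xref flags with % (as in <<id,uri=URI%text>>), the sketch below shows one way macro attributes could be folded back into the replacement text and split out again later. Both function names are hypothetical, and the flag grammar (each flag a bare word or key=value, terminated by its own %) is an assumption, not Standoc's documented rule.

```python
import re

# Assumed flag grammar: a bare word or key=value, each terminated by "%".
# Everything after the last flag is the display text.
FLAG = re.compile(r"([A-Za-z]+)(?:=([^%,\s]*))?%")


def to_replacement_text(attrs: dict[str, str], text: str = "") -> str:
    """Re-serialise xref:[] macro attributes as %-suffixed flags."""
    return "".join(f"{key}={value}%" for key, value in attrs.items()) + text


def split_xref_flags(text: str) -> tuple[dict[str, str], str]:
    """Peel leading %-suffixed flags off xref replacement text."""
    flags: dict[str, str] = {}
    pos = 0
    while (match := FLAG.match(text, pos)) is not None:
        key, value = match.group(1), match.group(2)
        # A bare flag (no "=value") is recorded as "true".
        flags[key] = value if value is not None else "true"
        pos = match.end()
    return flags, text[pos:]


if __name__ == "__main__":
    # What xref:http://www.example.com[style=id] might be turned into.
    text = to_replacement_text({"style": "id"})
    print(text)                    # style=id%
    print(split_xref_flags(text))  # ({'style': 'id'}, '')
```

Because a flag must start with a letter and end in %, ordinary display text such as "50% of cases" is left untouched.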

opoudjis commented 1 year ago

This is now implemented.

In the context of a Modspec specification, without further intervention,

[[id1]]
[requirement,model=ogc]
====
[%metadata]
identifier:: http://www.opengis.net/spec/waterml/2.0/req/xsd-xml-rules
....
====

xref:http://www.opengis.net/spec/waterml/2.0/req/xsd-xml-rules[style=id]

will render the cross-reference correctly, with the URI content, but pointing to id1:

<xref target="id1" type="inline" style="id">http://www.opengis.net/spec/waterml/2.0/req/xsd-xml-rules</xref>
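The resolution step behind that output can be sketched with ElementTree as well. This assumes the alias map has already been built from the anchor-alias table, that real anchors take precedence over aliases, and that empty display text is filled in for style="id" at this stage; resolve_xrefs is a hypothetical name, and the namespace-free XML is a simplification.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free stand-in for the Standoc XML.
SAMPLE = """
<standard-document>
  <requirement id="id1"/>
  <p><xref target="http://www.opengis.net/spec/waterml/2.0/req/xsd-xml-rules" type="inline" style="id"/></p>
  <p><xref target="id1" type="inline"/></p>
</standard-document>
"""


def resolve_xrefs(root: ET.Element, aliases: dict[str, str]) -> None:
    """Rewrite xref targets that are anchor aliases, in place."""
    anchors = {el.get("id") for el in root.iter() if el.get("id")}
    for xref in root.iter("xref"):
        target = xref.get("target")
        # Assumed lookup order: real anchors win; unknown targets are left
        # untouched for other checks (e.g. bibliographic anchors).
        if target is None or target in anchors or target not in aliases:
            continue
        xref.set("target", aliases[target])
        # With style="id" and no display text, show the alias itself.
        if xref.get("style") == "id" and not (xref.text or "").strip():
            xref.text = target


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    uri = "http://www.opengis.net/spec/waterml/2.0/req/xsd-xml-rules"
    resolve_xrefs(root, {uri: "id1"})
    for xref in root.iter("xref"):
        print(xref.attrib, xref.text)
```

Both xrefs end up targeting id1; only the one carrying style="id" displays the URI.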

The requested manipulations of the prefix of the URI under Modspec will be left until https://github.com/metanorma/mn-requirements/issues/21

Reiterating:

Asciidoctor will NOT process <<http://www.opengis.net/spec/waterml/2.0/req/xsd-xml-rules>> correctly, because of how its macro processing is prioritised. The [style=id] argument is necessary to differentiate rendering cross-references as identifiers from the normal Metanorma rendering of cross-references.