gruninger / Common-Logic

Documents for the development of ISO 24707 Edition 2 (Common Logic)

Namespaces/Prefixing #23

Open greenTara opened 12 years ago

greenTara commented 12 years ago

The CL standard does not introduce any means to abbreviate long names, which frequently appear as identifiers (e.g. IRIs on the Web). A commonly used approach for such abbreviation is to define prefixes, such as "pre", and to write a corresponding "prefixed name" as "pre:local". The expansion of the prefixed name to an unprefixed name typically involves concatenating the text associated with the prefix, often called the "namespace", with the local part of the name, sometimes with a particular character inserted in between.
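As a rough illustration of the expansion described above, here is a minimal Python sketch. The function name `expand_prefixed_name`, the `separator` parameter, and the example namespace are made up for illustration; nothing here is taken from the CL standard.

```python
def expand_prefixed_name(name, prefixes, separator=""):
    """Expand 'pre:local' by concatenating the namespace bound to 'pre'
    with the local part, optionally inserting a separator in between."""
    prefix, colon, local = name.partition(":")
    if not colon or prefix not in prefixes:
        # Not a prefixed name with a known prefix: leave it unchanged.
        return name
    return prefixes[prefix] + separator + local


# Hypothetical prefix map for illustration only.
prefixes = {"pre": "http://example.org/vocab#"}
print(expand_prefixed_name("pre:local", prefixes))
```

Whether a separator character is inserted, and which one, is exactly the kind of detail a concrete dialect would have to fix.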

a) Should the CL revision mention such namespace/prefixing mechanisms in the abstract syntax? b) Should prefixing be introduced into any of the concrete dialects? c) If so, what form should the namespace/prefixing mechanism take?

greenTara commented 12 years ago

I don't believe it makes sense to have a "CL" prefixing mechanism; instead there should be a CLIF prefixing mechanism, an XCL prefixing mechanism, etc. The definition of a dialect is very general, allowing for names which are not necessarily character sequences - they could for example be images.

RDFa 1.1 Core (http://www.w3.org/TR/rdfa-syntax/#compact-uri-expressions) has addressed the problem of unambiguously abbreviating IRIs, and a variant of their solution is what I would recommend for XCL. In particular, I would suggest

  1. having a separate syntax for names that are and are not identifiers, so one should never be in doubt as to which is intended.
  2. in the case of a name which is an identifier, CURIEs and IRI references (both absolute IRIs and relative IRI references) are accepted as lexical values. In the case when a lexical value could be either a CURIE or an IRI, it is assumed to be a CURIE. This is the same datatype as the RDFa 1.1 @about attribute, except SafeCURIEs are not included. SafeCURIEs are only included in RDFa 1.1 for backward compatibility.
  3. All CURIEs must have their prefixes explicitly declared within the document. This could be done using the existing XML namespace capability:
<XCL xmlns:a = "http://example.org/a" xmlns = "http://example.org/">...

but this is deprecated in RDFa 1.1, so an attribute @prefix would be a more consistent way to define prefixes.

  4. Base IRIs should be explicitly declared in the text, as this is more robust than determining the base IRI from the retrieval URL or other implementation-dependent means. This would be done using the existing xml:base attribute. xml:base is allowed on any element, so the base IRI could vary from one place in the text to another. This allows text from other URLs to be included, with mechanisms like XInclude (http://www.w3.org/TR/xinclude/).

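To make the point about xml:base varying across a text concrete, here is a small Python sketch using the standard library. The `Part` element, its `id` attribute, and all IRIs are hypothetical; no XCL schema is implied.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def effective_bases(element, inherited=None):
    """Yield (element, base IRI in force) for an element and its descendants."""
    own = element.get(XML_BASE)
    if own is not None:
        # An inner xml:base may itself be relative to the outer one.
        base = urljoin(inherited, own) if inherited else own
    else:
        base = inherited
    yield element, base
    for child in element:
        yield from effective_bases(child, base)


# Hypothetical document: the second Part overrides the base IRI.
doc = ET.fromstring(
    '<XCL xml:base="http://example.org/a/">'
    '<Part id="outer"/>'
    '<Part id="inner" xml:base="http://example.net/b/"/>'
    '</XCL>'
)
resolved = {
    el.get("id"): urljoin(base, "name")
    for el, base in effective_bases(doc)
    if el.tag == "Part"
}
print(resolved)
```

The same relative IRI reference `name` resolves differently in the two parts, which is what makes included text from other URLs workable.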
clange commented 11 years ago

I agree with Tara that it doesn't make sense to specify this in the abstract syntax. However, in practice, tools may have to treat prefix maps as if they were part of the abstract syntax: When exchanging ontologies with other people/tools that use different dialects, one does not always want to lose prefixes and end up with full IRIs, or with artificially generated prefixes. (This is exactly the same situation in OWL.)

Indeed RDFa 1.1 Core has come up with the best specification for this so far. In DOL we are also making use of these CURIEs. All that is left to the language designer is the mechanism for assigning the prefixes. DOL proves that this mechanism not only works in XML languages but also in languages with a free-form text syntax. And, BTW, the OWL 2 Manchester syntax and the Turtle serialization for RDF, which use very similar mechanisms (just not called CURIEs for historical reasons) also prove it.

  1. Could you please explain to me what is "a name that is not an identifier"? Do you mean, e.g., the name of a bound variable?
  2. Indeed this works reasonably well in RDFa. On the other hand, CL aims to be more general, e.g. in that it speaks of "network conformance" and not specifically of the URI/IRI-based WWW. One pitfall with "CURIEs vs. IRIs" is that you can assign a prefix http:, in which case any such string would be interpreted as a CURIE. (Well, this is actually a problem that exists within the WWW network.) If we consider this risky, we might want to have separate syntaxes, e.g. <...> for full IRIs.
  3. Yes, I'd recommend @prefix over xmlns – also because CURIE prefixes are meaningless to an XML parser (which is the reason why RDFa introduced it). For CLIF we need to introduce some new syntax. CLIF absolutely needs this mechanism as well, as CLIF aims at easy human reading/writing, which means that identifiers shouldn't be too long. We have two choices:
    1. RDFa style: one long string. (cl-prefix "foo: http://foo.net/bla# bar: http://bar.net/xx#")
    2. more structure, e.g. with one S-expression per prefix: (cl-prefix foo http://foo.net/bla#) (cl-prefix ...) – And we also need to agree on the scoping. I think that one such prefix map per CL text is sufficient, but RDFa allows it everywhere, including inner prefix maps overriding outer ones.
  4. If we like a base IRI declaration mechanism, this will also need CLIF syntax; I'd suggest (cl-base http://...).
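The two candidate CLIF forms from item 3 carry the same information. The Python sketch below parses both into one prefix map; `cl-prefix` is only the keyword proposed in this comment, the function names are invented, and the S-expression pattern is a simplification that assumes IRIs contain no whitespace or closing parentheses.

```python
import re


def parse_rdfa_style(declaration):
    """Parse 'foo: http://foo.net/bla# bar: http://bar.net/xx#' into a dict."""
    tokens = declaration.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating 'prefix:' and IRI tokens")
    prefixes = {}
    for prefix_token, iri in zip(tokens[0::2], tokens[1::2]):
        if not prefix_token.endswith(":"):
            raise ValueError(f"prefix token must end with ':': {prefix_token!r}")
        prefixes[prefix_token[:-1]] = iri
    return prefixes


# One '(cl-prefix <prefix> <iri>)' expression per prefix.
_SEXPR = re.compile(r"\(cl-prefix\s+(\S+)\s+(\S+?)\)")


def parse_sexpr_style(text):
    """Collect every (cl-prefix prefix iri) expression found in the text."""
    return {m.group(1): m.group(2) for m in _SEXPR.finditer(text)}


one_string = parse_rdfa_style("foo: http://foo.net/bla# bar: http://bar.net/xx#")
structured = parse_sexpr_style(
    "(cl-prefix foo http://foo.net/bla#) (cl-prefix bar http://bar.net/xx#)"
)
print(one_string == structured)
```

Scoping (one map per text versus nested overriding maps) is deliberately left out of this sketch.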

In this context we should also say what bare-word names (e.g. foo) are interpreted as. These are, after all, the most frequently used identifiers in CL so far. We could agree to admit the "no prefix" case of CURIEs, i.e. that one can bind a prefix to be prepended to such identifiers. Note that CURIEs also provide for the "empty prefix" case (i.e. :foo), but it is usually not advisable to enable both. RDFa only uses "empty prefix", OWL Manchester and DOL only use "no prefix". OK, with "no prefix" enabled, foo is an expandable CURIE – if the "no prefix" is bound to a namespace IRI.

We could also say that bare-word names are a special case of relative IRIs (relative to the base).
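The two readings of a bare-word name generally give different results, which a short Python sketch can show. The namespace and base IRI below are invented for illustration.

```python
from urllib.parse import urljoin


def as_no_prefix_curie(name, default_namespace):
    """'No prefix' CURIE reading: prepend the bound namespace IRI."""
    return default_namespace + name


def as_relative_reference(name, base_iri):
    """Relative IRI reference reading: resolve against the base IRI."""
    return urljoin(base_iri, name)


# Hypothetical bindings for illustration only.
print(as_no_prefix_curie("foo", "http://example.org/ns#"))
print(as_relative_reference("foo", "http://example.org/texts/doc1"))
```

Concatenation can produce a fragment identifier, while base resolution replaces the last path segment, so a dialect would need to pick one reading.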

tillmo commented 11 years ago

Regarding Christoph's question, "Could you please explain to me what is "a name that is not an identifier"? Do you mean, e.g., the name of a bound variable?":

An identifier is a name that is used for the identification of a CL text on a network. Certain names (like numerals) are forbidden as identifiers in CLIF. See the ISO 24707 document, p. 16.

Having a separate syntax for names that are and are not identifiers implies that CURIEs with empty prefix cannot be written as "name", but must be e.g. written as ":name" (as in RDFa).

By the way, the current ISO 24707 document says on p. 13: "There is no notion of ‘bound variable’ in the CL abstract syntax." and on p. 24: "KIF distinguishes variables from names, and requires quantifiers to bind only variables: CLIF does not make the distinction."

clange commented 11 years ago

Till, thanks for pointing me to the right places of the CL standard. Regarding your comment that

a name that is used for the identification of a CL text on a network

you probably didn't (?) mean texts in the narrow sense, as we are, here, also talking about abbreviating names that identify, e.g., predicates (e.g. (foo:Person bar:Till)). At least I thought so.

I don't understand the design rationale for saying that names can be numerals (instead of interpreting numerals as numbers), and that the rest is up to the dialects. (And then, if CLIF does interpret them as numbers (does it?), is there any other relevant dialect that doesn't/shouldn't?) Neither do I understand the reason for not giving bound variables a special treatment. – But this issue is not the right place to discuss these considerations, and I am not the right person to discuss them anyway.

Now let me focus on catching up with that private e-mail thread on this topic. As I said in a final private e-mail, I'm sorry that I didn't take action earlier. @Tara, I'm not aware of any ownership/assignment of GitHub issues (but hope Michael will appoint further collaborators :-)), so it is actually good that you started the discussion during my inactivity. I think this is the right place for continuing the discussion (unless you disagree).

Tara wrote in an earlier mail

I would still recommend introducing a separate syntax for identifiers and non-identifier names. A popular one is to enclose an identifier in angle brackets.

Taking into consideration what I quoted from OWL (both Manchester and functional-style syntax) and RDF (Turtle) above, this may end up being very confusing. OWL and Turtle use angle brackets for full IRIs as opposed to abbreviated (CURIE-like) IRIs, so we shouldn't use angle brackets for a different distinction. Let's use something else; for now let me call it [...], and with that let me comment on the CURIE/IRI cases mentioned by Tara:

[:a] would be a CURIE if the empty prefix is defined, otherwise it is an IRI

OK. Indeed this is the thing that the CURIE spec calls "empty prefix". Otherwise let's be precise: the other case (without colon) is a separate case of CURIE, called "no prefix".

[a] would be a relative IRI reference and the base IRI would have to be defined for the text to be valid,

OK.

[#a] would be a same-document reference and the base IRI would have to be defined for the text to be valid,

OK; note that this is just a special case of relative IRI.

[a:] would be a CURIE if the prefix a is defined, otherwise it would be an absolute IRI

OK, and of course we'd implicitly discourage introducing a prefix like http:.
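The four cases above can be summarised as a small decision procedure. This Python sketch only follows the wording of the quoted cases; the function name and labels are invented, and it does no real IRI or CURIE syntax checking.

```python
def classify(value, prefixes, base_iri=None):
    """Classify the content of a bracketed identifier per the cases above.

    prefixes maps prefix strings to namespace IRIs; the empty string key
    stands for the "empty prefix" (as in ':a').
    """
    if not value.startswith("#") and ":" in value:
        prefix = value.split(":", 1)[0]
        if prefix in prefixes:
            return "CURIE"
        # Undefined prefix: the thread treats the value as an IRI.
        return "IRI"
    # No colon (or a same-document reference): a base IRI must be declared.
    if base_iri is None:
        raise ValueError("a base IRI must be defined for the text to be valid")
    if value.startswith("#"):
        return "same-document reference"
    return "relative IRI reference"


# Hypothetical bindings for illustration only.
base = "http://example.org/doc"
print(classify(":a", {"": "http://example.org/ns#"}))
print(classify("a", {}, base))
print(classify("#a", {}, base))
print(classify("a:x", {}))
```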

greenTara commented 11 years ago

Re: we are, here, also talking about abbreviating names that identify, e.g., predicates (e.g. (foo:Person bar:Till)). At least I thought so.

That is the current situation in the CL standard, and also, as I'm sure you well know, convention on the Web. However in today's teleconference, a proposal was made to limit identifiers (in the scheme http ?) to only represent information resources, thus wading right into the middle of httpRange-14.

Re: (And then, if CLIF does interpret them as numbers (does it?), is there any other relevant dialect that doesn't/shouldn't?)

In XCL, we may have something like <Data>3</Data> for numbers, allowing us to attach datatypes as an attribute, e.g. <Data xsi:type="xs:float">0.1</Data>, although nothing is decided at this point. We tend to keep data highly separate from names and identifiers.

Re relative IRI references or no-prefix CURIEs - as long as there is a syntactical distinction between identifiers and names, then I'm OK with one of these (but not both), although there are usability questions as to how easy it is to keep track of how the various abbreviation methods should be expanded.

Regarding identifier syntax, I think [...] was used for safeCURIEs, so that may not be an optimal choice either.

I agree that we should be precise in our terminology. In particular, I am as guilty as most in talking about "relative IRIs" when actually it should be "relative IRI references". IRIs are a subset of IRI references, and are always absolute. A CURIE is not an IRI reference at all. It's a nightmare, but we should stick to the accepted W3C definitions.

clange commented 11 years ago

However in today's teleconference, a proposal was made to limit identifiers (in the scheme http ?) to only represent information resources, thus wading right into the middle of httpRange-14.

Thanks for pointing that out; it wasn't so obvious from the chat transcript, which I just finished reading. Is the reasoning for this that the use of HTTP IRIs/URLs for non-information resources is actually a bad practice or a bug (for which the practice of 303 redirects is just a dirty workaround) – and that we, while we have the freedom of design, should not reproduce this bug? (I'm superficially aware that the WWW folks are not quite happy with the 303 settlement of httpRange-14 either, e.g. http://www.w3.org/wiki/HTML/ChangeProposal25, but I haven't been following these in detail, and I'm not sure whether that's relevant here.)

Regarding identifier syntax, I think [...] was used for safeCURIEs, so that may not be an optimal choice either.

Sorry, it was just something I made up. Let's say {...} or whatever ;-)

I am as guilty as most in talking about "relative IRIs" when actually it should be "relative IRI references"

The same holds for me, sorry :-/

greenTara commented 11 years ago

The details of 303 redirects etc did not come up, but yes it was mentioned that we have the freedom to not reproduce the situation. We also have the freedom to make CL an academic exercise that does not interoperate with any practical implementation. So it goes.

tillmo commented 11 years ago

On 27.11.2012 at 22:02, Christoph Lange wrote:

Sorry, it was just something I made up. Let's say {...} or whatever ;-)

why can't we just recognize foo:bar as CURIE? I.e. anything with a : would be a CURIE? I think this is in accordance with what is said in https://github.com/gruninger/Common-Logic/issues/12

Best, Till


greenTara commented 11 years ago

To be a CURIE, there are at least two requirements:

  1. the name must match the CURIE syntax. For example, no spaces are allowed in a CURIE.
  2. the prefix must be defined. Otherwise, we would never be able to write an absolute IRI.
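The two requirements can be checked in sequence, as in this Python sketch. The pattern is a deliberately simplified stand-in for the CURIE grammar (an assumption, not the RDFa production), and the function name is invented.

```python
import re

# Simplified shape: optional prefix without whitespace or colon, a colon,
# then a reference without whitespace. Not the full CURIE grammar.
_CURIE_SHAPE = re.compile(r"([^\s:]*):(\S*)")


def is_curie(name, prefixes):
    """Return True only if both requirements listed above hold."""
    match = _CURIE_SHAPE.fullmatch(name)
    if match is None:
        return False  # requirement 1: must match the CURIE syntax
    return match.group(1) in prefixes  # requirement 2: prefix must be defined


# Hypothetical prefix map for illustration only.
prefixes = {"foo": "http://foo.net/bla#"}
print(is_curie("foo:bar", prefixes))
print(is_curie("foo:b ar", prefixes))
print(is_curie("http://example.org/x", prefixes))
```

With this rule, `http://example.org/x` stays an absolute IRI only as long as nobody declares a prefix named `http`, which is the pitfall raised earlier in the thread.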

Issue #12 is more about usability - in either case some identifiers or some non-identifiers must be disallowed (in CLIF - this issue is not relevant to XCL) to avoid ambiguity with these reserved words.

If the CLIF developers want to have no explicit syntax for identifiers, that's up to them. I think the CL abstract syntax could be written so that both approaches are allowed, although it would be simpler if an explicit syntax distinguishing the two is required. XCL will definitely go with an explicit syntax for identifiers. For example, we may write <Name iri="foo:bar"/> and <Name>foo:bar</Name>

The first would be an identifier, the second would not. (This is just an example of a possible syntax.)
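A reader for this possible syntax could tell the two forms apart without inspecting the string itself, as this Python sketch suggests. The `Name` element and `iri` attribute are just the example syntax above, and `read_name` is an invented helper.

```python
import xml.etree.ElementTree as ET


def read_name(element):
    """Return ('identifier', value) if the iri attribute is present,
    otherwise ('name', text content)."""
    iri = element.get("iri")
    if iri is not None:
        return ("identifier", iri)
    return ("name", element.text or "")


print(read_name(ET.fromstring('<Name iri="foo:bar"/>')))
print(read_name(ET.fromstring('<Name>foo:bar</Name>')))
```

The same character sequence `foo:bar` ends up in different categories purely because of where it appears in the markup.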

One of the unresolved issues is whether CLIF will use only the IRI system for identifiers, or whether there will be an allowance for other identifier systems (including evolution of the Web identifier system). Multiple identifier systems would be easier to manage if there were a separate syntax for identifiers. Here are two scenarios:

A. Separate syntax for identifiers and names. The general CLIF would allow (nearly) arbitrary strings for both identifiers and names (with appropriate escaping and disallowing reserved words). To specialize this CLIF to the web, simply restrict the identifier syntax to CURIE or absolute IRI (no relative IRI references). If the syntax of web identifiers changes, just change the restricted CLIF; general CLIF does not need to change. Identifiers are not allowed to be bound to quantifiers. Non-identifiers are not allowed to be used as names of texts, modules or module vocabularies or in importation statements. Otherwise, identifiers and non-identifiers are used freely as predicates, functions and terms.

B. Same syntax for identifiers and names, and if a name can be a CURIE or an IRI, it is an identifier, with preference given to CURIE. Both identifiers and non-identifiers are allowed to be bound to quantifiers, with a "cancelled quantification" semantics in the case it is an identifier. Both identifiers and non-identifiers are allowed as names of texts, modules and importation statements, with interpretation to false if it is a non-identifier.

Now suppose the W3C changes the identifier syntax slightly, say a new JRI syntax that includes all IRIs, for backward compatibility, plus a few more things.

So, scenario A is more robust than scenario B to backward compatible changes to the identifier system. However, both scenarios have difficulty with a modification to the identifier system that makes some identifiers invalid.

Based on this analysis, I would recommend in any case that a normative system for specifying the identifier system be included with the new commenting syntax.

In any case, we should also address the issue of re-naming when translating between XCL and CLIF - there should be a standard correspondence, for interoperability.