gruninger / Common-Logic

Documents for the developments of ISO 24707 Editiion 2 (Common Logic)
8 stars 3 forks source link

CL Identifiers #37

Open greenTara opened 11 years ago

greenTara commented 11 years ago

References in Document: pp.8, 9, 11, 13, 16, Sections 6.1.1.1, 4, 5, 7, 6.1.2, 6.2, 6.3.1

Nature of defect:

Summary: The abstract syntax does not address the case where the syntax has a method for distinguishing identifiers and syntactically constraining these names from occurring bound in a quantified sentence. Further, the standard does not specifically state that all dialects must disallow identifiers being bound in a quantified sentence.

Discussion: A CL dialect that supports named texts, modules and importations, as is required for full conformance, must have rigid identifiers that are used for the names of texts, modules, vocabularies, and in importations.

Recall from the CL Standard, rigid identifiers (if there are any) are always names, but not all names are rigid identifiers. In several places, the Standard also uses "identifier", without the qualifier "rigid", to mean "rigid identifiers". Thus, in CL all identifiers are rigid identifiers, and from this point on I'll use "identifier" to mean rigid identifier.

Identifiers should not be used as freely as other names. In particular, it is not appropriate for an identifier to occur bound in a quantified sentence [1].

Aside from quantifier bindings, an identifier may be used anywhere other names are used (predicate, operator, term, exclusion set), and its denotation is a member of the domain of discourse unless excluded within a module (or used as a non-discourse name in a segregated dialect.) Identifiers as terms, operators and predicates are especially useful if vocabularies are derived from imported OWL ontologies or RDF graphs, which use IRIs for the names of classes, properties and individuals.

Further, extensions may use identifiers in other ways, e.g. to label and import or clone fragments of texts, such as sentences or terms, and to reference externally-defined datatype definitions, external annotations, "built-in" functions, and so on.

The reason for the restriction against identifiers occurring bound in a quantified sentence can be demonstrated with CLIF. CLIF does not distinguish syntactically between identifiers and other interpretable names, so the following text is syntactically legal according to the grammar.

(cl-text "http://xyz.org" (= a b)) (forall ("http://xyz.org") (P "http://xyz.org"))

The semantics (say for an interpretation I) takes an unexpected turn with this text however, because the quantification requires us to consider any interpretation J where the denotation of "http://xyz.org" "might" differ from that of I. However, all available (either CL or CLIF) interpretations give the same meaning to "http://xyz.org" because it is an identifier, so the text is true exactly when

The only way around this for CLIF is to simply declare that identifiers must not be bound in a quantified sentence, even though the grammar allows it. For this to be possible, fixing the subset of names which are identifiers must be a part of the specification of the vocabulary. However, this is not currently stated in the CL abstract semantics.

Other dialects may choose to take a different approach (from CLIF) on identifiers. Although there has not been much enthusiasm for complicating the abstract syntax by specifying details about particular identifier systems, such as IRIs, there is no reason a particular dialect could not include such details. In particular, XML has built-in capabilities for handling IRIs syntactically, and it is reasonable, if not expected, that an XML-based Common Logic dialect should incorporate both the lexical restrictions of IRIs for identifiers, and syntactic constraints that prevent identifiers occurring bound in a quantified sentence.

The abstract syntax and semantics could be modified to allow dialects that make a syntactic distinction between identifiers and other names to be considered exactly semantically conformant, without requiring this for all dialects.

Solution proposed by the submitter:

Original: (6.1.1.1) A piece of text shall optionally be identified by a name.

Proposal (6.1.1.1) A piece of text shall optionally be identified by an identifier. All identifiers are names.

Original (6.1.1.4) A module consists of a name, an optional set of names called the exclusion set, ...

Proposal (6.1.1.4) A module consists of an identifier, an optional set of names called the exclusion set, ...

Original (6.1.1.5) An importation contains a name ...

Proposal (6.1.1.5) An importation contains an identifier ...

Original (6.1.1.7) A quantified sentence has (i) a type, called a quantifier, (ii) a finite, nonrepeating sequence of names and sequence markers called the binding sequence, each element of which is called a binding of the quantified sentence, and (iii) a sentence called the body of the quantified sentence.

Proposal (6.1.1.7) A quantified sentence has (i) a type, called a quantifier, (ii) a finite, nonrepeating sequence of names and sequence markers called the binding sequence, each element of which is called a binding of the quantified sentence, and (iii) a sentence called the body of the quantified sentence. A binding must not be an identifier.

Fig. 4 p. 11 (UML diagram) p. 11. The class box labeled "Name" should be labeled instead as "Non-Identifier Name".

Section 6.2, p. 13 Original: The vocabulary of a Common Logic text is the set of names and sequence markers which occur in the text. In a segregated dialect, the names in vocabularies are partitioned into discourse names and non-discourse names.

Proposal: The vocabulary of a Common Logic text is the set of names and sequence markers which occur in the text. In all dialects, the names are partitioned into identifier and non-identifier names. In a segregated dialect, the names in vocabularies are also partitioned into discourse names and non-discourse names; identifiers may be either discourse or non-discourse.

Section 6.2, p. 14 Original: An interpretation J of V is an S-variant of I if it is exactly like I except that intJ and seqJ might differ with intI and seqI on what they assign to the members of S.

Proposal: An interpretation J of V is an S-variant of I if it is exactly like I except that intJ and seqJ might differ with intI and seqI on what they assign to the non-identifier members of S.

Section 6.3 p. 16 Original This section applies only to dialects which support importations and/or named texts.

Proposal This section applies only to dialects which support importations and/or named texts and/or modules.

(Insertion between para. 2 and 3 in Section 6.3.1, p. 16) Particular dialects may, but are not required to, introduce means to designate which names are identifiers, e.g. through naming conventions, punctuation or declaration. All dialects must disallow identifiers occurring bound in a quantified sentence; however, a Common Logic dialect is not required to provide sufficient syntactic constraints to guarantee that identifiers are not bound in a quantified sentence in any syntactically legal text of the dialect.

Related Issues:

Background:

The CL standard mentions identifiers and identification in several places.

p.8 6.1.1.1 ... A piece of text shall optionally be identified by a name.

6.1.1.5 An importation contains a name. The intention is that the name identifies a piece of Common Logic content represented externally to the text, and the importation re-asserts that content in the text. The notion of identification is discussed more fully in clause 6.3.1 below.

UML diagrams: Figures 1, 3

p. 12 Dialects intended for transmission of content on a network should not impose arbitrary or unnecessary restrictions on the form of names, and shall provide for certain names to be used as identifiers of Common Logic texts; that is, character strings used as identifiers in a dialect shall be parseable as Common Logic names in that dialect. Dialects intended for use on the Web should allow Universal Resource Identifiers, International Resource Identifiers and URI references to be used as names [2] [4].

p. 13 A dialect may impose particular semantic conditions on some categories of names, and apply syntactic constraints to limit where such names occur in expressions. For example, the CLIF syntax treats numerals as having a fixed denotation, and prohibits their use as identifiers.

p. 16 ... a name used to identify a phrase in Common Logic is understood to be a globally rigid identifier of that text as written (see next section), so that the same name shall not be used to refer to a different text, even if the texts have the same meaning.

Names used to name texts on a network are understood to be rigid and to be global in scope, so that the name can be used to identify the thing named – in this case, the Common Logic text – across the entire communication network. (See [8] for more full discussion.) A name which is globally attached to its denotation in this way is an identifier, and is typically associated with a system of conventions and protocols which govern the use of such names to identify, locate and transmit pieces of information across the network on which the dialect is used. While the details of such conventions are beyond the scope of this International Standard, we can summarize their effect by saying that the act of publishing a named Common Logic text is intended to establish the name as a rigid identifier of the text, and Common Logic acknowledges this by requiring that all interpretations shall conform to such conventions when they apply to the network situation in which the publication takes place.

and Section 6.3 in general.

p. 29 (Annex A) Any name assigned to a named text or a module, and any name occurring inside an importation, shall be a network identifier. For Web applications at the time of writing, it should be an IRI

greenTara commented 11 years ago

From Pat Hayes: Well, that is not a total disaster. It means that quantifiers which bind an identifier have an unusual meaning, in effect the quantification is 'cancelled' in such cases. As writing such a sentence is unlikely to ever happen except maybe as a syntactic mistake, maybe we can just draw attention to this peculiar phenomenon and issue a warning to users, rather than modify the grammar so as make it illegal. That would certainly be a lot simpler.

greenTara commented 11 years ago

Let's look at some consequences of this "cancelled quantification" semantics. Consider the text T1

(forall (b) (b a)) (not (a a))

Whether T1 is satisfiable or not depends on whether "b" is an identifier or not.

Question: When we speak of properties such as CL-satisfiability of a text, should this actually be CL-satisfiability on a particular network (with a particular lexical space of identifiers)?

If the answer to this question is Yes, then the value of an interpretation, the satisfiability of a text, and the contradiction or entailment of one text by another are all dependent on the network of concern, because different networks would in general have different identifier systems. Yet the CL standard does not talk about a network as a qualifier for these properties. In particular, the conditions of exactly semantically conformant would need language about networks, or at least a requirement that the dialect and CL interpretations treat the same names from the vocabulary as identifiers.

If the answer to this question is "No", then what could the alternatives be?

  1. Default: assume the name is not an identifier, unless it is used to identify a text or module in the text.

In this case, the "cancelled quantification" semantics could lead to non-monotonicity.

The text T1 is inconsistent because the first sentence entails (a a).

Now add something to this text

(forall (b) (b a)) (not (a a)) (cl-text b (= c d))

and the text becomes consistent because the third sentence "cancels" the quantification. I think we are all in agreement that non-monotonicity does not belong in CL.

  1. Assume the Web is the network, and identifiers are exactly those names that belong to the lexical space of IRIs. In this case the CLIF EBNF grammar is not sufficiently constrained, because the productions for names of texts and modules, and names in importations, are not restricted to the lexical space of IRIs. In this case

(cl-text "b" (= c d))

is syntactically valid, but "b" is not an IRI, and so not an identifier.

  1. add a syntactic form to CLIF for identifiers, such as enclosed with angle brackets, which N3 and other WWW presentation syntaxes adopted from IETF, e.g. http://xyz.org
  2. add the specification of the set of identifiers to the definition of a vocabulary. If the dialect does not provide an unambiguous means to recognize identifiers, then a particular text may have more than one vocabulary, differing in which names are identifiers. This would require some rewriting of the semantics and conformance conditions.
  3. abandon the cancelled-quantification semantics and require that the identifiers in a "consistent" vocabulary are not bound, and are the only names used to identify texts, modules and vocabularies. Then there are some CLIF texts that are syntactically legal according to the EBNF grammar, such as (forall (b) (b a)) (cl-text b (= c d)) that would have no consistent vocabulary at all. Either the definition of "syntactically legal" could be changed to require not only validity under the EBNF grammar but also that these extra conditions are met, or change the requirement that every syntactically-legal text according to the EBNF grammar is "in the dialect".

I cannot see a way out of this without changing at least one of the following: