Closed mjakubicek closed 5 months ago
A couple of issues:
- The spec does not state that XML elements are required to appear in any particular order. I would recommend using <xs:all> in place of <xs:sequence> throughout and reverting examples 5, 14, 21 and 24.
- The specification clearly states that langCode may be omitted on headwordTranslation and exampleTranslation if it can be inferred from above. These schemas make it mandatory in all cases. This needs to be fixed (I am not sure if the condition in the spec is actually implementable in these schema languages). We should revert examples 8, 9, 14, 20 and 21. We should probably also have an example that is complete (i.e., starts with a <lexicographicResource>) to test this better.
Hello, this is Marek Blahuš from Lexical Computing. I have authored both the schemas (XML and JSON). Thank you for your feedback.
- The spec does not state that XML elements are required to appear in any particular order. I would recommend using <xs:all> in place of <xs:sequence> throughout and reverting examples 5, 14, 21 and 24.
The reasons for which I thought it makes more sense that the XML elements appear in an order:
- [...] listingOrder is implicit from the XML serialization) but those elements can be freely intertwined with elements of other types. I believe that children of an element should be either order-sensitive or order-insensitive, but not something in between.
- [...] <xs:all> more than once, i.e. the permissible values of minOccurs and maxOccurs for the child elements are 0 and 1. Most children of <lexicographicResource> are allowed to appear multiple times. I can see this constraint has been relaxed in XML Schema 1.1, on which I had to start relying during the schema design process anyway, so with this in mind, it is indeed possible to relax the prescribed order of elements, even if occurring multiple times.
- [...] <xs:all> model group is discouraged by XML Schema best practices: "When should I use <all> model group? Never. The <all> model groups' limited applicability and unexpected extension semantics should be avoided. Use a <sequence> instead." But perhaps this criticism is limited to XML Schema 1.0 only.
- [...] sequence was the intention to write a single schema with "switches" representing individual modules, which could be activated by the user at will during validation. In the current absence of such a mechanism, the decision was made to create two separate versions of the schema, which happen to suffice, because the modules define almost exclusively optional extensions to existing elements (the Crosslingual Module with translationLanguage being an exception). Even now, however, the present grouping in sequences, sometimes intentionally redundant, still makes orientation easier, e.g. if someone wants to strip the schema down for their own use, which accepts only a subset of the available modules.
To conclude, I will provide an alternative version of the XML Schema(s) with <xs:sequence> replaced with <xs:all> where applicable.
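To make the difference between the two content models concrete, here is a minimal Python sketch (not part of either schema). The child names in EXPECTED_ORDER and both helper functions are hypothetical and chosen only for illustration; the max_occurs=1 setting imitates the XML Schema 1.0 restriction on <xs:all> mentioned above.

```python
from collections import Counter

# Hypothetical declared child order; NOT copied from the DMLex schema.
EXPECTED_ORDER = ["partOfSpeech", "inflectedForm", "sense"]


def valid_as_sequence(children, order=EXPECTED_ORDER):
    """xs:sequence-like check: only known tags, in the declared order.

    Repeats of the same tag are allowed as long as they are adjacent.
    """
    try:
        positions = [order.index(tag) for tag in children]
    except ValueError:  # unknown tag
        return False
    return positions == sorted(positions)


def valid_as_all(children, allowed=EXPECTED_ORDER, max_occurs=None):
    """xs:all-like check: only known tags, in any order.

    max_occurs=1 imitates the XML Schema 1.0 limit; None means unbounded.
    """
    counts = Counter(children)
    if any(tag not in allowed for tag in counts):
        return False
    return max_occurs is None or all(n <= max_occurs for n in counts.values())


mixed = ["sense", "partOfSpeech", "sense"]
print(valid_as_sequence(mixed))           # order violated
print(valid_as_all(mixed))                # any order accepted
print(valid_as_all(mixed, max_occurs=1))  # repeated child rejected
```

The sketch only compares tag names; a real validator also handles minOccurs, nested content and namespaces.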
- The specification clearly states that langCode may be omitted on headwordTranslation and exampleTranslation if it can be inferred from above. These schemas make it mandatory in all cases. This needs to be fixed (I am not sure if the condition in the spec is actually implementable in these schema languages). We should revert examples 8, 9, 14, 20 and 21. We should probably also have an example that is complete (i.e., starts with a <lexicographicResource>) to test this better.
I have double-checked, and the attribute langCode is required on lexicographicResource and optional on headwordTranslation, headwordExplanation and exampleTranslation in both the XML and JSON Schemas, exactly as mandated by the specification. For the latter three elements in XML Schema, an <xs:assert> makes sure that if the attribute is omitted, then there must be exactly one translationLanguage child of lexicographicResource. A similar check is implemented in the JSON Schema, this time within the definition of lexicographicResource, where a rather deep validating hierarchy implements the constraint that langCode be required if and only if there is not exactly one translationLanguage below the lexicographicResource. In my opinion, therefore, the schemas correctly follow the specification in this regard.
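The rule described above can be restated outside the schema languages. The following Python sketch is an illustration only, not the actual <xs:assert> or JSON Schema logic: the function name missing_lang_codes is hypothetical, and the sketch assumes un-namespaced elements with langCode serialized as an attribute.

```python
import xml.etree.ElementTree as ET

# Elements on which langCode is optional according to the discussion above.
TRANSLATION_ELEMENTS = (
    "headwordTranslation",
    "headwordExplanation",
    "exampleTranslation",
)


def missing_lang_codes(xml_text):
    """Return tags of translation elements whose langCode cannot be inferred.

    Assumption: langCode may be omitted only when the document root is a
    lexicographicResource with exactly one translationLanguage child.
    """
    root = ET.fromstring(xml_text)
    inferable = (
        root.tag == "lexicographicResource"
        and len(root.findall("translationLanguage")) == 1
    )
    problems = []
    for tag in TRANSLATION_ELEMENTS:
        for element in root.iter(tag):
            if element.get("langCode") is None and not inferable:
                problems.append(tag)
    return problems


# A bare entry has no "above" to infer from, so the omission is reported.
bare_entry = (
    "<entry><headword>cat</headword><sense>"
    "<headwordTranslation><text>Katze</text></headwordTranslation>"
    "</sense></entry>"
)
print(missing_lang_codes(bare_entry))
```

Wrapping the same entry in a lexicographicResource with a single translationLanguage child makes the list empty, which is the case the spec allows.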
The changes in examples 8, 9, 20 and 21 have been suggested in order to make them valid documents in accordance with the specification (which allows for <entry> to be the top-level element, but then there is no "above" from which langCode could be inferred and therefore it must be explicitly stated wherever applicable). The main motivation behind modifying these examples was that they pass validation against the schema. If the examples in Appendix A were not necessarily meant to be full documents (there are, in my opinion, very few hints that would suggest it could be the case), then these changes might indeed be reverted; but in such case those examples will not validate against the schema anymore. Even if the design of the serializations allows for one or multiple entry elements to form a valid document on their own, it should be noted that a pre-existing dependence on a lexicographicResource in the form of its translationLanguage might be an obstacle to the otherwise nice idea of simply dumping a subset of a lexicographicResource's entry elements on their own.
Note that the change in example 14 is not related to the required/optional langCode issue, but rather to the fixed/arbitrary element order issue as discussed above (to which some more of the proposed example changes, not explicitly mentioned in the list quoted above, are related).
[...] langCode to ensure that we have good validation. The spec also seems to contradict itself on whether entry is a valid top element or not. These probably need to be implemented as new issues.
@DavidFatDavidF, can you please explain in more detail why the <xs:import> should be necessary? We tried to track it down but found no evidence that the normal way (xmlns:xs="http://www.w3.org/2001/XMLSchema") is deprecated or otherwise not recommended.
Make it reviewable. There will be more changes due to #72
Comments on variants of schemas:
Each schema (XML and JSON) has two variants: one for documents implementing the Crosslingual Module (and possibly some other modules) and one for documents not implementing it (but possibly implementing some other modules).
Comments on fixes made in the attached example XML files (compared to what is in the repository at https://github.com/oasis-tcs/lexidma/tree/master/dmlex-v1.0/specification/examples/examples/source):
three sources of issues:
The following issues have been fixed in the provided files:
0.xml.xml contains lexicographicResource/title as element instead of attribute as in the XML Serialization
5.xml.xml reverses the order of partOfSpeechTag and inflectedFormTag, in contrast to the XML Serialization
7.xml.xml contains lexicographicResource/title as element instead of attribute as in the XML Serialization
8.xml.xml all instances of headwordTranslation are missing the (in this case required) attribute "langCode"
9.xml.xml all instances of headwordTranslation and headwordExplanation are missing the (in this case required) attribute "langCode"
10.xml.xml contains lexicographicResource/title as element instead of attribute as in the XML Serialization
14.xml.xml translationLanguage and entry are in inverse order when compared to the XML Serialization
20.xml.xml headwordTranslation is missing the (in this case required) attribute "langCode"
21.xml.xml headwordTranslation and exampleTranslation are missing the (in this case required) attribute "langCode"
21.xml.xml order of headwordTranslation and example differs from order in the XML Serialization
24.xml.xml order of etymonLanguage and etymonType differs from order in the XML Serialization
New comments on the PDF document (dmlex-v1.0-csd02.pdf), ordered by chapter numbers:
Some properties in the specification use the data type "string", e.g. 4.3.5 memberType has property "role" with data type "string" (unlike the following properties which have data type "non-empty string")
4.3.5 there is no constraint establishing that "max" must be greater than or equal to "min" – should it be added?
5.2.2.1 missing "transcriptionSchemeTags" in the list of "Members if implementing the Controlled Values Module"
in multiple places, where the specification/serialization says "number", apparently an integer is meant (such as "min" and "max" in memberType, or "startIndex" and "endIndex" in the various markers, and probably also obverseListingOrder)
5.1.2.25 (already reported:) relationType.@scopeRestriction should be OPTIONAL (like in JSONSchema and the model), not REQUIRED
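As an illustration of the two memberType comments above (integer rather than "number", and "max" not smaller than "min"), here is a minimal Python sketch of what such a check could look like if the constraint were adopted. The function name check_member_type and the message wording are hypothetical; neither the specification nor the schemas currently define them.

```python
def check_member_type(min_value=None, max_value=None):
    """Return a list of problems found in a memberType's min/max pair.

    Assumptions: both properties are optional, must be integers when
    present, and max must not be smaller than min.
    """
    problems = []
    for name, value in (("min", min_value), ("max", max_value)):
        if value is None:
            continue
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{name} must be an integer")
    # Compare only when both values are present and well-typed.
    if (
        not problems
        and min_value is not None
        and max_value is not None
        and max_value < min_value
    ):
        problems.append("max must be greater than or equal to min")
    return problems


print(check_member_type(1, 2))    # no problems
print(check_member_type(2, 1))    # ordering problem
print(check_member_type(1.5, 2))  # type problem
```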
Old (already reported) comments on the PDF document (dmlex-v1.0-csd02.pdf), ordered by chapter numbers:
2. posses [typo]
2. fargemnt [typo]
3.1 uri REQUIRED (zero or one). [required element cannot occur zero times]
4.2.9 missing labelTypeTag among "property of" sameAs
4.3.5 In memberType, property "type" is defined as UNIQUE, which contradicts existing examples (such as 12.xml) and does not seem to make sense (e.g. why would it not be possible to define a relation between two entries or two senses?). In the proposed schema, the corresponding uniqueness constraint is present, but commented out (inactive) and marked as "possibly erroneous".
4.3.6 & A.1.17 & A.1.19 remainders of probably deprecated property memberRole (not defined anywhere)
5.1.2.1 transcriptionSchemeTag is not listed as possible child of lexicographicResource
5.1.2.15 XML element:
is missing among child elements
5.4. relationhips [typo]
5.4. reational [typo]
5.4.3.1 includng all [typo]