rmraya / OpenXLIFF

An open source set of Java filters for creating, merging and validating XLIFF 1.2, 2.0 and 2.1 files.
https://www.maxprograms.com/products/openxliff.html
Eclipse Public License 1.0

XLIFF 2.0 validator invalidates user-defined subState/subType values #13

Closed: yumaoka closed this issue 2 years ago

yumaoka commented 2 years ago

According to the XLIFF 2.0 specification, users can define their own subState/subType values with custom namespaces.

For example, the XLIFF document below defines a custom namespace abc, and abc:mt is used for the subState attribute of the <segment> element.

    <?xml version="1.0" encoding="UTF-8"?>
    <xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0"
           srcLang="en" trgLang="ja" xmlns:abc="http://example.com/xliff/abc">
      <file id="f1">
        <unit id="u1">
          <segment id="s1" state="translated" subState="abc:mt">
            <source>Hello</source>
            <target>こんにちは</target>
          </segment>
        </unit>
      </file>
    </xliff>

My understanding is that this is allowed by the XLIFF 2.0 specification. However, the validator returns an error: Invalid prefix 'abc' in "subState" attribute

com.maxprograms.validation.Xliff20 has the following list of known prefixes (namespaces):

    private List<String> knownPrefixes = Arrays.asList("xlf", "mtc", "gls", "fs", "mda", "res", "ctr", "slr", "val",
            "its", "my");

Because the list is fixed, any prefix not included in it is rejected. Incidentally, "my" is not defined by the XLIFF specification; it only appears in some of the specification's examples. The validator should probably add the namespace prefixes declared on the <xliff> element to knownPrefixes when validating subState/subType values.
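To make this suggestion concrete, here is a minimal sketch of merging the prefixes declared on the root <xliff> element into the fixed list. It is not the actual Xliff20 code: it uses the JDK's standard DOM parser instead of OpenXLIFF's own XML classes, the names DeclaredPrefixes and acceptedPrefixes are hypothetical, and it only looks at declarations on the root element.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of OpenXLIFF.
public class DeclaredPrefixes {

    // Copy of the fixed list quoted from com.maxprograms.validation.Xliff20.
    static final List<String> KNOWN_PREFIXES = Arrays.asList("xlf", "mtc", "gls", "fs", "mda", "res",
            "ctr", "slr", "val", "its", "my");

    // Returns the known prefixes plus every prefix declared as xmlns:prefix on the root element.
    static Set<String> acceptedPrefixes(String xliff) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so xmlns:* attributes report the xmlns namespace URI
            root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xliff.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XLIFF document", e);
        }

        Set<String> accepted = new LinkedHashSet<>(KNOWN_PREFIXES);
        NamedNodeMap attributes = root.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Attr attr = (Attr) attributes.item(i);
            // Namespace declarations are in the reserved xmlns namespace; the default
            // declaration (plain xmlns="...") has local name "xmlns" and binds no prefix.
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())
                    && !XMLConstants.XMLNS_ATTRIBUTE.equals(attr.getLocalName())) {
                accepted.add(attr.getLocalName());
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        String xliff = "<xliff xmlns=\"urn:oasis:names:tc:xliff:document:2.0\" version=\"2.0\""
                + " srcLang=\"en\" trgLang=\"ja\" xmlns:abc=\"http://example.com/xliff/abc\"/>";
        System.out.println(acceptedPrefixes(xliff).contains("abc")); // true
    }
}
```

Namespaces declared on descendant elements are ignored here; a fuller version would have to track declarations in scope for each element.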

rmraya commented 2 years ago

Hi,

Valid prefixes must be registered with the XLIFF TC. See https://wiki.oasis-open.org/xliff/FragIDPrefixRegistration

The list of accepted prefixes is currently available at https://tools.oasis-open.org/version-control/browse/wsvn/xliff/trunk/xliff2-fragid/registry.txt

OpenXLIFF validation follows the rules for fragment IDs. Your suggestion makes sense; perhaps it could be another change for future versions of XLIFF.

Regards, Rodolfo

yumaoka commented 2 years ago

Hi Rodolfo, this is about validating custom values used for subType and subState. The page you linked has an FAQ, and my interpretation of the statement below is that prefix registration is not needed in this case.


Are those prefixes the same as the ones used in custom values like in subState or subType?

Technically no. However, it is strongly encouraged to A) be consistent and use the same prefix for both mechanisms in your custom data and B) to register the prefixes even if they are only for attribute values. In such case it is recommended for the registered URI to be a URL pointing to a public page where the attribute values are described.

yumaoka commented 2 years ago

If prefix registration were required for subState, nobody could use the attribute at the moment: there are no predefined values for subState, and no custom prefix is currently registered for subState values.

My interpretation of the spec is that users can define their own subState values for their own purposes, but each value must be composed of prefix + ':' + sub-value. So, for example, <segment id="s1" state="initial" subState="foo:bar"> does not violate the spec.

However, for data interchange purposes, it is recommended that users who want to use the subState attribute register their own prefix and value definitions.

Because the values of these attributes are outside the scope of XML namespaces, declaring an XML namespace on the <xliff> element for a prefix used in subType and subState does not really make sense (although, if my understanding is correct, doing so still does not violate the XLIFF specification).

I think the XLIFF validator should check that a subState/subType value contains ':' (because the spec mandates the prefix + ':' + sub-value construction), but should not invalidate the value just because the prefix is not a known XLIFF namespace.
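A minimal sketch of the syntax-only check proposed here, assuming the only rule is a non-empty prefix, a ':' and a non-empty sub-value. The class and method names (SubValueCheck, hasPrefixedForm) are hypothetical and not OpenXLIFF's API; the prefix is deliberately not compared against knownPrefixes or any registry.

```java
// Hypothetical helper: checks only the "prefix:sub-value" shape of a subState/subType value.
public class SubValueCheck {

    static boolean hasPrefixedForm(String value) {
        if (value == null) {
            return false;
        }
        int colon = value.indexOf(':');
        // Require a non-empty prefix before the first ':' and a non-empty sub-value after it.
        // Any prefix is accepted, registered or not.
        return colon > 0 && colon < value.length() - 1;
    }

    public static void main(String[] args) {
        System.out.println(hasPrefixedForm("abc:mt"));  // true
        System.out.println(hasPrefixedForm("foo:bar")); // true
        System.out.println(hasPrefixedForm("mt"));      // false: no ':' separator
        System.out.println(hasPrefixedForm(":mt"));     // false: empty prefix
    }
}
```

With this check, both abc:mt from the first example and foo:bar pass, while a bare value such as mt is still reported as invalid.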

rmraya commented 2 years ago

Hello Yoshito,

In the first topic of the FAQ you can read this:

What if I don't register a prefix with my extension? If your extension doesn't use IDs it's OK.

If your extension uses IDs, no-one will be able to point to an element of your extension if they don't know the prefix to use, and more importantly: XLIFF 2 validators will flag as invalid any documents using fragment identifiers with un-registered prefixes.

This means that if you don't register a prefix, the fragment identification feature is unusable. This is really silly.

The validator is doing what the XLIFF TC says. What needs to be fixed is the requirement imposed by the XLIFF TC. Once that has changed, the validator will be updated accordingly.

rmraya commented 2 years ago

I opened an issue in the XLIFF 2.2 repository to implement the change in future versions of XLIFF.

https://github.com/oasis-tcs/xliff-xliff-22/issues/12

DavidFatDavidF commented 2 years ago

The specification doesn't require subState values to be registered; see http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#substate. It only grants permission to register subState prefixes.

Invalidating subState attributes with unregistered prefixes is not conformant with the current specification. The specification doesn't need to change in order to fix the validation behavior.

The confusion might come from the fact that the registration mechanism for subState (or subType) prefixes is the same as for fragid. Registration of fragid prefixes for element-based extensions is required; otherwise, pointing to data in private elements is impossible.