sdmx-twg / sdmx-im

SDMX Information Model - UML model and functional description, definition of classes, associations and attributes

Allow separating translations from artefacts #34

Open hoehrmann opened 2 months ago

hoehrmann commented 2 months ago

This is a proposal to add a feature that would allow maintaining translations independent of nameable-and-identifiable artefacts.

While supranational statistics organisations might have official translations for their artefacts that are maintained as part of an artefact definition, even they might want to sidestep questions about how adding or updating a translation should affect the version number of an artefact (and possible downstream consequences). They might also want to have different access control restrictions for translation maintenance and artefact maintenance. Both use cases are easier to implement if some or all translations are maintained apart from the artefact definition.

Smaller or less international organisations might have to rely on external resources for translations, like a user community or machine translation services, but might want third-party contributions kept separate from official text, for instance to make clear that those translations are provided on a best-effort basis. The current scheme does not allow this because there is no option to annotate translations, short of using inline disclaimers or something like comments in SDMX-ML messages, a feature not available in SDMX-JSON.

Consider this use case: Wikipedia contains many lists, tables, and charts that are based on official statistics. They are often out of date and unmaintained. If the Wikipedia community were to automatically update them using data from SDMX web services, they would face the issue that translations for many languages are not provided by the web services. They might want to use their existing translation infrastructure to add them, perhaps with custom logic that merges SDMX messages containing official data with their community translations.

Furthermore, maintaining all translations as part of the artefact definition precludes grouping translations by language, which you might want to do for reviews by language experts or a translation service.

As it is, these groups of SDMX users cannot use existing SDMX features (short of duplicating all artefacts into different namespaces) to address this.

Proposed Syntax:

In an SDMX-JSON structure message, it could look like this:

data:
    translations:
        - target: urn:...(2.0+.0).Example
          lang: en-US
          name: ...
          description: ...
          annotations: ...
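
To illustrate how a consumer might use such a block, here is a minimal Python sketch, not a definitive implementation. It assumes a simplified, hypothetical shape: each artefact is a dict with a `urn` key and optional `names` / `descriptions` maps keyed by language tag, and each translation entry carries `target`, `lang`, and optionally `name` and `description` as proposed above. The function names, the `overwrite` parameter, and the `urn:example:...` identifiers are placeholders made up for the example. Annotations are left out for brevity.

```python
from collections import defaultdict
from copy import deepcopy


def merge_translations(artefacts, translations, overwrite=False):
    """Return copies of the artefacts with separate translations applied.

    Hypothetical, simplified shapes (assumptions, not taken from a spec):
    artefact:    {"urn": ..., "names": {lang: text}, "descriptions": {lang: text}}
    translation: {"target": urn, "lang": tag, "name": ..., "description": ...}
    """
    merged = {a["urn"]: deepcopy(a) for a in artefacts}
    for entry in translations:
        artefact = merged.get(entry["target"])
        if artefact is None:
            continue  # translation targets an artefact not present here
        for field, localised in (("name", "names"),
                                 ("description", "descriptions")):
            if field not in entry:
                continue
            texts = artefact.setdefault(localised, {})
            # Text maintained with the artefact wins unless overwrite is set.
            if overwrite or entry["lang"] not in texts:
                texts[entry["lang"]] = entry[field]
    return list(merged.values())


def group_by_language(translations):
    """Group translation entries by language tag, e.g. for review."""
    groups = defaultdict(list)
    for entry in translations:
        groups[entry["lang"]].append(entry)
    return dict(groups)


# Placeholder data; the URNs are not real SDMX identifiers.
artefacts = [
    {"urn": "urn:example:CL_FREQ", "names": {"en": "Frequency"}},
]
translations = [
    {"target": "urn:example:CL_FREQ", "lang": "de", "name": "Frequenz"},
    {"target": "urn:example:CL_FREQ", "lang": "en", "name": "Freq (community)"},
    {"target": "urn:example:UNKNOWN", "lang": "fr", "name": "Inconnu"},
]

merged = merge_translations(artefacts, translations)
by_lang = group_by_language(translations)
```

In this sketch the text maintained with the artefact takes precedence by default, so community contributions only fill gaps; whether that is the right default would be for the specification or the implementer to decide.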

Benefits:

(If in doubt, please handle this as a public review comment on SDMX 3.1 once the comment period begins.)