Multi-lingual labels and descriptions for concepts

coolharsh55 commented 1 year ago

This issue is a placeholder for discussions regarding multi-lingual translations for labels and descriptions associated with DPV concepts. This includes the concept label, description or definition, and any comments. Note that the IRIs will not be translated, as they are the identifiers of the concepts.

coolharsh55 commented 1 year ago

Discussed in Meeting 2023-06-22. The next step is to work with speakers of each language to create a glossary of terms, so that the machine translations can be checked and corrected, and then to assess the outcome of this process.

bact commented 7 months ago

Discussed in Meeting 2023-11-22 about:

1) How to present multiple languages in the HTML documents?

How is the user going to select a language?
Displaying a language together with English at the start, for the term labels and descriptions on that page, in case the translation is incorrect

2) Translation frequency? e.g. once a year.

3) How to generate these systematically?
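
One possible way to handle point 1, showing a translated label together with the English one and falling back to English when no translation exists, is sketched below. This is only an illustration: the `Concept` class, the `render_label` helper, and the sample German label are hypothetical and are not taken from the DPV documentation generator.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept whose IRI stays fixed while its labels vary by language."""
    iri: str
    labels: dict[str, str] = field(default_factory=dict)  # language tag -> label


def render_label(concept: Concept, lang: str, fallback: str = "en") -> str:
    """Return the label in `lang` followed by the fallback label in parentheses.

    If `lang` is the fallback language, or no translation exists for `lang`,
    only the fallback label is returned.
    """
    english = concept.labels[fallback]
    translated = concept.labels.get(lang)
    if lang == fallback or translated is None:
        return english
    return f"{translated} ({english})"


# Hypothetical sample data; the German label is for illustration only.
purpose = Concept(iri="dpv:Purpose", labels={"en": "Purpose", "de": "Zweck"})
```

Keeping the English label next to the translation lets a reader spot a doubtful translation without leaving the page, and the IRI is never touched.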

bact commented 7 months ago

Resources:

Multilingual GDPR Lexicon https://api.digitalgrammars.com/gdpr (English, German, French, Italian, Spanish)
ISO/IEC 22989:2022 AI concepts and terminology (also has terms on 'data') https://www.iso.org/obp/ui/#iso:std:iso-iec:22989:ed-1:v1:en (English, French)

pmcb55 commented 7 months ago

Sorry to just 'jump in' here, but in relation to "How to present multiple languages in the HTML documents?", what are you currently using to generate those HTML documents? (If Widoco, then the answer is that Widoco handles this for you, i.e., it provides a language drop-down in the top-right-hand corner of the HTML page, so I presume you're not using Widoco.)

But if not Widoco, then can I ask why not? i.e., what specific feature(s) do you think are currently missing, as I believe Daniel (the creator and maintainer of Widoco) is very open to adding missing features, especially if funding might be available(!)...?

bact commented 7 months ago

They are generated from data in spreadsheets by https://github.com/w3c/dpv/tree/master/documentation-generator.

coolharsh55 commented 7 months ago

Hi Pat. tl;dr: the script grew from simple HTML to a complex set of documents which I do not know how to manage using Widoco. I'm open to someone else figuring out how to use Widoco for DPV.

> what are you currently using to generate those HTML documents? (If Widoco, then the answer is that Widoco handles this for you, i.e., it provides a language drop-down in the top-right-hand corner of the HTML page, so I presume you're not using Widoco.)

Correct, we are not using Widoco. We have a bunch of Python scripts hacked together to produce the RDF and the HTML.

> But if not Widoco, then can I ask why not? i.e., what specific feature(s) do you think are currently missing, as I believe Daniel (the creator and maintainer of Widoco) is very open to adding missing features, especially if funding might be available(!)...?

The main reason is flexibility to dictate what the HTML content for each term looks like.

1) DPV has roughly 1000 concepts in the main vocabulary, spread across several 'modules'. Widoco puts them all within the same big list of concepts and AFAIK doesn't allow separating concepts by modules or sections - this has to be done manually. This means all purposes, technical measures, etc. get put into a single list, and then we have to manually generate the HTML for each set of concepts, e.g. one script to list purposes, another to list technical measures, and so on. I opened an issue to discuss this - see dgarijo/Widoco#558 (sadly no responses).

2) In order to modify Widoco outputs, we need XSLT templating knowledge - which I personally do not have. I did look into it, but found a steep learning curve. I know Python & jinja2 - so I have used those and set up the code so that the RDF generation and HTML generation parts can be swapped out in the future - see dgarijo/Widoco#175 where XSLT templating is mentioned.

3) We use ReSpec as this work is a W3C CG output. Widoco AFAIK does not have ReSpec as an output - see dgarijo/Widoco#175 for mention of ReSpec.

4) We have multiple 'serialisations' and Widoco only supports OWL (sort of). So where we are using a SKOS+RDFS based taxonomy - which is quite uncommon - Widoco won't produce the output we want. E.g. the Purpose taxonomy has dpv:Purpose as a class and all purposes are instances of this class with skos:broader/narrower relations between them. In the OWL variant, they use rdfs:subClassOf instead.

5) Also see https://github.com/w3c/dpv/issues/53#issuecomment-1265567887 where we discussed in passing the implications of tooling on the ability to produce specific kinds of documentation.
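
To make the SKOS versus OWL difference from point 4 concrete, here is a minimal sketch that emits both variants from the same parent/child table. It is not the actual DPV generator code. The `ex:PurposeA` and `ex:PurposeB` names and both helper functions are hypothetical, and it assumes that top-level purposes have no `skos:broader` link to the class itself.

```python
# Hypothetical slice of a purpose taxonomy, stored as child -> parent.
# The ex: names are placeholders, not a copy of the DPV data.
ROOT = "dpv:Purpose"
PARENTS = {
    "ex:PurposeA": ROOT,
    "ex:PurposeB": "ex:PurposeA",
}


def skos_triples(parents, root):
    """SKOS+RDFS variant: the root is a class, every purpose is an instance
    of it, and instances are related to each other with skos:broader."""
    triples = [(root, "rdf:type", "rdfs:Class")]
    for child, parent in parents.items():
        triples.append((child, "rdf:type", root))
        if parent != root:  # assumption: no skos:broader from an instance to the class
            triples.append((child, "skos:broader", parent))
    return triples


def owl_triples(parents, root):
    """OWL variant: every purpose is itself a class, related to its parent
    with rdfs:subClassOf."""
    triples = [(root, "rdf:type", "owl:Class")]
    for child, parent in parents.items():
        triples.append((child, "rdf:type", "owl:Class"))
        triples.append((child, "rdfs:subClassOf", parent))
    return triples


for s, p, o in skos_triples(PARENTS, ROOT) + owl_triples(PARENTS, ROOT):
    print(s, p, o, ".")
```

A tool that only walks `rdfs:subClassOf` would see the full hierarchy in the second output but only a flat set of individuals in the first, which is the gap described above.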

w3c / dpv

Multi-lingual labels and descriptions for concepts #89