Processing language for multilingual resources

gsergiu commented 7 years ago

The conclusion of ticket https://github.com/w3c/web-annotation/issues/337#issuecomment-238557004 is that there is an M-to-N relationship between the dc:language of multilingual resources and the text processors that might process the annotation body and/or target.

I therefore propose the following definition of the processing language property:

"This property represents the relationship between the language of the resources (Body or Target) and the text processors or classes of text processors that may process the resources for rendering, indexing or any NLP processing."

Consequently, I propose that the verbose representation of this property should consist of <language, processor_class, processor_id> tuples. It is recommended to use a vocabulary for processor classes, for example: textual_representation, audio_representation, visual_representation (i.e. image), text_indexing, nlp_processing.

Example:

processingLanguage: [
  {language: ["en", "fr", "ro"], processor_class: "textual_representation"},
  {language: "en", processor_class: "text_indexing", processor_id: "<snowball_indexer_uri>"},
  {language: "ro", processor_class: "audio_representation", processor_id: "<TTS_RO_uri>"}
]
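
The tuple structure in the example can be sketched in code to show how a consumer might look up the languages declared for one class of processor. This is only an illustration under assumptions: the names `ProcessingEntry` and `languages_for` are hypothetical and do not come from the Web Annotation specification, and the URI placeholders are kept exactly as written in the proposal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProcessingEntry:
    """One <language, processor_class, processor_id> tuple (hypothetical type)."""
    language: List[str]
    processor_class: str
    processor_id: Optional[str] = None  # omitted in the first example entry


# The three entries from the example; a single language is stored as a
# one-element list so that every entry has the same shape.
processing_language = [
    ProcessingEntry(["en", "fr", "ro"], "textual_representation"),
    ProcessingEntry(["en"], "text_indexing", "<snowball_indexer_uri>"),
    ProcessingEntry(["ro"], "audio_representation", "<TTS_RO_uri>"),
]


def languages_for(entries, processor_class):
    """Return the languages declared for a processor class, in declaration order."""
    result = []
    for entry in entries:
        if entry.processor_class != processor_class:
            continue
        for lang in entry.language:
            if lang not in result:  # skip duplicates across entries
                result.append(lang)
    return result
```

With these entries, a text indexer would be handed only "en", while a renderer of the textual representation would see all three languages. A class with no entry yields an empty list, which leaves the fallback behaviour for the caller to decide.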

The minified representation could be compliant with the current specification, with the meaning that all text processors (all types) should use the same processing language.
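
One way to read the minified form is as shorthand for a verbose list in which every processor class gets the same language. The sketch below assumes that reading; `expand_processing_language` and `PROCESSOR_CLASSES` are hypothetical names, and the class list is simply the vocabulary suggested above, not something defined by the specification.

```python
# Vocabulary of processor classes suggested in the proposal (an assumption,
# not a list defined by the Web Annotation specification).
PROCESSOR_CLASSES = [
    "textual_representation",
    "audio_representation",
    "visual_representation",
    "text_indexing",
    "nlp_processing",
]


def expand_processing_language(value):
    """Normalize a minified or verbose value into a list of entry dicts."""
    if isinstance(value, str):
        # Minified form: one language that applies to every processor class.
        return [
            {"language": [value], "processor_class": cls}
            for cls in PROCESSOR_CLASSES
        ]
    expanded = []
    for entry in value:
        normalized = dict(entry)  # copy so the caller's data is not mutated
        langs = entry["language"]
        # Accept either a single language tag or a list of tags.
        normalized["language"] = [langs] if isinstance(langs, str) else list(langs)
        expanded.append(normalized)
    return expanded
```

Under this reading, a client that only understands the minified form keeps working unchanged, and a client that understands the verbose form can treat both inputs uniformly after expansion.
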

There are 2 open questions:

a. Should this property be named "processing"?
b. Should this information be embedded within the annotations (model) or in the protocol (own http request)?

azaroth42 commented 7 years ago

Then we would have to define all of the processing classes and identities for well known processors. That's far far outside of the scope of this working group.

Sorry, but we just can't do that. And especially now during CR. Tagging as V2.

gsergiu commented 7 years ago

Yes, this implies defining processing classes; however, it was not requested that the standard define them. In #309, the use of a primer document with practical guidelines for implementation was proposed.

I was simply proposing the correct structure, which is perfectly aligned with the motivation and explanations provided by @r12a and @fsasaki that ended up with the introduction of this processingLanguage. As is obvious, and as recognized in #335, the current definition of processingLanguage and the information carried by this field are incomplete.

I really don't understand ... why, again, the decision was made to close this ticket without discussing it with the stakeholders.

akuckartz commented 7 years ago

@gsergiu This issue was not closed.

gsergiu commented 7 years ago

@akuckartz It is ok for me to postpone this to V2; however, I would suggest taking this option into account for the related tickets, even if it will not be solved as proposed in V1. I probably misinterpreted the comment of @azaroth42.

w3c / web-annotation

Processing language for multilingual resources #341