samvera / hyrax

Hyrax is a Ruby on Rails Engine built by the Samvera community. Hyrax provides a foundation for creating many different digital repository applications.
http://hyrax.samvera.org/
Apache License 2.0

Rdf langstring support for multivalued text fields #2534

Open ghost opened 6 years ago

ghost commented 6 years ago

Descriptive summary

In the current Hyrax implementation, when a text field is multivalued, it is not possible to differentiate values by language. However, the RDF 1.1 specification allows a literal to carry, "if and only if the datatype IRI is http://www.w3.org/1999/02/22-rdf-syntax-ns#langString, a non-empty language tag as defined by [BCP47]." (https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal)
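To make the quoted rule concrete, here is a minimal sketch in plain Ruby of a value that is either a plain string or a language-tagged string. The `TaggedLiteral` class is hypothetical and is not part of Hyrax or ActiveFedora; it only illustrates that a language tag, when present, must be non-empty and implies the rdf:langString datatype. A real implementation would more likely rely on the literal class of an RDF library.

```ruby
# Hypothetical value object for illustration only; not a Hyrax API.
LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING  = "http://www.w3.org/2001/XMLSchema#string"

class TaggedLiteral
  attr_reader :value, :language

  def initialize(value, language = nil)
    # RDF 1.1: a language tag, if present, must be non-empty.
    if language && language.to_s.strip.empty?
      raise ArgumentError, "language tag must be non-empty when present"
    end
    @value = value
    # Language tags are case-insensitive, so normalize to lowercase.
    @language = language && language.to_s.downcase
  end

  # A literal has a language tag if and only if its datatype is rdf:langString.
  def datatype
    language ? LANG_STRING : XSD_STRING
  end

  # Simplified N-Triples form; only quotes and backslashes are escaped here.
  def to_ntriples
    escaped = value.gsub(/["\\]/) { |c| "\\#{c}" }
    language ? "\"#{escaped}\"@#{language}" : "\"#{escaped}\""
  end
end

titles = [
  TaggedLiteral.new("Digital repositories", :en),
  TaggedLiteral.new("Dépôts numériques", :fr)
]
titles.each { |t| puts t.to_ntriples }
```

With values shaped like this, two titles in the same multivalued field stay distinguishable by language instead of collapsing into untyped strings.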

Rationale

In non-English-speaking countries, it is often mandatory to describe publications and data both in English and in the local language. Publications and data in the humanities and social sciences are generally written in the local language.

Expected behavior

When a text field is multivalued, it should be accompanied by a language selector widget. The language proposed by default would be user-configurable.

If the field is marked as :stored_searchable, the filters applied in Solr (e.g. stemming algorithms, lists of stopwords) should be either language-neutral or language-specific. Language-specific filters would require richer suffixes for Solr field types than just *_tesi, and they raise the question of which filters to apply to search queries.

At the display level, if the interface provides a language switcher, all fields should be displayed in the user's preferred language if available; otherwise in the default language of the platform, or failing that in any available language.

In Fedora, strings should be stored as language-tagged literals.
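The display fallback described above (preferred language, then platform default, then any available language) could be sketched as follows. The method name `values_for_display` and the representation of values as `[text, language_tag]` pairs are assumptions made for this illustration, not existing Hyrax code.

```ruby
# Hypothetical helper; values are [text, language_tag] pairs.
# Returns the texts in the preferred language if any exist, otherwise those in
# the platform default language, otherwise those in the first language found.
def values_for_display(values, preferred:, default:)
  by_language = values.group_by { |_text, language| language }
  chosen = by_language[preferred] ||
           by_language[default] ||
           by_language.values.first ||
           []
  chosen.map(&:first)
end

titles = [
  ["Dépôts numériques", "fr"],
  ["Digital repositories", "en"],
  ["Archives ouvertes", "fr"]
]

puts values_for_display(titles, preferred: "fr", default: "en").inspect
puts values_for_display(titles, preferred: "de", default: "en").inspect
```

Grouping by tag keeps all values of the chosen language together, so a multivalued field still displays every value entered in that language.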

Actual behavior

For now, multivalued text fields do not provide a way to enter language information. All stored_searchable fields are typed in Solr as "English text (te), stored (s), indexed (i)". When an instance of Hyrax has a language switcher, it seems to have no effect on the display language of the metadata. Strings in Fedora carry no information about their language.
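As a rough illustration of what moving beyond the single English suffix might look like, the sketch below derives a Solr field name from a language tag. The type codes other than "te" ("tfr", "tde", and the neutral "t" used as a fallback here) and the `solr_field_name` method are hypothetical; matching dynamic fields and analyzers would have to be defined in the Solr schema for this to work.

```ruby
# Hypothetical mapping from primary language subtag to a Solr text-type code.
# Only "te" (English text) appears in this issue; the rest are assumptions.
TEXT_TYPE_BY_LANGUAGE = {
  "en" => "te",
  "fr" => "tfr",
  "de" => "tde"
}.freeze

# Assumed code for a language-neutral text type, used when no mapping exists.
NEUTRAL_TEXT_TYPE = "t"

# Builds a name such as "title_tesi": type code, then stored (s), indexed (i).
def solr_field_name(property, language)
  primary = language.to_s.downcase.split("-").first
  type = TEXT_TYPE_BY_LANGUAGE.fetch(primary, NEUTRAL_TEXT_TYPE)
  "#{property}_#{type}si"
end

puts solr_field_name("title", "en")
puts solr_field_name("title", "fr-CA")
puts solr_field_name("title", nil)
```

Falling back to a neutral type for unmapped or missing tags is one possible design choice; it leaves open the question raised above of which filters to apply at query time.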

briesenberg07 commented 6 hours ago

+1 for facilitating lang-tagging text values in Hyrax