rhdunn / cainteoir-engine

The Cainteoir Text-to-Speech core engine
http://reecedunn.co.uk/cainteoir/
GNU General Public License v3.0

replace dc:language names with links to languages.rdf #19

Closed rhdunn closed 11 years ago

rhdunn commented 12 years ago

The idea behind this is to simplify the logic that associates the language of a document with the language of a text-to-speech voice.

This requires that the RDF functionality supports (graph, subject, predicate, object) quads, and that the triplestore model has languages.rdf preloaded.

When constructing the voice metadata, the tts-engine { voice dc:language L } should be replaced with tts-engine { voice dc:language U } where languages.rdf { U skos:altLabel L }.

When adding a document to the triplestore, the document language doc { doc dc:language L } should be replaced with doc { doc dc:language U } where languages.rdf { U skos:altLabel L }.
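As a rough illustration of the two replacements above, here is a minimal Python sketch over an in-memory list of quads. It is not the cainteoir-engine API: the `Quad` type, the `replace_language_labels` helper, the graph names and the `http://example.org/languages#gl` URI are all hypothetical, and only the dc: and skos: namespace URIs are real.

```python
from typing import NamedTuple

DC_LANGUAGE = "http://purl.org/dc/elements/1.1/language"
SKOS_ALT_LABEL = "http://www.w3.org/2004/02/skos/core#altLabel"
LANGUAGES_GRAPH = "languages.rdf"  # assumed name of the preloaded graph


class Quad(NamedTuple):
    graph: str
    subject: str
    predicate: str
    object: str


def replace_language_labels(store, graph):
    """Within `graph`, rewrite { X dc:language L } to { X dc:language U }
    where languages.rdf { U skos:altLabel L }. Unknown labels are kept."""
    alt_label_to_uri = {
        q.object: q.subject
        for q in store
        if q.graph == LANGUAGES_GRAPH and q.predicate == SKOS_ALT_LABEL
    }
    result = []
    for q in store:
        if (q.graph == graph and q.predicate == DC_LANGUAGE
                and q.object in alt_label_to_uri):
            q = q._replace(object=alt_label_to_uri[q.object])
        result.append(q)
    return result


# Hypothetical URI; the real languages.rdf may use a different scheme.
GALICIAN = "http://example.org/languages#gl"

store = [
    Quad(LANGUAGES_GRAPH, GALICIAN, SKOS_ALT_LABEL, "gl"),
    Quad(LANGUAGES_GRAPH, GALICIAN, SKOS_ALT_LABEL, "glg"),
    Quad("tts-engine", "voice", DC_LANGUAGE, "glg"),
    Quad("doc", "doc", DC_LANGUAGE, "gl"),
    Quad("doc", "other", DC_LANGUAGE, "x-unknown"),
]

store = replace_language_labels(store, "tts-engine")  # voice metadata
store = replace_language_labels(store, "doc")         # document metadata
```

After both passes the voice and the document point at the same URI, while the label that has no skos:altLabel entry is left untouched.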

Resolving the language name then becomes the following lookup: ( X dc:language Y join languages.rdf { Y skos:prefLabel L } ) as L.
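The lookup could be sketched in Python as a join between the dc:language statements of one graph and the skos:prefLabel statements of languages.rdf. This assumes quads are plain (graph, subject, predicate, object) tuples; `language_names`, the graph names and the example.org URI are made up for the illustration.

```python
DC_LANGUAGE = "http://purl.org/dc/elements/1.1/language"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
LANGUAGES_GRAPH = "languages.rdf"  # assumed name of the preloaded graph


def language_names(quads, graph):
    """Return (X, L) pairs for ( X dc:language Y join
    languages.rdf { Y skos:prefLabel L } ) within `graph`."""
    pref_label = {
        s: o
        for g, s, p, o in quads
        if g == LANGUAGES_GRAPH and p == SKOS_PREF_LABEL
    }
    return [
        (s, pref_label[o])
        for g, s, p, o in quads
        if g == graph and p == DC_LANGUAGE and o in pref_label
    ]


# Hypothetical URI; the real languages.rdf may use a different scheme.
GALICIAN = "http://example.org/languages#gl"

quads = [
    (LANGUAGES_GRAPH, GALICIAN, SKOS_PREF_LABEL, "Galician"),
    ("doc", "doc", DC_LANGUAGE, GALICIAN),
]

names = language_names(quads, "doc")
```

Subjects whose dc:language value has no skos:prefLabel are simply dropped from the result, which is what an inner join would do.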

rhdunn commented 12 years ago

The idea here is to make matching a document language to a voice language more robust by comparing the URIs in languages.rdf, since the document and voice may have different but equivalent dc:language labels (e.g. gl and glg for Galician).
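To show why this helps, here is a small Python sketch of the matching step, assuming the dc:language values in both graphs have already been replaced with URIs. The `matching_voices` helper, the graph and voice names, and the example.org URIs are hypothetical.

```python
DC_LANGUAGE = "http://purl.org/dc/elements/1.1/language"


def matching_voices(quads, doc_graph, voice_graph):
    """Return the voices whose dc:language URI equals one of the
    dc:language URIs found in the document graph."""
    doc_languages = {
        o for g, s, p, o in quads
        if g == doc_graph and p == DC_LANGUAGE
    }
    return sorted(
        s for g, s, p, o in quads
        if g == voice_graph and p == DC_LANGUAGE and o in doc_languages
    )


# Hypothetical URIs; the real languages.rdf may use a different scheme.
GALICIAN = "http://example.org/languages#gl"
ENGLISH = "http://example.org/languages#en"

quads = [
    ("doc", "doc", DC_LANGUAGE, GALICIAN),          # was labelled "gl"
    ("tts-engine", "voice-a", DC_LANGUAGE, GALICIAN),  # was labelled "glg"
    ("tts-engine", "voice-b", DC_LANGUAGE, ENGLISH),
]

voices = matching_voices(quads, "doc", "tts-engine")
```

The comparison is a plain equality test on URIs, so no label-equivalence logic is needed at match time.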

rhdunn commented 12 years ago

This changes the way the cainteoir::languages class works -- in fact, this class will no longer be needed and should therefore be removed. Its functionality will be provided through metadata queries and metadata transformation.

This is also similar to the way that the (dc:publisher, dc:subject) replacement will work. Therefore, the core cainteoir-engine API should support both replacement types and others as well.

rhdunn commented 11 years ago

Deferring this as I am no longer sure this will be beneficial over the current language implementation (i.e. implementing the language code spec and lookup mechanism, using languages.rdf as the data source).