nlbdev / nordic-epub3-dtbook-migrator

Tools for converting between a strict subset of DTBook and EPUB3.
http://nlbdev.github.io/nordic-epub3-dtbook-migrator/
GNU Lesser General Public License v2.1

Give warnings for content files with language codes not matching dc:language in package.opf #503

Open martinpub opened 2 years ago

martinpub commented 2 years ago

Whole content files may legitimately use a language code other than the main language specified in package.opf. Even so, it would be convenient if the validator flagged any mismatch between a content file's language and dc:language.

In more technical terms: give a warning if the xml:lang and lang attributes on the content file's root element do not match the value of dc:language.

This would catch mistakes where, for example, all content files are in one language but dc:language declares a different one.
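
As a rough illustration of the proposed check (not how the validator is actually implemented), here is a minimal Python sketch. The function names `declared_languages` and `language_warnings` are hypothetical, and the sketch assumes a case-insensitive exact match between language tags, so `en-GB` would not match `en`.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def declared_languages(opf_xml):
    """Return the text of every dc:language element in package.opf."""
    root = ET.fromstring(opf_xml)
    return [
        el.text.strip()
        for el in root.iter("{%s}language" % DC_NS)
        if el.text and el.text.strip()
    ]


def language_warnings(opf_xml, content_xml, content_name):
    """Warn when xml:lang or lang on the content file's root element
    does not match any dc:language value (assumed: exact match,
    ignoring case)."""
    declared = declared_languages(opf_xml)
    declared_lower = [d.lower() for d in declared]
    root = ET.fromstring(content_xml)

    warnings = []
    for label, key in (("xml:lang", XML_LANG), ("lang", "lang")):
        value = root.get(key)
        if value is None:
            continue  # a missing attribute is a separate validation concern
        if value.lower() not in declared_lower:
            warnings.append(
                "%s: %s='%s' does not match dc:language %s"
                % (content_name, label, value, declared)
            )
    return warnings
```

Because these are warnings rather than errors, a content file that deliberately uses another language would still pass validation; the message only draws attention to the mismatch.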

josteinaj commented 2 years ago

How should we handle multi-language books? In our library system (and presumably most library systems), you can mark the book as having multiple languages with the ISO 639-2 code "mul", and then list all the languages in a separate field. In EPUB 3 terms, you can (as far as I know) have multiple dc:language elements, but an element can carry only one xml:lang/lang value. So should we use mul for xml:lang/lang in these cases? And if so, should we validate that there are at least two dc:language elements?

I suppose whether or not to allow the mul code in xml:lang/lang could be an issue of its own…
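
To make the question concrete, the "mul" idea could be sketched in Python as below. This is only an illustration of the rule being asked about, not a decided behaviour; the function name `mul_warnings` and its inputs (the list of dc:language values and the root element's language code) are hypothetical.

```python
from typing import List, Optional


def mul_warnings(declared: List[str], root_lang: Optional[str]) -> List[str]:
    """Warn when the root element uses 'mul' but package.opf lists
    fewer than two dc:language values (assumed rule, still under discussion)."""
    warnings = []
    is_mul = root_lang is not None and root_lang.lower() == "mul"
    if is_mul and len(declared) < 2:
        warnings.append(
            "root language is 'mul' but only %d dc:language value(s) declared"
            % len(declared)
        )
    return warnings
```

Whether "mul" should be accepted in xml:lang/lang at all is left open here; the sketch only covers the case where it is allowed.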

martinpub commented 2 years ago

Good point, @josteinaj. I thought the cataloguing rules prescribed a single primary language only; I need to verify this with our cataloguing expert.

Update: I checked with our cataloguing expert. The current limitation was not in the cataloguing rules but in our internal production system, which does not allow multiple values for language. RDA, the cataloguing rules we use in Libris (the National Library's catalogue), prescribes recording between one and six languages individually. If there are more than six languages, using "mul" is suggested, but I think the latter is national practice only, not part of RDA.

I need to come back to this one with more thoughts. To be continued.