readium / webpub-manifest

📜 A JSON based Web Publication Manifest format used at the core of the Readium project
BSD 3-Clause "New" or "Revised" License

EPUB parsing problem with language maps for title strings #34

Closed danielweck closed 4 years ago

danielweck commented 4 years ago

Concrete example: https://github.com/IDPF/epub3-samples/blob/0b3e80553f509fbdfe591008cd7d1e804b24db54/30/regime-anticancer-arabic/EPUB/package.opf#L8-L11

Because the ar (Arabic) key is already used by the meta refines, the default dc:title, which is written in French, cannot also take the ar key that dc:language would assign to it in the language map (a JSON object).

The current buggy r2-shared-js implementation generates (note the missing French dc:title):

        "title": {
            "ar": "السرطان من  للوقاية الصحيح الغذائي  النظام"
        },

I have fixed this bug by generating a fictitious language code placeholder (the object key _):

        "title": {
            "_": "Le Vrai Régime anti-cancer",
            "ar": "السرطان من  للوقاية الصحيح الغذائي  النظام"
        },

Any other suggestions?

@JayPanoz is this included in the parser doc? (sorry, I cannot find the link anymore)

JSON Schema: https://github.com/readium/webpub-manifest/blob/ff5c1e9e76ccc184d4d670179cfb70ced691fcec/schema/metadata.schema.json#L15-L50

JayPanoz commented 4 years ago

Link to the section in the doc

danielweck commented 4 years ago

Thank you Jiminy! So the xml:lang attribute, either directly on dc:title or on an ancestor element (metadata or package), should be used for the default key in the language map. I'm going to fix the r2-shared-js implementation.
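The lookup described here (xml:lang on dc:title first, then on metadata, then on package) could be sketched roughly as below. This is only an illustration and not the r2-shared-js code: the `XmlNode` shape, the `resolveXmlLang` name and the sample `fr` value are assumptions made for the example.

```typescript
// Hypothetical, minimal element shape used only for this sketch
// (not the types that r2-shared-js actually uses).
interface XmlNode {
    name: string;
    attrs: Record<string, string>;
    parent?: XmlNode;
}

// Walk from the element up through its ancestors and return
// the first non-empty xml:lang value, or undefined if there is none.
function resolveXmlLang(node: XmlNode | undefined): string | undefined {
    for (let cur = node; cur; cur = cur.parent) {
        const lang = cur.attrs["xml:lang"];
        if (lang) {
            return lang;
        }
    }
    return undefined;
}

// Assumed sample tree: xml:lang is declared on package only,
// so dc:title inherits it through metadata.
const pkg: XmlNode = { name: "package", attrs: { "xml:lang": "fr" } };
const metadata: XmlNode = { name: "metadata", attrs: {}, parent: pkg };
const title: XmlNode = { name: "dc:title", attrs: {}, parent: metadata };

const titleLang = resolveXmlLang(title);
```

With this assumed tree, `titleLang` is the value inherited from package. An xml:lang set directly on dc:title would win because it is checked first.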

danielweck commented 4 years ago

I'm still using _ as a fallback, just in case there is no xml:lang to rely on, but this should never really happen, and it will be flagged by the JSON Schema validator.
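For illustration, the fallback could look something like the sketch below. The `buildTitleMap` and `UNKNOWN_LANG` names, the parameter shapes, and the rule that an alternate never overwrites the main title are assumptions for this example, not the actual r2-shared-js implementation.

```typescript
type LanguageMap = Record<string, string>;

// Placeholder key used when no xml:lang can be resolved for the main title.
const UNKNOWN_LANG = "_";

// Hypothetical helper: build the "title" language map from the main dc:title
// and the alternate titles declared through meta refines.
function buildTitleMap(
    mainTitle: string,
    mainLang: string | undefined,
    alternates: Array<{ lang: string; value: string }>,
): LanguageMap {
    const map: LanguageMap = {};
    // Key the main title by its resolved xml:lang, or by "_" if there is none.
    map[mainLang || UNKNOWN_LANG] = mainTitle;
    for (const alt of alternates) {
        // Assumed policy for this sketch: an alternate never replaces the main title.
        if (!(alt.lang in map)) {
            map[alt.lang] = alt.value;
        }
    }
    return map;
}

// No xml:lang available: the French title lands under "_" instead of being lost.
const titles = buildTitleMap("Le Vrai Régime anti-cancer", undefined, [
    { lang: "ar", value: "السرطان من  للوقاية الصحيح الغذائي  النظام" },
]);
```

Here `titles` has the same two keys as the JSON shown earlier in this issue (`_` and `ar`). Passing a resolved language such as `"fr"` as the second argument would replace the `_` key with `fr`.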