w3c / epub-specs

Shared workspace for EPUB 3 specifications.

lang and dir of dc:date #2650

Open mattgarrish opened 1 week ago

mattgarrish commented 1 week ago

This is similar to #2635 in that dc:date is recommended to use an ISO 8601 date, but that is not a requirement. @clapierre has had a case of a publisher putting in a language-specific date. It will inherit the language and direction of the package document, which is probably good enough if that metadata is set, but for consistency it seems like we should allow the lang and dir attributes on the element.
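To make the inheritance point concrete, here is a minimal Python sketch of how a reader might resolve the language and direction of dc:date. It is an illustration, not checker behaviour: the two package documents and the function name `describe_date` are hypothetical, and the second package uses xml:lang and dir on dc:date as proposed in this issue. `datetime.fromisoformat` accepts only a subset of ISO 8601, so it serves as a rough test only.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

DC = "http://purl.org/dc/elements/1.1/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical package document with an ISO 8601 date.
ISO_PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         xml:lang="en" dir="ltr">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:date>2024-05-01</dc:date>
  </metadata>
</package>"""

# Hypothetical package document with a language-specific date that carries
# its own xml:lang and dir (the attributes this issue proposes to allow).
LOCALIZED_PACKAGE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         xml:lang="en">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:date xml:lang="fr" dir="ltr">1er mai 2024</dc:date>
  </metadata>
</package>"""


def describe_date(package_xml):
    """Return (value, looks_like_iso, language, direction) for dc:date."""
    root = ET.fromstring(package_xml)
    date = root.find(f".//{{{DC}}}date")
    value = (date.text or "").strip()
    try:
        # Rough check only: fromisoformat covers a subset of ISO 8601.
        datetime.fromisoformat(value)
        looks_like_iso = True
    except ValueError:
        looks_like_iso = False
    # Attributes on the element win; otherwise inherit from the package root.
    language = date.get(XML_LANG) or root.get(XML_LANG)
    direction = date.get("dir") or root.get("dir")
    return value, looks_like_iso, language, direction
```

Without the attributes on dc:date, the French date in the second package would be treated as English, which is the inconsistency described above.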

At this point, I'm wondering why we even have these restrictions on the attributes when we don't enforce the values of any of the optional dc elements. dc:format and dc:type could have human-readable values, and those are the only ones left excluded.

iherman commented 1 week ago

> This is similar to https://github.com/w3c/epub-specs/issues/2635 in that dc:date is recommended to use an ISO 8601 date, but that is not a requirement. @clapierre has had a case of a publisher putting in a language-specific date. It will inherit the language and direction of the package document, which is probably good enough if that metadata is set, but for consistency it seems like we should allow the lang and dir attributes on the element.

+1

> At this point, I'm wondering why we even have these restrictions on the attributes when we don't enforce the values of any of the optional dc elements. dc:format and dc:type could have human-readable values, and those are the only ones left excluded.

Well... is that correct (regardless of whether it is checked or not)? How would free, human-readable text mean anything tangible for format or type?

I presume the only reason for having a restriction is "psychological"; we hope that fewer people will provide a stupid value. But it is not tested, which probably means that we could change from an implicit MUST NOT to a SHOULD NOT (as in "SHOULD NOT use language tag" or something like that).

mattgarrish commented 6 days ago

> we hope that fewer people will provide a stupid value

Maybe, but when we don't define what a smart value is, then we're sort of saying we really don't care that much one way or the other.

For example, format is "The file format, physical medium, or dimensions of the resource." So, if I write a description in English that includes "### pages" as the dimensions that's become a language-specific description.

We used to restrict dc:type to identifiers, but now that the restriction has been lifted I don't know what someone might try to write in it. Its definition is "The nature or genre of the resource." That again lends itself to language-specific descriptions.

> which probably means that we could change from an implicit MUST NOT to a SHOULD NOT

Or maybe stay out of restrictions and put in a note advising people that they should use the properties as Dublin Core defines them, and use controlled vocabularies wherever suggested rather than write in language-specific descriptions. But still allow the attributes for any oddball cases.

iherman commented 6 days ago

> Or maybe stay out of restrictions and put in a note advising people that they should use the properties as Dublin Core defines them, and use controlled vocabularies wherever suggested rather than write in language-specific descriptions. But still allow the attributes for any oddball cases.

Yes, that can be wiser indeed; leave it to DCMI. Just as a reminder, the DCMI Metadata Terms say:

Term Name: format
Definition: The file format, physical medium, or dimensions of the resource.
Comment: Recommended practice is to use a controlled vocabulary where available. For example, for file formats one could use the list of Internet Media Types [MIME]. Examples of dimensions include size and duration.

Term Name: type
Definition: The nature or genre of the resource.
Comment: Recommended practice is to use a controlled vocabulary such as the DCMI Type Vocabulary [DCMI-TYPE]. To describe the file format, physical medium, or dimensions of the resource, use the property Format.

Everything is only "recommended practice". We could just refer to this and stop there.
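As an illustration of what "recommended practice" could look like to a tool, here is a rough Python sketch that flags values outside the suggested vocabularies. The function name `follows_recommended_practice` and the simplified media type pattern are assumptions made for this example. A `False` result is not an error, since the DCMI definitions also permit other values (for example, dimensions in format).

```python
import re

# The twelve classes of the DCMI Type Vocabulary.
DCMI_TYPES = {
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
}

# Simplified "type/subtype" shape; an assumption for this sketch, not a
# full media type parser.
MEDIA_TYPE = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*/[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*$"
)


def follows_recommended_practice(term, value):
    """Return True if the value matches the vocabulary DCMI suggests."""
    value = value.strip()
    if term == "type":
        return value in DCMI_TYPES
    if term == "format":
        return bool(MEDIA_TYPE.match(value))
    raise ValueError(f"unsupported term: {term}")
```

A value such as "320 pages" in dc:format would return `False` here while still being a legitimate, language-specific description, which is the kind of case where lang and dir would matter.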

jenstroeger commented 1 day ago

@clapierre and I have been talking about localizing various metadata and markup recently, and the date came up as well (particularly for RTL books in Arabic, etc.).

The next question I have is then — what about page numbers? I suspect that if the root <html> element defines e.g. Arabic as the language then page numbers should also use Arabic numbers?

mattgarrish commented 1 day ago

> I suspect that if the root <html> element defines e.g. Arabic as the language then page numbers should also use Arabic numbers?

Do you mean using Eastern Arabic numerals rather than Western? If so, then, yes. I would expect any book would represent its numerals in its text's language, not switch them to another script. That includes page break markers.
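For illustration only, a minimal Python sketch of the digit mapping in question. The function name `localize_page_number` is hypothetical, and treating every `ar` language tag as using Eastern Arabic numerals is a simplifying assumption, since some Arabic-language publications use Western digits.

```python
WESTERN = "0123456789"
# Eastern Arabic (Arabic-Indic) digits occupy U+0660 through U+0669.
EASTERN_ARABIC = "".join(chr(0x0660 + d) for d in range(10))
TO_EASTERN = str.maketrans(WESTERN, EASTERN_ARABIC)


def localize_page_number(page, lang):
    """Render a page number with the digits assumed for the given language."""
    text = str(page)
    # Compare only the primary subtag, so "ar-EG" is treated like "ar".
    primary = lang.split("-")[0].lower()
    if primary == "ar":
        return text.translate(TO_EASTERN)
    return text
```

Calling `localize_page_number(42, "ar")` returns the two characters U+0664 U+0662, while `localize_page_number(42, "en")` returns `"42"` unchanged.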

If there's a specific problem you think needs addressing, though, I'd recommend opening a new issue so we don't veer off topic on what needs fixing with the optional DC elements.

jenstroeger commented 1 day ago

> I would expect any book would represent its numerals in its text's language, not switch them to another script. That includes page break markers.

@mattgarrish thank you, that answers my question 👍🏼