nlbdev / nordic-epub3-dtbook-migrator

Tools for converting between a strict subset of DTBook and EPUB3.
http://nlbdev.github.io/nordic-epub3-dtbook-migrator/
GNU Lesser General Public License v2.1

Fix endnotes issue #561

Closed kalaspuffar closed 1 year ago

kalaspuffar commented 1 year ago

Hi @josteinaj and @martinpub

This PR tries to solve the issue of the endnote role not being allowed. The problem is described in more detail in https://github.com/nlbdev/nordic-epub3-dtbook-migrator/issues/556

I made two changes to the rules.

As you see, there are many changes, but the main thing here is that I created a test to verify the validity of our test case and found many issues.

I'm not sure whether reviewing all the test changes is worthwhile, but you might have input on the minor rule changes I made.

Best regards Daniel

josteinaj commented 1 year ago

Thanks @kalaspuffar!

I'm only wondering about the dc:title element in the OPF.

From the specification:

Reading Systems MUST recognize the first title element in document order as the main title of the EPUB Publication (i.e., the primary one to present to users). This specification does not define how to process additional title elements.

https://www.w3.org/publishing/epub3/epub-packages.html#sec-opf-dctitle

They use this example:

    <dc:title>THE LORD OF THE RINGS</dc:title>
    <dc:title>Part One: The Fellowship of the Ring</dc:title>

I'm wondering if we should keep them separated also in HTML? For instance like this:

    <title>THE LORD OF THE RINGS</title>
    <meta name="dc:title" content="Part One: The Fellowship of the Ring"/>

So: the first OPF <dc:title> maps to HTML <title>, and all the following OPF <dc:title> elements map to HTML <meta name="dc:title">.
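A minimal sketch of that mapping, in Python rather than the XSLT/Schematron this repository uses. The function name `titles_to_html_head` and the sample metadata are hypothetical, and putting the text in a `content` attribute is an assumption made because `meta` is an empty element in HTML:

```python
import xml.etree.ElementTree as ET
from typing import List
from xml.sax.saxutils import escape, quoteattr

DC_NS = "http://purl.org/dc/elements/1.1/"

# Sample OPF metadata, taken from the EPUB specification example quoted above.
SAMPLE_METADATA = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>THE LORD OF THE RINGS</dc:title>
  <dc:title>Part One: The Fellowship of the Ring</dc:title>
</metadata>"""


def titles_to_html_head(opf_metadata_xml: str) -> List[str]:
    """Hypothetical helper: map OPF dc:title elements to HTML head markup.

    The first dc:title (in document order) becomes <title>; every later
    dc:title becomes <meta name="dc:title" content="..."/>.
    """
    metadata = ET.fromstring(opf_metadata_xml)
    titles = [(t.text or "").strip() for t in metadata.iter(f"{{{DC_NS}}}title")]

    head = []
    for index, text in enumerate(titles):
        if index == 0:
            head.append(f"<title>{escape(text)}</title>")
        else:
            # quoteattr() escapes the value and adds the surrounding quotes.
            head.append(f'<meta name="dc:title" content={quoteattr(text)}/>')
    return head
```

Calling `titles_to_html_head(SAMPLE_METADATA)` yields one `<title>` line for the main title and one `<meta>` line for the subtitle, so the two stay separate in HTML as they are in the OPF.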

What do you think @martinpub @AndersEkl @kalaspuffar?

In the future we could also consider using the name of the chapter (or whatever is the first structural item in the document) as <title>, and then just mapping all OPF <dc:title> elements to HTML <meta name="dc:title">, because there are some accessibility concerns:

A common navigation technique for users of assistive technology is to read the page title and infer the content the page contains. This is because navigating into a page to determine its content can be a time-consuming and potentially confusing process. Titles should be unique to every page of a website, ideally surfacing the primary purpose of the page first, followed by the name of the website. Following this pattern will help ensure that the primary purpose of the page is announced by a screen reader first. This provides a far better experience than having to listen to the name of a website before the unique page title, for every page a user navigates to in the same website.

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/title#accessibility_concerns

kalaspuffar commented 1 year ago

Hi @josteinaj

Yes, I think this could be an improvement for the future. Looking at the current specification:

2.5.2 Title
The <title> element of every xhtml content document must match the dc:title metadata of the package file.

The information is sparse, so the implementation in this PR should at least follow that directive. Creating an issue in this repository, or in the format repository for future development, could be a good idea. But without a change to the specification, I think we should keep the implementation in line with the specification as currently written.

It could be interpreted in a couple of ways, though. Either we do as this PR does, or we just join them with a single space, or we just pick the first dc:title element to match with the other documents.
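To make the alternatives concrete, here is a small sketch of how the expected <title> text could be derived from a list of dc:title values. The function `expected_title` and its `strategy` names are hypothetical. How the PR itself combines the titles is not shown in this thread, so the separator is left as a parameter rather than guessed:

```python
from typing import List


def expected_title(dc_titles: List[str], strategy: str, separator: str = " ") -> str:
    """Hypothetical helper: compute the text that <title> is compared against.

    strategy "first": use only the first dc:title (the main title).
    strategy "join":  join all dc:title values with `separator`.
    """
    if not dc_titles:
        raise ValueError("the package file has no dc:title")
    if strategy == "first":
        return dc_titles[0]
    if strategy == "join":
        return separator.join(dc_titles)
    raise ValueError(f"unknown strategy: {strategy!r}")


titles = ["THE LORD OF THE RINGS", "Part One: The Fellowship of the Ring"]
first_only = expected_title(titles, "first")
space_joined = expected_title(titles, "join")
```

With "first", producers only need to keep the main title in sync with the content documents. With "join", the chosen separator would also have to be documented for producers.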

Best regards Daniel

josteinaj commented 1 year ago

In the multi-dc:title example in the EPUB specification, they join the titles with a comma. In other cases it might make more sense to use a colon or a hyphen. Whatever we choose should be specified for the producers, but it is not described in the Nordic specification. So I think it would make the most sense to just use the first dc:title, instead of joining them. But I think @martinpub needs to decide.

karladamt commented 1 year ago

Martin has recently changed position here at MTM, so I don't know if he will do this anymore. Until it is settled here who will continue this work, MTM, through me, decides that we follow Jostein's suggestion and only use the first dc:title.

kalaspuffar commented 1 year ago

Hi @josteinaj

I've now changed the code to only validate the first (main) title against the content documents.
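Roughly, the check now behaves like the sketch below. This is an illustration in Python, not the actual rule in this PR. The names `main_title` and `title_mismatches`, the sample documents, and the whitespace normalization are all assumptions made for the example:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

DC_NS = "http://purl.org/dc/elements/1.1/"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def _normalized_text(element: Optional[ET.Element]) -> Optional[str]:
    # Assumption: leading, trailing and repeated whitespace is not significant.
    if element is None:
        return None
    return " ".join((element.text or "").split())


def main_title(opf_xml: str) -> Optional[str]:
    """Return the first dc:title in document order, or None if there is none."""
    return _normalized_text(ET.fromstring(opf_xml).find(f".//{{{DC_NS}}}title"))


def title_mismatches(opf_xml: str, documents: Dict[str, str]) -> List[str]:
    """Report every content document whose <title> differs from the main title."""
    expected = main_title(opf_xml)
    errors = []
    for name, xhtml in documents.items():
        actual = _normalized_text(ET.fromstring(xhtml).find(f".//{{{XHTML_NS}}}title"))
        if actual != expected:
            errors.append(f"{name}: <title> is {actual!r}, expected {expected!r}")
    return errors


SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <metadata>
    <dc:title>THE LORD OF THE RINGS</dc:title>
    <dc:title>Part One: The Fellowship of the Ring</dc:title>
  </metadata>
</package>"""

SAMPLE_DOCUMENTS = {
    "chapter-1.xhtml": '<html xmlns="http://www.w3.org/1999/xhtml">'
                       "<head><title>THE LORD OF THE RINGS</title></head><body/></html>",
    "chapter-2.xhtml": '<html xmlns="http://www.w3.org/1999/xhtml">'
                       "<head><title>Part One: The Fellowship of the Ring</title></head><body/></html>",
}
```

Here only `chapter-2.xhtml` would be reported, because its <title> matches the second dc:title rather than the first (main) one.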

Best regards Daniel

martinpub commented 1 year ago

Martin has recently changed position here at MTM, so I don't know if he will do this anymore. Until it is settled here who will continue this work, MTM, through me, decides that we follow Jostein's suggestion and only use the first dc:title.

Hi, just wanted to say I'm still monitoring this repo, just haven't had time yet to reply. I hope to be involved in further guidelines work, but perhaps not as much when it comes to implementing validation rules. However, feel free to ask/invoke me in discussion if you feel like it. I will be happy to share my thoughts.

Regarding the specific issue at hand, I think the approach suggested by @josteinaj and implemented by @kalaspuffar is a reasonable one. We should note @josteinaj's remarks for the guidelines revision work.