nlbdev / nordic-epub3-dtbook-migrator

Tools for converting between a strict subset of DTBook and EPUB3.
http://nlbdev.github.io/nordic-epub3-dtbook-migrator/
GNU Lesser General Public License v2.1

Allow `<hr class="separator">` in between different block elements? #455

Closed martinpub closed 3 years ago

martinpub commented 3 years ago

Currently, the RelaxNG rules only allow <hr class="separator"> in between <p>s. The guidelines do not provide that level of contextual detail, but there might be cases where a separator should be allowed even if the separated block elements are different, e.g. <p> and <blockquote>.

Can we open up for a less restrictive separator validation rule?

(Ping @AndersEkl, @kalaspuffar)

kalaspuffar commented 3 years ago

Hi @martinpub

It seems that the HR tag is only specified in the definition of the P tag, so you may have one before a paragraph but nowhere else. How do we want this to work? Should it be a regular block element like all the others, which could create some interesting structures, or should we allow an optional HR before blockquote, for instance?

<define name="p">
        <!-- Use: p contains a paragraph, which may contain subsidiary lists -->
        <a:documentation> Use: p contains a paragraph. In HTML, lists are not allowed inside paragraphs. </a:documentation>
        <optional>
            <ref name="hr"/> <!-- hr -->
        </optional>
        <element name="p">

Best regards Daniel

martinpub commented 3 years ago

@kalaspuffar p + hr + blockquote was what triggered the error in the test book. I'm not sure what the best way forward is here. Would it be to allow it as a regular block element anywhere, similar to other blocks? What do you say, @AndersEkl? I think that we need to bring up a clarification of the appropriate use of hr in the guidelines group, so this is about how strict validation should be while waiting for that :-)

kalaspuffar commented 3 years ago

Hi @martinpub

It's always easier to make rules less strict in the future, but if we allow it everywhere, it will be more complicated to reverse that later if we want stricter handling.

Best regards Daniel

AndersEkl commented 3 years ago

One of the main things we talked about initially was that we wanted a less strict validation. I don't understand why we would want such strict rules about the contexts in which <hr> is allowed. I see the <hr> structure as a possible solution for handling weird structural situations in books, and if we apply such strict limitations to its use, we create more problems than we solve.

kalaspuffar commented 3 years ago

Hi @AndersEkl

Well, I've not tried this, but I could imagine that a situation like

<table>
    <tr>
        <th><hr/></th>
    </tr>
    <tr>
        <td>Some content <hr/> some other content</td>
    </tr>
</table>

could be hard to lay out in braille and perhaps even crash some readers. But if you have checked that all readers support HR in all contexts, allowing it everywhere should be a minor change.

Best regards Daniel

AndersEkl commented 3 years ago

@kalaspuffar There are several other elements that could be used in contexts that are not desirable, but would be allowed by the validator. So I don't know if we should single out just <hr> to be very strict about.

kalaspuffar commented 3 years ago

@AndersEkl True, true. We could reach the appropriate validation in at least two ways: either allow it everywhere and make it stricter as users find issues with reading it, or allow it in more places as producers run into problems creating correct documents.

AndersEkl commented 3 years ago

Ideally, we would want to allow it anywhere in a continuous text context, I think, but not inside other types of constructs, such as tables. But maybe we are overthinking things? Would we start seeing <hr> being used all over our books if we loosen up the strictness? @martinpub what do you think?
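
To make the "not inside tables" idea concrete, here is a minimal sketch in Python using only the standard library. It is not part of the validator, which is built on RelaxNG rules; the function name `count_hr_in_tables` is hypothetical, and the sketch assumes well-formed markup without an XML namespace (real EPUB content documents use the XHTML namespace, so tag names would need adjusting).

```python
import xml.etree.ElementTree as ET


def count_hr_in_tables(xhtml: str) -> int:
    """Count <hr> elements that sit anywhere inside a <table>.

    Hypothetical helper for illustration only; assumes no XML namespace.
    """

    def walk(element: ET.Element, in_table: bool) -> int:
        found = 0
        for child in element:
            if child.tag == "hr" and in_table:
                found += 1
            # Everything below a <table> counts as "inside a table".
            found += walk(child, in_table or child.tag == "table")
        return found

    root = ET.fromstring(xhtml)
    return walk(root, root.tag == "table")
```

A rule along these lines would flag both <hr> elements in the table example earlier in the thread, while leaving an <hr> between two paragraphs in a <section> alone.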

kalaspuffar commented 3 years ago

Hi @martinpub and @AndersEkl

It seems that the HTML specification supports the implementation we have today for these kinds of thematic breaks.

https://www.w3.org/TR/html52/grouping-content.html#the-hr-element

So semantically, even adding it before blockquote could be seen as an incorrect usage. Then again, that's up to us to decide :)

Best regards Daniel

AndersEkl commented 3 years ago

@kalaspuffar Still, the link you sent categorizes <hr> as flow content. Thus, it is acceptable to use wherever flow content is expected.

martinpub commented 3 years ago

I think the main question here is the distinction between what is technically acceptable and what is semantically correct. The HTML spec is quite clear about the semantic use of <hr>, i.e. it denotes thematic breaks in between paragraphs that do not have any other means of signalling such a break (a shift to another element, e.g. <section> or <blockquote>, would carry the break semantics along with it). HTML spec.: "There is no need for an hr element between the sections themselves, since the section elements and the h1 elements imply thematic changes themselves." However, technically it would still be allowed, as @AndersEkl points out.

There might be a risk of overusing <hr> if we loosen up the validation, but I'm not sure to what extent. Perhaps this should be a topic for discussion at the next validator group meeting?

AndersEkl commented 3 years ago

I understand the semantic function of the <hr> element, but I think the assumption that a new contextual part of the text ALWAYS starts with a <p> is wrong. In the test book where this error was reported, it was a <blockquote> that started it. A <blockquote> in itself does not necessarily signal a thematic break the same way <section> does. In educational material, it can be something else that follows a thematic break. I can see quite a few cases where <hr> would be the obvious way to represent something graphical in the source material that marks a contextual break, but where the current validator will not allow it.

The link that @kalaspuffar posted doesn't say "in between paragraphs", it says "a paragraph-level thematic break". So I think we read too much into it here. I know there will be examples where at least I would want to use <hr> in its intended semantic context, but would be disallowed by the validator in its current state.

martinpub commented 3 years ago

Thanks @AndersEkl, that's a good point.

I can even think of a fiction context where p + hr + blockquote would be appropriate: Say you have a novel using regular indentation to mark paragraphs, then using a separator to indicate thematic breaks, e.g. super paragraphs. Currently we don't require those to be captured with nested <section>s, but are OK with <hr> instead. It could then be that the novel's narration sometimes cites a letter, a diary, or something similar, that is marked up with <blockquote>. And this content may very well be located after a separator. So it should be allowed.

martinpub commented 3 years ago

We discussed this in today's call. The conclusion was to make an adjustment to allow <hr> with <p> on any side, which is a compromise but will resolve the validation issue in X50525A. The guidelines group can continue the discussion in a revision.
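
For reference, the compromise can be sketched as a small standalone check. This is only an illustration in Python, not the RelaxNG change itself. It reads "with <p> on any side" as "at least one directly neighbouring sibling element is a <p>", which is an assumption about the intended rule. The function name `misplaced_separators` is hypothetical, and the sketch assumes markup without an XML namespace.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def misplaced_separators(xhtml: str) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return the (previous, next) sibling tags of every <hr> with no <p> neighbour.

    Hypothetical helper for illustration only; assumes no XML namespace.
    """
    root = ET.fromstring(xhtml)
    problems = []
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.tag != "hr":
                continue
            before = children[i - 1].tag if i > 0 else None
            after = children[i + 1].tag if i + 1 < len(children) else None
            # Assumed rule: at least one direct neighbour must be a <p>.
            if "p" not in (before, after):
                problems.append((before, after))
    return problems
```

Under this reading, p + hr + blockquote passes, while an <hr> sitting between two <blockquote>s is still reported.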