Closed bertfrees closed 6 years ago
Yes, it's quite likely that we will also use EPUB3 as input format in the long run.
Other agencies have done this by making some or all XSLT and CSS files work with both DTBook and HTML (EPUB). It seems indeed like maintainability and readability benefits from that because DTBook is quite similar to HTML.
(The alternative for making XSLT and CSS files work with both DTBook and HTML is to have separate stylesheets for DTBook and HTML, but because they have a lot in common they could import a common stylesheet.)
See for example:
For us, it means the following files either have to be made to work with HTML or ported to HTML.
There are a number of SBS-specific extensions to DTBook for which we need to find an alternative in EPUB 3. I will create a table here with the mapping and we can discuss the problematic ones.
[table moved to wiki page]
@egli Can I get that pointer to the Nordic EPUB specification?
@egli The progress can be seen on branch sbs-9. I'm holding off on merging it because I still can't run all the tests at once, even after increasing the memory.
If at some point we want to let Mischa try it, and I haven't found a real solution to the problem yet, we could merge it but with the new tests disabled.
We do have an EPUB test document that was produced in India
@egli and @mixa72 Please have a look at my table above, I've updated it. Maybe you have some more ideas.
That looks pretty good. It is interesting how many standards there are for the different purposes. Compared to our current DTBook, EPUB3 will apparently involve a lot more namespaces and the terminology will be a varied mix. So whatever you decide for the elements brl:select, brl:running-line, brl:toc-line, brl:time is fine by me since it's not possible to find a uniform naming anyway. BTW: AFAIK @brl:class is indeed only used for SBSForm.
Coherence is indeed something we need to think about carefully. You need to work with this every day, so your opinion is important. At the same time, using standards is also important, as is, last but not least, compatibility with the Nordic guidelines. Changing the Nordic guidelines is possible but apparently a slow process.
The Nordic guidelines have apparently chosen to use "class" for some semantics instead of a custom "epub:type" prefixed with "nordic:". I'm not sure what the motives were. However they do use epub:types that are available in either the default or the z3998 vocabulary. Moreover, they do have a "nordic:" prefix but they only use it for some of the metadata, not for epub:types.
Nordic's use of class is not always appropriate in my opinion, but I think we have to live with this. It's also hard to avoid the mix of different attributes and prefixes because this is just how EPUB works, and because of the compatibility requirement with Nordic. What we could do to simplify things a bit is to not use our own "sbs:" prefix and use classes instead. This is semantically not optimal, but at least it creates some coherence with the Nordic guidelines. In addition, we can try to completely eliminate "brl:" elements and attributes.
I would not take the Nordic guidelines as the be-all-end-all truth. While they are useful and most likely will define the shape of the EPUB we will get from our providers I would also be forward looking and improve things where you think it makes sense.
We could of course have a converter from "Nordic EPUB 3" to "SBS EPUB 3". But this makes interchanging files a bit difficult unless we have the conversion in the two directions.
Doing the markup with Oxygen is very user-friendly. DTBooks can be validated against both our inhouse minimal schema and the classic DTD. The most important feature is that the editor displays a list with all the possible elements at any place in the document (auto completion). If Oxygen also behaves like that with EPUB3 files then I don't see any problems for the users. It will take some time to learn and memorize the new markup, that's obvious, but after a while everybody will get used to it.
I talked to @mixa72 about this yesterday and the consensus seems to be that the actual names of the elements that we will use in the EPUB are not so important to the transcribers, as long as oXygen does the auto completion.
Yes that's what Mischa said last time. But still we should think it through. What about the things where I have put question marks?
It's OK with me if you use the following for EPUB3:
brl:class --> @class (no prefix)
brl:select --> brl:select (or solution with span)
brl:when-braille --> brl:when-braille (or solution with span)
brl:literal[@brl:grade=...] --> brl:literal[@brl:grade=...] (or solution with span)
brl:otherwise --> brl:otherwise (or solution with span)
brl:running-line --> brl:running-line
brl:toc-line --> brl:toc-line
brl:volume[@brl:grade=...] --> br[@class='braille-volume-break-grade-...']
brl:time --> brl:time (if we keep brl:date; if we use sbs:date instead, I'd also prefer sbs:time)
But I'm open to accepting anything as long as there is no loss in functionality compared to the current system.
The choice for moving from brl:homograph to span[epub:type='z3998:homograph'], from brl:v-form to span[epub:type='z3998:v-form'], from brl:place to span[epub:type='z3998:place'], etc. was a no-brainer, because all of these terms are defined in z3998. However, for brl:date, brl:time and brl:name there are no obvious replacements.
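A minimal sketch of the straightforward part of that mapping, for illustration only. It assumes the brl namespace URI used elsewhere in this thread and the standard EPUB "ops" namespace for epub:type; the function name brl_to_epub_type is hypothetical, and the resulting span is left without a namespace to keep the example short. Elements without an agreed replacement (brl:date, brl:time, brl:name) are deliberately left untouched.

```python
import xml.etree.ElementTree as ET

BRL_NS = "http://www.daisy.org/z3986/2009/braille/"
EPUB_NS = "http://www.idpf.org/2007/ops"

# brl:* local names whose terms are defined in the z3998 vocabulary
# (only the ones named in the discussion; the list is not exhaustive)
Z3998_TERMS = {"homograph", "v-form", "place"}

def brl_to_epub_type(root):
    """Rename brl:homograph etc. to span[epub:type='z3998:...'] in place."""
    prefix = "{" + BRL_NS + "}"
    for el in root.iter():
        if el.tag.startswith(prefix):
            local = el.tag[len(prefix):]
            if local in Z3998_TERMS:
                el.tag = "span"  # assumption: namespace handling omitted
                el.set("{" + EPUB_NS + "}type", "z3998:" + local)
    return root

if __name__ == "__main__":
    p = ET.fromstring(
        '<p xmlns:brl="' + BRL_NS + '">'
        "<brl:homograph>x</brl:homograph></p>"
    )
    ET.register_namespace("epub", EPUB_NS)
    print(ET.tostring(brl_to_epub_type(p), encoding="unicode"))
```

A real conversion would more likely live in XSLT next to the existing stylesheets; the Python form is only meant to make the rule explicit.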
For brl:name I have proposed span[epub:type='foaf:name'] because I saw that z3998:personal-name is derived from foaf:name. For brl:date I proposed span[epub:type='dc:date'] because I saw there is a term in z3998 that is derived from a dc term (namely z3998:fulltitle), from which I concluded it must be allowed to use dc.
Using the foaf and dc vocabularies for adding semantic structure feels a bit weird though because normally I associate these vocabularies with metadata (like in "this is the date of this event" or "this is the name of this person").
For brl:time I haven't got a solution yet. There doesn't seem to exist anything in the z3998 or dc vocabularies.
The reason I proposed the alternative span[epub:type='sbs:date'] was because I thought maybe this way we could impose a specific format (dd-mm-yyyy or whatever), but I don't even know if that makes sense (if it is possible, or even needed).
Is there a specific reason why you want to keep "date" and "time" uniform with each other, but not with the other elements?
Finally, something I'm still wondering is why we have brl:emph in addition to em and strong. All the attributes that are allowed on brl:emph also seem to be allowed on em and strong. Also I would like to use a class attribute instead of the brl:class attribute, however it appears that the class attribute is already allowed on brl:emph, em and strong. What is it used for?
1) As for sbs:date: that was a misunderstanding. I first thought you wanted to create an element sbs:date with a separate namespace sbs:... just for this element (date). That struck me as overkill. Then I saw that your alternative is in fact a span (span[epub:type='sbs:date']). That's OK for me ("time" does not have to be uniform with "date").
2) As for brl:emph and the class attribute on em, strong and brl:emph: brl:emph is used to render highlighted text other than em/strong (e.g. colored, underlined, capitalized, in a different font, etc.) with the same 4 possibilities as em/strong (brl:render = emph / quote / single quote / ignore).
The class attribute on em, strong and brl:emph was once created to semantically group these elements, e.g. em "foreignword", "onomatopoeia", "stressed", "propername", etc. in order to render them in a coherent way by means of the brl:render attribute: e.g. "foreignword" --> quote, "stressed" --> emph, "propername" --> ignore, etc. However, this practice has changed in the meantime: currently all em's are rendered with brl:render="emph" (default) regardless of the semantics (with some rare exceptions: educational/non-fiction books - they often have plenty of differently highlighted/colored words and each highlighting/color has a special meaning).
OK, got it now. Thanks!
Because in EPUB3 (HTML5) em and strong have been given semantic meanings (see http://html5doctor.com/i-b-em-strong-element), I still propose to drop brl:emph in favor of em or strong, and if you want to make clear that in the paper book they are styled differently (a different font or whatever) you indicate this with a class or several classes.
Because a class attribute can have more than one class, it shouldn't be a problem to combine all the requirements (except brl:continuation) in a single attribute.
@brl:render -> braille-render-quote, etc.
brl:emph -> colored, underlined, etc. (or just a single class other-emph)
foreignword, onomatopoeia, etc. -> same
So an emphasis element could look something like this:
<em class="propername capitalized braille-render-ignore">bla bla</em>
Note that foreignword, onomatopoeia, etc. is semantic information, so ideally we should try to capture this in an epub:type. However, because I assume you cannot make a predefined vocabulary of all possible groups, I think a class is more appropriate here.
Another possible improvement could be to base the rendering of em/strong in braille on the CSS value of text-transform instead of the brl:render attribute or braille-render-foo classes.
This way you can still use the braille-render-foo classes (or whatever you want to call them), if we define them in the default CSS:
.braille-render-ignore {}
.braille-render-quote {
  text-transform: quote;
}
.braille-render-singlequote {
  text-transform: singlequote;
}
.braille-render-emph {
  text-transform: emph;
}
@text-transform quote {
  system: -sbs-indicators;
  open: "(";
  close: ")";
}
@text-transform singlequote {
  system: -sbs-indicators;
  open: "'(";
  close: "')";
}
@text-transform emph {
  ...
}
(see issue https://github.com/sbsdev/pipeline-mod-sbs/issues/38 about how the @text-transform rule works)
but in addition you can also specify a custom mapping in CSS. For example:
.foreignword {
  text-transform: quote;
}
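To make the intended effect of these rules concrete, here is a rough sketch that resolves a class attribute to a text-transform value and wraps the text in the open/close strings. It is only a model: it assumes that "open" and "close" are simply placed around the text, it treats all rules as equal-specificity class selectors where the later rule wins, and it leaves "emph" unmodelled because that rule's body is elided above. The names RULES, INDICATORS, text_transform_for and apply_indicators are made up for this example.

```python
# (class, text-transform) pairs in stylesheet order; None means the rule
# declares nothing, like ".braille-render-ignore {}".
RULES = [
    ("braille-render-ignore", None),
    ("braille-render-quote", "quote"),
    ("braille-render-singlequote", "singlequote"),
    ("braille-render-emph", "emph"),
    ("foreignword", "quote"),  # the custom mapping
]

# open/close strings copied from the @text-transform rules; "emph" is
# omitted on purpose because its definition is not given.
INDICATORS = {
    "quote": ("(", ")"),
    "singlequote": ("'(", "')"),
}

def text_transform_for(class_attr):
    """Return the text-transform set by the last matching rule, or None."""
    classes = set(class_attr.split())
    value = None
    for cls, transform in RULES:
        if cls in classes and transform is not None:
            value = transform
    return value

def apply_indicators(text, class_attr):
    """Wrap text in indicators if its transform has a known open/close pair."""
    pair = INDICATORS.get(text_transform_for(class_attr))
    if pair is None:
        return text
    return pair[0] + text + pair[1]

if __name__ == "__main__":
    print(apply_indicators("bla bla", "propername capitalized braille-render-ignore"))
    print(apply_indicators("bla bla", "foreignword"))
```

With this model, an element carrying several classes keeps working as long as at most one of them maps to a text-transform; if two do, the stylesheet order decides, not the order inside the class attribute.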
Great proposals, thanks a lot! That simplifies the whole em/strong/brl:emph story a lot. The reason why we currently use brl:emph and not em/strong is basically the large print: em/strong would italicize/bold the text, a thing to be avoided. So I suppose, if we drop brl:emph and use em/strong instead, we also need a class for the large print (.lp-render-ignore or .lp-render-emph) to have control over the output, right? Or perhaps this could also be handled in a @media query in the CSS? Anyway, combining multiple classes in the same attribute gives the user much more flexibility. The only downside I see is that all these predefined classes possibly cannot be shown in a drop-down list in oXygen. The user has to be aware of all the available classes and their functions. Therefore we will also need exhaustive and user-friendly guidelines.
OK, so if we decide to drop brl:emph and we want to support EPUB3-to-largeprint (or if we want to drop brl:emph from DTBook too), we would indeed need additional classes like e.g. lp-render-ignore. It could indeed also be done with media queries, however that would mean we'd need to make the largeprint converter CSS-aware (which is a possibility, just some extra work).
I'm not sure how oXygen's auto-completion works when you allow multiple classes. I can't try it because I don't have oXygen on this computer. @egli or @mixa72 could you maybe try? Use e.g. this schema:
start = anyElement
anyElement =
  element * {
    attribute class { ("foo" | "bar" | string)+ }?,
    attribute * - class { text }*,
    (text | anyElement)* }
Christian converted it to .rng for me, but when I open and validate it in oXygen, there is an error at line 10 'E [Jing] repeat of "string" or "data" element'. Do I have to change something?
<grammar ns="" xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="">
<start>
<ref name="anyElement"/>
</start>
<define name="anyElement">
<element>
<anyName/>
<optional>
<attribute name="class">
<oneOrMore>
<choice>
<value>foo</value>
<value>bar</value>
<data type="string"/>
</choice>
</oneOrMore>
</attribute>
</optional>
<zeroOrMore>
<attribute>
<anyName>
<except>
<name>class</name>
</except>
</anyName>
</attribute>
</zeroOrMore>
<zeroOrMore>
<choice>
<text/>
<ref name="anyElement"/>
</choice>
</zeroOrMore>
</element>
</define>
</grammar>
TBH I have no idea. I assumed that because trang (the tool for converting between rnc and rng) didn't complain the schema was valid. I am trying to combine predefined classes ("foo", "bar") with any other classes (string). This is how I thought it is done in RelaxNG. I hope it is possible to do somehow. What happens if you remove <data type="string"/>?
Actually I think the problem might be the <oneOrMore>. What if you remove that (and leave the data type="string")?
I was hoping the oneOrMore inside attribute would be valid, and you could then use oXygen's auto-completion to insert one or more classes. But I guess it is not valid and so you'll probably only be able to auto-complete one class.
A workaround could be to enumerate all combinations of common classes. For example:
brl-render-ignore lp-render-ignore
brl-render-ignore lp-render-bold
brl-render-emph lp-render-ignore
brl-render-emph lp-render-bold
brl-render-quote lp-render-ignore
brl-render-quote lp-render-bold
brl-render-singlequote lp-render-ignore
brl-render-singlequote lp-render-bold
You can also add the classes that you define in the default (braille) CSS, for example:
foreignword lp-render-ignore
foreignword lp-render-emph
And possibly you can also list the permutations:
lp-render-ignore brl-render-ignore
lp-render-ignore brl-render-emph
lp-render-ignore brl-render-quote
lp-render-ignore brl-render-singlequote
lp-render-ignore foreignword
lp-render-bold brl-render-ignore
lp-render-bold brl-render-emph
lp-render-bold brl-render-quote
lp-render-bold brl-render-singlequote
lp-render-bold foreignword
And finally, because I guess most books are converted to either braille or large print, we should also list only the braille and only the large print classes:
brl-render-ignore
brl-render-emph
brl-render-quote
brl-render-singlequote
foreignword
lp-render-ignore
lp-render-bold
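Writing these lists out by hand gets error-prone as classes are added, so they could be generated instead. The sketch below reproduces the pairs, their permutations and the single-class values for the brl-render-* and lp-render-* classes listed above; the function name class_values is hypothetical, and semantic classes such as foreignword could be appended to the braille list in the same way.

```python
from itertools import product

BRAILLE = [
    "brl-render-ignore",
    "brl-render-emph",
    "brl-render-quote",
    "brl-render-singlequote",
]
LARGE_PRINT = ["lp-render-ignore", "lp-render-bold"]

def class_values(braille=BRAILLE, large_print=LARGE_PRINT):
    """Enumerate class attribute values to offer for auto-completion."""
    values = []
    # braille class first, large print class second
    values += [f"{b} {lp}" for b, lp in product(braille, large_print)]
    # the same pairs in the opposite order
    values += [f"{lp} {b}" for lp, b in product(large_print, braille)]
    # books that are converted to only one of the two formats
    values += list(braille) + list(large_print)
    return values

if __name__ == "__main__":
    for value in class_values():
        print(value)
```

Each returned string would then become one enumerated value of the class attribute in the schema.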
Great! Removing oneOrMore already did the trick: the schema is valid. When I add a class attribute to an element oXygen pops up a list with the values "foo" and "bar". In turn, when I use a value other than "foo" and "bar" the xml document is still valid. It's exactly what we want.
<grammar ns="" xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="">
<start>
<ref name="anyElement"/>
</start>
<define name="anyElement">
<element>
<anyName/>
<optional>
<attribute name="class">
<choice>
<value>foo</value>
<value>bar</value>
<data type="string"/>
</choice>
</attribute>
</optional>
<zeroOrMore>
<attribute>
<anyName>
<except>
<name>class</name>
</except>
</anyName>
</attribute>
</zeroOrMore>
<zeroOrMore>
<choice>
<text/>
<ref name="anyElement"/>
</choice>
</zeroOrMore>
</element>
</define>
</grammar>
An important question is also how uniform you want the DTBook and EPUB3/HTML5 markups to be. Did we bring up this issue already? As it looks now the new EPUB3 markup will differ considerably from the old DTBook markup, so what you could do is change the DTBook markup too. Except for the epub:type attributes, everything in the EPUB proposal can be applied to DTBook as well. The new schema could have a new version 2005-3-sbs-full/minimal-2.0 or something.
In addition you could also make a concession in the EPUB markup by not using any epub:types (replace with class, or maybe role?). I personally think using class in DTBook and epub:type in EPUB is not the biggest problem. As long as everything else is uniform it should be workable.
Or maybe you say the difference doesn't matter because in the future you will completely switch to EPUB anyway?
I've just talked with Manfred about that and we both think it is better to not change the current DTBook markup. Sooner or later we will for sure switch to EPUB3 but at the moment nobody knows when exactly this will happen. In view of the oncoming introduction of Braille-in-DAISY-Pipeline in 2017, which is a considerable change for the users, it also makes sense to go step by step and avoid too many changes (markup + formatter) at the same time.
Okay.
An important remark that was made in our call today is that what we use as authoring format does not need to be standards compliant, as long as what we distribute or exchange with Nordic countries is standards compliant. So it is no problem if the authoring format has really SBS-specific things such as brl:select in it as long as we remove it when distributing/exchanging. The same can be said about the whole markup. In theory we could have two completely separate types of EPUB. One with all the brl:* that we are used to in DTBook, and one that is standards compliant, and conversion scripts to go from one format to the other and back.
Test suite works again (https://github.com/sbsdev/pipeline-mod-sbs/issues/52#issuecomment-290651862).
All the existing unit tests pass now. I'm going to merge the sbs-9 branch even though some things might not work yet, and even though the exact EPUB 3 format (see wiki page) hasn't been decided yet.
We can move the issue back to "Backlog" if Mischa finds issues, or if we want to make changes to the EPUB 3 format.
I found some issues in the EPUB3 output. I first created a file as DTB and an identical one as EPUB. Here are the differences I found. Possibly my markup is wrong, please take a look at it. test_epub3_html.zip
Output from DTB Output from EPUB
|
*H:TSV7Z34X | *H:TSV7Z34X
----------- | -----------
|
7]7 B+D | 7]7 B+D
|
TO'CL*E ................ #*A | TO'CL*E ................ #*A
|
ZW3T7 B+D | ZW3T7 B+D
|
TO'CL*E ............... #,,C | TO'CL*E ............... #,,C
|
:::::::::::: | ::::::::::::
|
|
|
|
|
|
|
|
|
|
|
|
|
|
*H:TSV7Z34X >I | *H:TSV7Z34X >I
p | p
p | p
HEAD*G VOLUME #A | HEAD*G VOLUME #A
---------------- | ----------------
|
SPAN-+SW7 --- | SPAN-+SW7 ---
SPAN-+SW7'#A - | SPAN-+SW7'#A -
SPAN-BO'X --- | SPAN-BO'X ---
LI-BRL-'CLA^ | LI-BRL-'CLA^ <-- brl:class not working in EPUB (css was specified, but has no effect)
'A-PA&REF #A | 'A-PA&REF #A
BRL-HOMOGRAPH W<]UBE | BRL-HOMOGRAPH W<]UBE
BRL-'V-F?M $S | BRL-'V-F?M S <-- brl:v-form not working in EPUB
BRL-NUM | BRL-NUM <-- brl:num not working in EPUB
'C)D*AL #E | 'C)D*AL #E
?D*AL #? | ?D*AL #E. <-- brl:num not working in EPUB
ROMAN >II. | ROMAN II. <-- brl:num not working in EPUB
PHONE #JDC.CCC.CB.CB | PHONE #JDC !, #CCC #CB #CB <-- brl:num not working in EPUB
ISBN #IGH.C.DIB.BDJGB.G | ISBN #IGH-#C-#DIB-#BDJGB-#G <-- brl:num not working in EPUB
MEASURE #D'DL | MEASURE #D DL <-- brl:num not working in EPUB
FRA'CTJ #C/ | FRA'CTJ #C!,#D <-- brl:num not working in EPUB
MI'XED #H#A; | MI'XED #H #A!,#B <-- brl:num not working in EPUB
BRL-PLA'CE M+NH3M | BRL-PLA'CE M+NH3M
BRL-SYE'CT KZ | BRL-SYE'CT BASIS Q KZ <-- brl:select not working in EPUB
BRL-[PH | BRL-[PH
_[PH | [PH <-- brl:emph not working in EPUB
'(S*GLE'QUO(') | S*GLE'QUO( <-- brl:emph not working in EPUB
('QUO() | 'QUO( <-- brl:emph not working in EPUB
IGN?E | IGN?E
_#I,) R/N*GL*E #A | <-- brl:running-line not working in EPUB (1 was selected)
p | p
p | p
*H:TSV7Z34X | *H:TSV7Z34X
----------- | -----------
|
ZW3T7 B+D | ZW3T7 B+D
|
TO'CL*E ............... #,,C | TO'CL*E ............... #,,C
|
:::::::::::: | ::::::::::::
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
*H:TSV7Z34X >I | *H:TSV7Z34X >I
p | p
p | p
HEAD*G VOLUME #B | HEAD*G VOLUME #B
---------------- | ----------------
|
BRL-A'C'CCTS-SPAN R"EDUIT | BRL-A'C'CCTS-SPAN R"EDUIT
D"%TAIQ"% | D"%TAIQ"%
BRL-'COMPUT7 '$WWW.SBS.CH | BRL-'COMPUT7 WWW.SBS.'4 <-- brl:computer not working in EPUB
BRL-DA( #,=AJ#BJJD | BRL-DA( #AG.AJ.BJJD <-- brl:date not working in EPUB
BRL-TIME #E.AE | BRL-TIME #JE":#JE <-- brl:time not working in EPUB
BRL-NAME K1FM+N | BRL-NAME K1FMN <-- brl:name not working in EPUB
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
_#AJ,; R/N*GL*E #C |
p | p
OK thanks for the heads up!
@mixa72 What is supposed to happen with brl:class? As far as I remember this had something to do with macros in dtbook2sbsform, which I guess would translate to CSS in the new system. If you want to select an element with a brl:class in CSS you should do it like this:
@namespace brl url(http://www.daisy.org/z3986/2009/braille/);
[brl|class~='myclass'] {
...
}
OK I see what you are trying to do. You put this in the EPUB:
<style>
@namespace xml "http://www.w3.org/XML/1998/namespace";
@namespace brl url(http://www.daisy.org/z3986/2009/braille/);
li[brl|class='myclass'] {
margin-left:2;
}
</style>
The problem is that this CSS is not enabled unless you specify the "apply-document-specific-stylesheets" option (which is currently not available in the SBS version of the script).
@bertfrees Thanks for the hint with the syntax. However, it appears that any CSS instruction in the style element is ignored by the system. I even tried
@namespace xml "http://www.w3.org/XML/1998/namespace";
@namespace brl url(http://www.daisy.org/z3986/2009/braille/);
li{
margin-top:2 !important;
}
but nothing changes. Is that possible?
Well, there are two problems. Firstly, like I said above you need the "apply-document-specific-stylesheets". (I will add it.) Secondly, you need to add type="text/css" to the style to make it work. (Preferably also add media="embossed" to make the style not influence the rendering on screen).
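Since a missing attribute makes the style silently have no effect, a small check before conversion might save some debugging. The sketch below is only a lint helper, not part of the pipeline: it reports style elements in an XHTML content document that lack type="text/css" or media="embossed". The function name style_problems is invented for this example, and it assumes the document is well-formed XML in the XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def style_problems(xhtml):
    """Return a list of warnings about <style> elements in the document."""
    root = ET.fromstring(xhtml)
    problems = []
    for style in root.iter("{" + XHTML_NS + "}style"):
        if style.get("type") != "text/css":
            problems.append('style is missing type="text/css"')
        # media may hold a comma-separated list such as "embossed, print"
        media = [m.strip() for m in style.get("media", "").split(",")]
        if "embossed" not in media:
            problems.append('style has no media="embossed"')
    return problems

if __name__ == "__main__":
    doc = (
        '<html xmlns="' + XHTML_NS + '"><head>'
        "<style>li { margin-top: 2 }</style>"
        "</head><body/></html>"
    )
    for problem in style_problems(doc):
        print(problem)
```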
I seem to understand it now: as "apply-document-specific-stylesheets" is disabled now, I'll have to test the brl:class attribute via external stylesheet (scss), right?
Indeed.
However I think there is another issue, which might also explain why the elements like brl:v-form, brl:num etc. don't work. I'm investigating it now.
Never mind, forget that last comment.
OK so I've added the "apply-document-specific-stylesheets" option and that solves the brl:class issue.
All the other issues are because brl: elements are not valid in HTML and as a result the prefixes are removed in the load step. (brl: attributes are also invalid but here the prefixes are retained). A solution is to make the translator and the style sheets work regardless of whether the "brl:" prefix is present. But better is of course to create valid HTML, for example by using epub:type or class attributes.
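The first option, matching regardless of the prefix, amounts to comparing local names only. A minimal sketch of that idea follows, with hypothetical helper names and plain ElementTree rather than the actual translator code. One caveat that supports the preference for valid HTML: "select" and "time" are also HTML element names, so matching brl:select or brl:time by local name alone could pick up ordinary HTML elements.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Return the tag without a leading '{namespace}' part, if any."""
    return tag.rsplit("}", 1)[-1]

def find_by_local_name(root, name):
    """Collect elements named `name`, whatever namespace they are in."""
    return [el for el in root.iter() if local_name(el.tag) == name]

if __name__ == "__main__":
    doc = ET.fromstring(
        '<p xmlns:brl="http://www.daisy.org/z3986/2009/braille/">'
        "<brl:v-form>Sie</brl:v-form><v-form>Sie</v-form></p>"
    )
    # both the prefixed and the stripped element are found
    print(len(find_by_local_name(doc, "v-form")))
```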
Another issue I found in your EPUB is that it uses <list type="pl">. In EPUB use <ul style="list-style-type: none"> instead. NLB has a "list-style-type-none" class for it:
.list-style-type-none {
list-style-type: none;
}
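For existing documents, this replacement could be automated. The following sketch rewrites DTBook-style plain lists into ul elements carrying the NLB class; it is an illustration under simplifying assumptions (elements without a namespace, function name convert_plain_lists made up here), not the conversion the pipeline actually performs.

```python
import xml.etree.ElementTree as ET

def convert_plain_lists(root):
    """Turn <list type="pl"> into <ul class="list-style-type-none"> in place."""
    for el in root.iter("list"):
        if el.get("type") == "pl":
            el.tag = "ul"
            del el.attrib["type"]
            # keep any classes that were already present
            classes = el.get("class", "").split()
            if "list-style-type-none" not in classes:
                classes.append("list-style-type-none")
            el.set("class", " ".join(classes))
    return root

if __name__ == "__main__":
    div = ET.fromstring('<div><list type="pl"><li>a</li></list></div>')
    print(ET.tostring(convert_plain_lists(div), encoding="unicode"))
```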
OK. I'll adjust my EPUB accordingly. Thanks!
BTW is the apply-document-specific-stylesheets option also visible in the GUI or just available in the background?
Yes it will be visible in the GUI.
Done.
I had to make some small adjustments to the EPUB in order to make it behave exactly as the DTBook: see chapter.xhtml.
EPUB3 to PEF Conversion works now. All the above mentioned inline elements are translated as in the DTB to PEF Conversion. CSS Support for stylesheets inside EPUB3 also works. Thanks.
The embedded braille rendition from the EPUB3 to EPUB3 conversion differs a bit from the output in the PEF in that some inline elements are not translated accordingly:
brl:num (ordinal, phone, isbn, measure, fraction, mixed)
em (strong) (brl:emph)
brl:date
brl:time
brl:name
The brl:select element should only render the braille in the corresponding grade (not each literal element). The rest looks good.
If SBS also intends to use EPUB3 as input format, pipeline-mod-sbs should also have an epub3-to-pef script and corresponding HTML translator.