snaekobbi / pipeline-mod-dedicon

Dedicon specific modules for the DAISY Pipeline 2

Symbols list #28

Open dkager opened 8 years ago

dkager commented 8 years ago

How does Dotify determine which content belongs to a volume? That is, would a volume-level symbols list be possible if it was generated as boilerplate BEFORE being fed into dtbook-to-pef?

bertfrees commented 8 years ago

Dotify splits the document body (in CSS terms: everything in the normal flow, i.e. not flowed into @begin or @end areas; in OBFL terms: everything not included in pre-content or post-content) into equal parts. Table-of-contents and volume-endnotes sections work as follows: they must be included in pre-content or post-content, and they are basically lists of items that reference elements in the document body. This way Dotify can determine which items need to be included in which volumes and, in the case of a "document range" table of contents, how to group the items by volume.

A volume-level symbols list could be done similarly to how endnotes are done, that is: before formatting, you generate a list of "symbol items" that reference the positions in the body where they are used.

However, one issue is that I'm not sure whether OBFL allows the same item to reference multiple positions in the body, and what Dotify does in that case. The most logical behavior would be to include the item in as many volumes as needed, and that is also exactly what we need. (As a matter of fact, this may also be an issue with endnotes. Can several noterefs in DTBook reference the same note? I'm not sure.)

Whether the order of items in the result is determined by the order of items in the collection, or by the order of the referenced elements in the body, I don't know, because these orders have always been identical until now. This could be another possible issue if we want to use this approach for symbols lists.
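To make the idea and the two open questions concrete, here is a minimal sketch in Python. It is not Dotify or OBFL code: the integer body positions, the `volume_starts` list, and the function names are all hypothetical, and it only models the behavior we would want (an item appears in every volume where it is used), not what Dotify actually does.

```python
from bisect import bisect_right

def volume_of(position, volume_starts):
    """Return the 0-based index of the volume that contains a body position."""
    return bisect_right(volume_starts, position) - 1

def symbols_per_volume(items, volume_starts, order="collection"):
    """Assign symbol items to volumes.

    items: list of (symbol, [body positions]) in collection order.
    order: "collection" keeps the order of the item list;
           "body" sorts by first use within each volume.
    """
    volumes = [[] for _ in volume_starts]
    for index, (symbol, positions) in enumerate(items):
        first_use = {}  # volume index -> earliest position of this symbol
        for pos in positions:
            vol = volume_of(pos, volume_starts)
            first_use[vol] = min(pos, first_use.get(vol, pos))
        # One entry per volume, even if the symbol is used several times there.
        for vol, pos in first_use.items():
            volumes[vol].append((index, pos, symbol))
    key = (lambda e: e[0]) if order == "collection" else (lambda e: e[1])
    return [[sym for _, _, sym in sorted(v, key=key)] for v in volumes]

# Hypothetical data: "&" is used in both volumes.
items = [("&", [120, 950]), ("%", [40]), ("@", [900])]
volume_starts = [0, 500]  # volume 1 starts at position 0, volume 2 at 500

print(symbols_per_volume(items, volume_starts))
print(symbols_per_volume(items, volume_starts, order="body"))
```

With this sample data the two orderings give different lists per volume, which is exactly the case that has not come up with tables of contents and endnotes so far.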

dkager commented 8 years ago

Yesterday I learned that the symbols list is book-level. Who knew?! So let's start with that.

My initial thoughts:

The problem I see with this approach is that the symbols file uses ASCII for the replacements, which should be inserted as-is. I need to look into that.

My initial idea was to add a liblouis table to translate all the symbols, but this makes maintenance more difficult because the symbols file is also used in other systems.

bertfrees commented 8 years ago

Regarding the as-is insertion of ASCII braille, that should be possible but it will probably involve a translation to Unicode braille because the result needs to be PEF. Related issue: https://github.com/snaekobbi/issues/issues/9

dkager commented 8 years ago

To clarify, the ASCII is the BRF output that the Braillo needs. So either we'd have to back-translate it to Unicode in the pre-processing step, or this can be done based on the ascii-table option. In the latter case the Braillo table needs more supplements to guarantee that every replacement can be back-translated. Basically, the end result needs to be that the replacements are passed on from input to BRF without change. Since DP2 works with PEF, that is probably too optimistic. I can provide a symbols file with Unicode braille, but that breaks compatibility with our legacy system.
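A rough sketch of the round trip in Python, to show what "can be back-translated" would have to mean. It assumes the common North American ASCII braille table as a stand-in; the actual Braillo table is not given here and may well differ, and the function names are hypothetical. The point is only that a replacement survives ASCII to Unicode braille (for PEF) and back to ASCII (for BRF) unchanged if, and only if, every character it contains is in a one-to-one table.

```python
# ASSUMPTION: North American ASCII braille, listed in Unicode dot-pattern
# order (index i corresponds to U+2800 + i). The real Braillo table used
# for the BRF output may map cells differently.
ASCII_TABLE = (
    " A1B'K2L@CIF/MSP"
    "\"E3H9O6R^DJG>NTQ"
    ",*5<-U8V.%[$+X!&"
    ";:4\\0Z7(_?W]#Y)="
)
# The table must be one-to-one over the 64 six-dot cells.
assert len(ASCII_TABLE) == 64 and len(set(ASCII_TABLE)) == 64

TO_UNICODE = {ch: chr(0x2800 + i) for i, ch in enumerate(ASCII_TABLE)}
TO_ASCII = {cell: ch for ch, cell in TO_UNICODE.items()}

def ascii_to_unicode(brf):
    """Translate ASCII braille to Unicode braille; fail on unmapped characters."""
    missing = sorted(set(brf) - set(TO_UNICODE))
    if missing:
        raise ValueError(f"not in ASCII table: {missing!r}")
    return "".join(TO_UNICODE[ch] for ch in brf)

def unicode_to_ascii(cells):
    """Translate six-dot Unicode braille cells back to ASCII braille."""
    return "".join(TO_ASCII[cell] for cell in cells)

replacement = "X&Y ="  # hypothetical entry from a symbols file
cells = ascii_to_unicode(replacement)
print(unicode_to_ascii(cells) == replacement)
```

Any character in the symbols file that the table does not cover (a lowercase letter in this sketch) raises an error instead of being dropped silently, which is the kind of gap the table supplements would have to close.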

dkager commented 8 years ago

I discussed this with the product manager. Our decision is not to implement the symbols list in phase 3, making it out of scope for the project.

bertfrees commented 7 years ago

I'm reopening this issue because we're using the tracker now for the backlog of issues that need to be fixed in the project follow-up.

bertfrees commented 7 years ago

According to Arjan the symbols list is volume-level, not book-level like Davy said above.

In our current conversion we use a symbols list to convert certain characters. We also put this character in each volume in a paragraph "Symbols list" as an additional declaration. Take as an example the ampersand (&) sign.

dkager commented 7 years ago

"According to Arjan the symbols list is volume-level, not book-level like Davy said above."

Tests with the current conversion software show that this depends on the book type (RO or SV).