Closed dkager closed 8 years ago
Thanks for the python file.
@dkager Maybe this is another thing you can look into when you have time?
Would this go into DTBook-to-PEF, since it’s already generating a BRF?
I thought it could go in dotify-formatter and pef-utils (pef:store). dotify-formatter would add metadata to the PEF, pef-utils would use this metadata to select the right character set for the BRF, as explained in https://github.com/daisy/pipeline-mod-braille/issues/47.
But on second thought, choosing the character set is maybe not really a formatter task. It does make sense that a (custom) formatter sets a default, but it shouldn't be able to overrule the user's choice.
So let's go for a dedicated "ascii-table" option in dtbook-to-pef (and zedai-to-pef and html-to-pef). OK?
Using pef-utils to output...brf. Maybe brf-utils?
What do you think the table format should be, something in XML? Or using liblouis to translate Unicode braille to ASCII? Then it seems easier to have a dedicated track for BRF that adds e.g. nl-NL.dis to the liblouis table list. But maybe that's overkill since the conversion from PEF is well-defined.
pef-utils is a collection of tools that convert from/to PEF / Unicode braille, and the conversion to BRF is also a part of that. pef-utils is basically a wrapper around Joel's brailleutils.
The table format can be anything. All that a table needs to do is implement the org.daisy.braille.api.table.Table
interface, and it should be query-able by a org.daisy.pipeline.braille.pef.TableProvider
. I have already created a liblouis-backed TableProvider, so you could use a nl-NL.dis if you want. Note that maybe you can use an existing implementation in brailleutils.
Doing the conversion to ASCII braille could be done in a single pass, by adding a display table, like you're suggesting. However that's not interesting for us for several reasons: 1. We always want to go through PEF because of the great portability of PEF. 2. we also want to support other types of translators beside Liblouis.
Maybe point brf-table
to your liblouis TableProvider and introduce another option to specify the actual table? It seems all agencies are comfortable with using liblouis so that would be a logical extention. Is your liblouis TableProvider up somewhere so I can have a look?
Or query org.daisy.braille.impl.table.DefaultTableProvider
for the right table maybe, since these tables won't likely change over time so they can be 'hard-coded'.
Note that tables are not just country-specific, they can also be printer-specific. So using xml:lang
or locale
isn't a good idea after all.
No need for two options. All available providers are used when selecting the table in pef:store (pef:pef2text). See https://github.com/daisy/pipeline-mod-braille/blob/master/pipeline-braille-utils/pef-utils/pef-calabash/src/main/java/org/daisy/pipeline/braille/pef/calabash/impl/PEF2TextStep.java.
Liblouis backed tables can be queried with e.g. (liblouis-table:'nl.dis')
.
The tables in org.daisy.braille.impl...
can be queried with e.g. (id:'org.daisy.braille.impl.table.DefaultTableProvider.TableType.EN_US')
.
Using xml:lang sounds good enough for a default.
Using xml:lang sounds good enough for a default.
But the Braillo table may differ from the Index (though the latter can print PEF out-of-the box I believe). I don't think every locale specifies an ASCII braille standard. For example, the Dutch number sign is dots 3456, which you'd expect to appear as #
in a BRF-file. But on Dedicon's printers you actually have to use %
.
Added the NL Braillo table in joeha480/brailleutils#62. Now what is left is to use the com_braillo.BrailloTableProvider
. Should probably be an option in the (web) UI.
Order of precedence (highest to lowest):
ascii-table
.<dp2:ascii-table/>
PEF metadata field.xml:lang
. Probably pass it on directly. Or should we do xml:lang="de"
--> de.dis
? The value of locale
should override this, so they boil down to the same internal variable.If (1) and (2) are empty, I'd say (3) and (4) can be implemented by using (locale:foo)
as the query, and creating a TableProvider that supports locale and defaults to en-US.
The DefaultTableProvider has en-US and seems to be the logical point of extension for (4).
WRT (3):
How should non-existent table IDs be handled? E.g. you set xml:lang="de"
but no table with ID de
exists. Right now this causes a conversion error. Better would be to fall back to en-US, but my guess is that this needs a change in brailleutils. To somewhat mitigate the issue it's probably best to only search the default table provider based on xml:lang
. But again this s problematic because if xml:lang="en"
this could be en-US or en-GB or en-IN, etc.
Right. Currently an error "Could not find a table for query: ..." is thrown. Instead of throwing an error we can just use the brailleutils default (en-US). See PEF2TextStep.java#L93. Just replace this line with e.g.
table = tableProvider.get("(id:'org.daisy.braille.impl.table.DefaultTableProvider.TableType.EN_US')").iterator().next();
By the way the query shoud not be (id:de)
but (locale:de)
.
Do we want this for html-to-pef and zedai-to-pef as well?
Yes :)
Should be all done now and ready for merge.
Thanks. See also PR: https://github.com/daisy/pipeline-mod-braille/pull/56.
Before I merge we should probably make a release of braille-utils.impl. @joeha480? (see https://github.com/joeha480/brailleutils/pull/62).
Also I'm still not convinced that LiblouisDisplayTableProvider should not support the "locale" feature. If someone adds a display table and claims it contains the ASCII charset for a certain language it should be possible to use it.
Liblouis responding to (locale:foo) is OK, but what if the table it finds (manifest/nl) isn’t the one I want? That is my only concern.
I understand. But I think this is how providers should work. If a table matches your query, it means it can be used. If you want to be absolutely sure a specific table is selected, you should make the query more specific.
FWIW, manifest/nl will never be selected because the file name does not end with ".dis".
In the test log I saw it was found, not sure if it was used. Feel free to add it back and we’ll see what it does. Maybe the liblouis (locale:foo) matching needs a bit of beefing up.
Rebased on master and cleaned up.
I think liblouis-core also matches:
#+locale:nl
E.g. in the manifest files. And if it doesn't, it probably should, else this tag has little use. The question is what this table (and the nl-NL-g0.utb table that should back it)generates in terms of ASCII braille. I don't expect liblouis to adopt the code used by the Braillo embosser table, hence I added a new provider.
To summarize, my dilemma is that on one hand I don't expect users to provide the brf-table="(id:foo)"
query for every translation. But on the other hand I want brf-table="(locale:nl)"
to map to a very specific table. We probably shouldn't have two providers matching the locale query in the system. Or if we do, there should be a way of prioritizing one above the other. The newly added locale mapper is very specific, liblouis is more generic. So maybe:
liblouis-locale:foo
.liblouis-locale
.@bertfrees What do you think? We could also do something agency-specific but any solution that has two locale mappers is a bit hacky.
Hum, on further inspection it appears liblouis checks manifests only when translating into braille, and at some point it loads all manifests and gets really spammy. I guess matching only *.dis is fine then. At least it doesn't interfere with LocaleTableProvider. :-)
I have another question. Why does LiblouisDisplayTableProvider contain this code?
for (String feature : query.keySet())
if (!supportedFeatures.contains(feature))
return empty;
Is the idea that either liblouis parses the entire query or nothing at all? In particular, what if you have brf-table="(locale:nl)(line-breaks:DOS)"
? Is this code still accurate?
I understand your concerns from a "predictability" point of view. If multiple instances match some query you can't know for sure one is chosen over the other. The first provider that presents itself wins. I chose this (initial) approach because it is simple and easy to understand. The rules are:
This model is simple and extensible. I've thought about other things such as priorities, fuzzy matching etc. (e.g. one idea was to give providers that match less features a higher priority; this is how it's done in lou_findTable by the way) but it makes everything so much more complicated. We can of course consider extending the model, but only if it brings us something apart from complexity.
my dilemma is that on one hand I don't expect users to provide the (brf-table:id) query for every translation. But on the other hand I want (brf-table:'locale:nl') to map to a very specific table.
I don't think that is a very reasonable thing to expect. A general query such as (locale:foo)
can't match a very specific table, I think that's logical. Why not just change it to (locale:foo)(default)
for example if you want to be sure your LocaleTableProvider provider is picked and if you don't like to use (id:...)
?
We probably shouldn't have two providers matching the locale query in the system. [...] any solution that has two locale mappers is a bit hacky
I think in general if it is clear that one implementation is superior the another and we wish to always select the superior one, in practice we simply won't include the other one. So there is no conflict. If it is not clear which one is superior (one might be more appropriate in some cases, the other in other cases) it's not so bad to make the query more specific.
Is the idea that either liblouis parses the entire query or nothing at all? In particular, what if you have
brf-table="(locale:nl)(line-breaks:DOS)"
? Is this code still accurate?
The "line-breaks" part is handled by pef2text, LiblouisDisplayTableProvider will only see the rest of the query, namely (locale:nl)
.
To clarify, in my opinion the system should provide sensible defaults. In particular, for Dedicon:
Chances that we will want to override any of the above on a book-to-book basis are very small. So the user shouldn't have to pass these to the system in every conversion, especially not those parameters that can be computed from xml:lang
. Furthermore, the notion that there are two things that both claim to implement locale:nl
yet that are not identical in output baffles me a bit. Of course since there is no nl.dis
table in liblouis that provider doesn't match, so all is good. Just wanted to document my thoughts for later.
Regarding supportedFeatures
: I'm worried this will cause problems if there is ever going to be another component inserted after the TableProvider. Say this component accepts (foo:bar)
, so LiblouisDisplayTableProvider receives (locale:nl)(foo:bar)
. In this case the liblouis table provider is not used. In other words, is there added value in rejecting the query this way instead of partially consuming it, keeping in mind that you don't want to add any fuzzy logic.
Of course since there is no
nl.dis
table in liblouis that provider doesn't match, so all is good.
Right. And if there were multiple implementations with differences we would figure out why and make the match functions more specific, e.g. nl-BE vs nl-NL.
Regarding supportedFeatures [...]
A provider is completely responsible for what it returns based on a query. It can use "subproviders" that provide subcomponents, but breaking down the query and using the right parts for the right subproviders is the responsibility of the root provider.
So, in your example, a provider of a hypothetical component that internally uses a Table and some other component that matches (foo:bar)
, could e.g. only use locale
for the Table subprovider and foo
(or both foo
and locale
, or first try foo
and locale
, then only foo
, or ...) to select the other component.
If a provider would return something for a query that it only partially matches there is the possibility that it is used for something it shouldn't be used for.
Again the reason for doing it like this is to keep the model simple. The model can be extended to manage partial matches in a controlled way (I have considered this) but it would add a lot of complexity so we should think carefully about whether we really need it.
The NL Braillo table still needs some supplement characters to better match Dedicon's current converter.
Support the generation of a BRF-file that will lead to the correct output when used with a printer and a particular braille table. This allows existing conversion software that outputs BRF-files to be used alongside the DP2 system without changing braille tables on the printer.