Open josteinaj opened 7 years ago
From @bertfrees on September 11, 2015 7:39
Does that mean a different hyphenation character, and only insert it if the words are too long to fit on a line?
From @bertfrees on September 11, 2015 7:57
OK. I will need to implement the hyphenate-character property. And maybe also make a new value for hyphens that means "none + insert a hyphenate-character when a word is split because it's too long to fit on a line".
This is another reason to handle e-mails and URLs in a higher-level translator (see also https://github.com/snaekobbi/pipeline-mod-nlb/issues/4). It should probably be done in XSLT because styles need to be added that are only processed later, during layout. The alternative is to implement the DotifyTranslator interface and use it for translation while formatting.
Does that mean a different hyphenation character, and only insert it if the words are too long to fit on a line?
Yes.
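To make the intended behaviour concrete, here is a minimal Java sketch of "no hyphenation, but insert a hyphenate-character when a word is too long for a line". The class and method names are hypothetical and are not part of Dotify or mod-braille; the real line breaker works on braille cells and may choose break points differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not the Dotify or mod-braille implementation. */
final class HyphenateCharacterSketch {

    /**
     * Breaks text into lines of at most {@code width} characters.
     * A word is never split if it fits on a line of its own. Only a word
     * longer than a whole line is split, and every non-final piece gets
     * {@code hyphenateCharacter} appended.
     */
    static List<String> breakLines(String text, int width, char hyphenateCharacter) {
        if (width < 2) {
            throw new IllegalArgumentException("width must be at least 2");
        }
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for empty input
            }
            int needed = line.length() == 0
                    ? word.length()
                    : line.length() + 1 + word.length();
            if (needed <= width) {
                if (line.length() > 0) {
                    line.append(' ');
                }
                line.append(word);
                continue;
            }
            // The word does not fit on the current line: flush that line first.
            if (line.length() > 0) {
                lines.add(line.toString());
                line.setLength(0);
            }
            // Split only while the remainder is too long for a whole line,
            // reserving one position for the hyphenate character.
            String rest = word;
            while (rest.length() > width) {
                lines.add(rest.substring(0, width - 1) + hyphenateCharacter);
                rest = rest.substring(width - 1);
            }
            line.append(rest);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

For example, `breakLines("abcdefgh", 4, '-')` gives `abc-`, `def-`, `gh`, while `breakLines("ab cd", 10, '-')` leaves `ab cd` on one line with no hyphenate character at all.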
From @bertfrees on October 5, 2015 19:15
Technically, deferring translation of URLs to the formatting phase comes down more or less to the same thing as the hyphenate-character option because we would need to tell Dotify with a style element that a certain text segment is a URL. That is, assuming we don't want to do the URL recognition step twice. For the hyphenate-character option either we need a new dedicated Dotify attribute, or we could use the style element for it as well.
Handling this in XSLT would mean we would also have to move the URL recognition etc. to XSLT, but that had better stay in Java IMO. Therefore we should probably use the forthcoming pf:transform Saxon function that works on trees instead of string sequences.
If it's OK to do the URL recognition step twice, we can just defer and won't have to worry about any of this. Note however that when deferring, the quality of the translation could suffer because of loss of context.
Is the context the surrounding text, where in the document the URL occurs, what CSS is applied etc? I don't think the translation would change depending on the context.
@KariRudjord: We need to discuss whether or not mathematical hyphenation is appropriate for URLs. Mathematical hyphenation is dot 6, the same as the upper-case indicator, so it assumes that all URLs are case-insensitive, which they usually are on the web but not necessarily.
From @bertfrees on October 6, 2015 9:58
Is the context the surrounding text, where in the document the URL occurs, what CSS is applied etc? I don't think the translation would change depending on the context.
Yes. I meant it more in general. It could be that Norwegian braille translation is less context dependent.
In some other braille codes you have rules like: depending on the length (in words) of a bold passage it is indicated differently. If in that case you are going to defer or isolate the translation of certain words you get into trouble quickly.
Right, I see. Let's assume that Norwegian URLs are context-independent.
@KariRudjord: please correct me if I'm wrong.
From @KariRudjord on October 6, 2015 10:44
Yes, the URLs are context-independent.
I don't know the status here, should we test this?
From @KariRudjord on March 15, 2016 12:3
I think this is too small a thing to test now. There are lots of smaller things that could be tested, but it would take up a lot of time.
From @bertfrees on March 15, 2016 12:7
@josteinaj status = to do
@KariRudjord Ok, that's fine.
@bertfrees Ok, thanks.
From @bertfrees on March 15, 2016 12:14
There's a couple of things that need to happen in mod-braille before you can implement it in mod-nlb. I was planning to do the things in mod-braille this week.
Ok, no worries, I was just wondering about the status since there have been no updates on this since October :)
From @bertfrees on June 10, 2016 11:55
I'll start by explaining how this can be achieved. It should be relatively easy now after the big change I did for supporting non-standard hyphenation.
NLBTranslator should check whether text contains a URL and, if so, return the untranslated text. The same principle is used in LiblouisTranslatorJnaImplProvider.java in order to defer translation of non-standard hyphenated words. The untranslated text is already handled fine in block-translate.xsl.
In NLBTranslator, which is invoked to translate untranslated text during the formatting phase, URLs will be detected a second time. This time they can be handled. Instead of using the grade0Translator, you'll have a third sub-translator that you'll use only for URLs. The query for this sub-translator will look something like this: (liblouis-table:'http://www.nlb.no/liblouis/no-no-g0.utb')(hyphenator:none)(hyphenate-character:'x'). Note the new "hyphenate-character" feature and also that "hyphenator" is "none" so that URLs will only be broken if they are too long for the line.
In LiblouisTranslatorJnaImplProvider, the feature will be parsed at #L172 and turned into a parameter that should eventually be passed on to DefaultLineBreaker at LiblouisTranslatorJnaImplProvider#L343.
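The two-pass idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class, the method names and the deliberately naive URL pattern are hypothetical, and the translators are modelled as plain string functions rather than the real NLBTranslator or Liblouis sub-translators. Only the query string is taken from the comment above.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

/** Hypothetical illustration of deferring URL translation; not the mod-nlb code. */
final class UrlDeferralSketch {

    /** Query for the URL-only sub-translator, as quoted in the thread. */
    static final String URL_TRANSLATOR_QUERY =
            "(liblouis-table:'http://www.nlb.no/liblouis/no-no-g0.utb')"
            + "(hyphenator:none)(hyphenate-character:'x')";

    /** Deliberately naive URL detection; the real recognition rules are an open assumption. */
    private static final Pattern URL =
            Pattern.compile("(?i)\\b(?:https?://|www\\.)\\S+");

    static boolean containsUrl(String text) {
        return URL.matcher(text).find();
    }

    /**
     * First pass (before formatting): text containing a URL is returned
     * untranslated so that it can be handled later; everything else is
     * translated right away.
     */
    static String preTranslate(String text, UnaryOperator<String> mainTranslator) {
        return containsUrl(text) ? text : mainTranslator.apply(text);
    }

    /**
     * Second pass (during formatting): the URL is detected again and this
     * time routed to the dedicated URL sub-translator.
     */
    static String translateWhileFormatting(String text,
                                           UnaryOperator<String> mainTranslator,
                                           UnaryOperator<String> urlTranslator) {
        return containsUrl(text) ? urlTranslator.apply(text) : mainTranslator.apply(text);
    }
}
```

In this sketch the URL detection deliberately runs twice, which matches the "defer" option discussed earlier in the thread; avoiding the second detection would require passing a marker (for example a style) along with the untranslated text.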
From @josteinaj on September 11, 2015 7:25
Also, the normal hyphenation rules should not be applied as those might insert additional characters.
Copied from original issue: nlbdev/pipeline-mod-nlb#5