nlbdev / pipeline-mod-nlb

NLB specific modules for the DAISY Pipeline 2

For e-mails and URLs, use mathematical hyphenation when splitting over two lines #5

Closed. josteinaj closed this issue 7 years ago

josteinaj commented 9 years ago

Also, the normal hyphenation rules should not be applied as those might insert additional characters.

bertfrees commented 9 years ago

Does that mean a different hyphenation character, and only insert it if the words are too long to fit on a line?

bertfrees commented 9 years ago

OK. I will need to implement the hyphenate-character property. And maybe also make a new value for hyphens that means "none + insert a hyphenate-character when a word is split because it's too long to fit on a line".
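
To make the proposed behaviour concrete, here is a minimal sketch in Java, not an actual Dotify or mod-braille implementation. The class `ForcedBreakSketch` and the method `breakLongWord` are hypothetical names made up for illustration. The sketch applies no hyphenation rules at all. It only breaks a word where it would overflow the line, and appends the hyphenate character at each of those forced breaks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Dotify or pipeline-mod-braille.
final class ForcedBreakSketch {

    /**
     * Splits a word over lines of at most `width` characters. The word is only
     * broken where it would otherwise overflow, and `hyphenateCharacter` is
     * appended to every line except the last one.
     */
    static List<String> breakLongWord(String word, int width, String hyphenateCharacter) {
        int mark = hyphenateCharacter.length();
        if (width <= mark) {
            throw new IllegalArgumentException("width must leave room for text before the hyphenate character");
        }
        List<String> lines = new ArrayList<>();
        int pos = 0;
        // Keep breaking while the remainder does not fit on a single line.
        while (word.length() - pos > width) {
            int end = pos + width - mark;
            lines.add(word.substring(pos, end) + hyphenateCharacter);
            pos = end;
        }
        lines.add(word.substring(pos));
        return lines;
    }

    public static void main(String[] args) {
        // U+2820 is the Unicode braille pattern for dot 6. The URL is left as
        // print text here; in a real pipeline it would already be braille.
        for (String line : breakLongWord("www.example.com/some/long/path", 12, "\u2820")) {
            System.out.println(line);
        }
    }
}
```

A word that already fits on the line comes back unchanged, which is the "none" half of the proposed value.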

This is another reason to handle e-mails and URLs in a higher-level translator (see also https://github.com/snaekobbi/pipeline-mod-nlb/issues/4). It should probably be done in XSLT, because styles need to be added that are only processed later, during layout. The alternative is to implement the DotifyTranslator interface and use it for translation while formatting.

josteinaj commented 9 years ago

> Does that mean a different hyphenation character, and only insert it if the words are too long to fit on a line?

Yes.

bertfrees commented 9 years ago

Technically, deferring translation of URLs to the formatting phase comes down more or less to the same thing as the hyphenate-character option because we would need to tell Dotify with a style element that a certain text segment is a URL. That is, assuming we don't want to do the URL recognition step twice. For the hyphenate-character option either we need a new dedicated Dotify attribute, or we could use the style element for it as well.

Handling this in XSLT would mean we would also have to move the URL recognition etc. to XSLT, but IMO this had better stay in Java. Therefore we should probably use the forthcoming pf:transform Saxon function, which works on trees instead of string sequences.

If it's OK to do the URL recognition step twice, we can just defer and not have to worry about any of this. Note, however, that when deferring, the quality of the translation could suffer because of loss of context.
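
For reference, the "URL recognition step" could look roughly like the Java sketch below. This is not the recognition code used in mod-nlb. The class `UrlSegmenterSketch`, the `Segment` record and the regular expression are all assumptions made for illustration, and the pattern is deliberately simplistic (for example, it would swallow trailing punctuation). The point is only that recognition yields flagged text segments, which a later step could mark with a style or translate separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the recognition code used in mod-nlb.
final class UrlSegmenterSketch {

    /** A run of text, flagged when it was recognized as a URL or e-mail address. */
    record Segment(String text, boolean isUrl) {}

    // Deliberately simplistic pattern (an assumption): http(s) URLs and "www."
    // hosts running up to the next whitespace, plus plain e-mail addresses.
    private static final Pattern URL_OR_EMAIL = Pattern.compile(
        "(?:https?://|www\\.)\\S+|[\\w.+-]+@[\\w-]+(?:\\.[\\w-]+)+");

    /** Splits the text into alternating plain and URL/e-mail segments, in order. */
    static List<Segment> segment(String text) {
        List<Segment> segments = new ArrayList<>();
        Matcher m = URL_OR_EMAIL.matcher(text);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) {
                segments.add(new Segment(text.substring(last, m.start()), false));
            }
            segments.add(new Segment(m.group(), true));
            last = m.end();
        }
        if (last < text.length()) {
            segments.add(new Segment(text.substring(last), false));
        }
        return segments;
    }

    public static void main(String[] args) {
        for (Segment s : segment("Se www.nlb.no eller skriv til post@example.org i dag")) {
            System.out.println((s.isUrl() ? "URL:  " : "text: ") + s.text());
        }
    }
}
```

Doing the recognition once and carrying the flag forward avoids the second recognition pass. Deferring without the flag would mean running something like `segment` again during formatting.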

josteinaj commented 9 years ago

Is the context the surrounding text, where in the document the URL occurs, what CSS is applied etc? I don't think the translation would change depending on the context.

@KariRudjord: We need to discuss whether or not mathematical hyphenation is appropriate for URLs. Mathematical hyphenation is dot 6, the same as the upper-case indicator, so using it assumes that all URLs are case-insensitive. That is usually true on the web, but not necessarily.

bertfrees commented 9 years ago

> Is the context the surrounding text, where in the document the URL occurs, what CSS is applied etc? I don't think the translation would change depending on the context.

Yes. I meant it more in general. It could be that Norwegian braille translation is less context dependent.

In some other braille codes you have rules like: depending on the length (in words) of a bold passage it is indicated differently. If in that case you are going to defer or isolate the translation of certain words you get into trouble quickly.
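
A toy Java sketch of such a rule, to show why isolating a word changes the result. Everything in it is hypothetical: the class `EmphasisContextSketch`, the textual markers, and the threshold of three words are placeholders and not taken from any particular braille code.

```java
// Hypothetical illustration only; markers and threshold are made up.
final class EmphasisContextSketch {

    // Assumed rule: passages of this many words or more get one opening and
    // one closing marker; shorter passages get a marker on every word.
    static final int PASSAGE_THRESHOLD = 3;

    static String markBold(String passage) {
        String trimmed = passage.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] words = trimmed.split("\\s+");
        if (words.length >= PASSAGE_THRESHOLD) {
            return "[bold-passage]" + String.join(" ", words) + "[bold-end]";
        }
        StringBuilder out = new StringBuilder();
        for (String word : words) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append("[bold-word]").append(word);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Translated as part of the whole bold passage:
        System.out.println(markBold("se www.nlb.no her"));
        // The same URL translated in isolation gets a different indicator:
        System.out.println(markBold("www.nlb.no"));
    }
}
```

With a rule like this, a URL translated on its own no longer knows that it sat inside a three-word bold passage. That is the loss of context meant above.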

josteinaj commented 9 years ago

Right, I see. Let's assume that Norwegian URLs are context-independent.

@KariRudjord: please correct me if I'm wrong.

KariRudjord commented 9 years ago

Yes, the URLs are context-independent.

josteinaj commented 8 years ago

I don't know the status here. Should we test this?

KariRudjord commented 8 years ago

I think this is too small a thing to test now. There are lots of smaller things that could be tested, but that would take up a lot of time.

bertfrees commented 8 years ago

@josteinaj status = to do

josteinaj commented 8 years ago

@KariRudjord Ok, that's fine.

@bertfrees Ok, thanks.

bertfrees commented 8 years ago

There are a couple of things that need to happen in mod-braille before you can implement it in mod-nlb. I was planning to do the mod-braille part this week.

josteinaj commented 8 years ago

Ok, no worries, I was just wondering about the status since there have been no updates on this since October :)

bertfrees commented 8 years ago

I'll start by explaining how this can be achieved. It should be relatively easy now, after the big change I made to support non-standard hyphenation.

josteinaj commented 7 years ago

This issue was moved to nlbdev/pipeline#3