nlbdev / pipeline

NLB branch of the super-project that aggregates all Pipeline related code. See https://github.com/daisy/pipeline for the main branch.
http://repo.nlb.no/pipeline

Whitespace normalization with pagenum element #165

Closed josteinaj closed 6 years ago

josteinaj commented 6 years ago

Maybe this is a problem with all nested inline elements; I haven't tested that. But these tests broke somewhere between commits 7060067b89406cd8bf24956bb17ac75c0f89af99 and cdbff2c937cea2d605a4ff9300a02a0d066832f9:

https://github.com/nlbdev/pipeline/blob/7e50739ecc8d9507590cfd40a8df53913836eb16/modules/nlb/book-to-pef/src/test/xprocspec/test_dtbook-to-pef.xprocspec#L848

https://github.com/nlbdev/pipeline/blob/7e50739ecc8d9507590cfd40a8df53913836eb16/modules/nlb/book-to-pef/src/test/xprocspec/test_dtbook-to-pef.xprocspec#L919

The problem is the same in both tests: the whitespace before the pagenum is removed, so the words before and after the pagenum are joined into one.

Input (DTBook):

fortauet med den <pagenum id="p11" page="normal">11</pagenum>lille hunden,

Expected output:

⠿⠀⠋⠕⠗⠞⠁⠥⠑⠞⠀⠍⠑⠙⠀⠙⠑⠝⠀⠇⠊⠇⠇⠑⠀⠓⠥⠝⠙⠑⠝⠂
é⠀fortauet⠀med⠀den⠀lille⠀hunden,

Actual output:

⠿⠀⠋⠕⠗⠞⠁⠥⠑⠞⠀⠍⠑⠙⠀⠙⠑⠝⠇⠊⠇⠇⠑⠀⠓⠥⠝⠙⠑⠝⠂
é⠀fortauet⠀med⠀denlille⠀hunden,
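
To make the expected behaviour concrete, here is a minimal Python sketch. It is not the pipeline's actual code; the function names are hypothetical, and wrapping the sample in a `<p>` element is an assumption made only so the fragment parses. It assumes the page number itself is not rendered inline, and that whitespace is normalized only after the text around the pagenum has been joined. The second function shows one way the reported symptom could arise (trimming each text node separately); it is an illustration, not a diagnosis of the regression.

```python
import re
import xml.etree.ElementTree as ET


def _collect(elem):
    """Collect text below elem, skipping the content of pagenum elements."""
    if elem.tag == "pagenum":
        return ""  # the tail after the pagenum is added by the parent loop
    parts = [elem.text or ""]
    for child in elem:
        parts.append(_collect(child))
        parts.append(child.tail or "")  # text following the child, incl. whitespace
    return "".join(parts)


def text_without_pagenum(xml_fragment):
    """Join all text first, then collapse runs of whitespace to one space."""
    joined = _collect(ET.fromstring(xml_fragment))
    return re.sub(r"\s+", " ", joined).strip()


def naive_text_without_pagenum(xml_fragment):
    """Hypothetical faulty variant: trims each text node before joining,
    which drops the space that separated the words around the pagenum."""
    root = ET.fromstring(xml_fragment)
    parts = [(root.text or "").strip()]
    for child in root:
        if child.tag != "pagenum":
            parts.append("".join(child.itertext()).strip())
        parts.append((child.tail or "").strip())
    return "".join(parts)


# Hypothetical wrapper <p> around the input from this issue.
SAMPLE = (
    '<p>fortauet med den <pagenum id="p11" page="normal">11</pagenum>'
    "lille hunden,</p>"
)

if __name__ == "__main__":
    print(text_without_pagenum(SAMPLE))
    print(naive_text_without_pagenum(SAMPLE))
```

The first function keeps "den" and "lille" as separate words, matching the expected output above; the second joins them, matching the actual output.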

KariRudjord commented 6 years ago

Works.