skynav / ttt

Timed Text Toolkit
BSD 2-Clause "Simplified" License
74 stars 15 forks source link

Always emit line break BreakPhrase when processing a <br/> #307

Closed MaxEliaserAWS closed 3 years ago

MaxEliaserAWS commented 3 years ago

Hello, Skynav team. I, along with several of my colleagues, have been working on a project that uses TTT for quite some time now. Now that it's in production, we want to contribute back all the changes we made for our use.

I want to extend the gratitude of AWS. We have found TTT very useful in AWS Elemental MediaConvert, powering the "style passthrough" feature documented here: https://docs.aws.amazon.com/mediaconvert/latest/ug/burn-in-output-captions.html

There are more patches where this came from, but I figured I'd start with a simple one to introduce myself and say hello. This bugfix was originally authored by my colleague Nathan Hebert.

Commit message follows:

Always emit line break BreakPhrase when processing a <br/>

The following pattern would previously resolve the BR into a single whitespace:

<p>
  <span>This is the first line.</span>
  <br/>
  <span>This is the second line.</span>
</p>

This is because, the "\n " text element would be concatenated with another "\n" from the <br/> element, then concatenated with the following "\n " text into a single text phrase. The default text processing rules allow for TTML allow for mid-p repetitive whitespace text entities to be interpreted as a single space.

Change

When the PhraseCollector gets a <br/> element, we emit the previous phrase (if needed) and emit a BreakPhrase instance of type "line break". This tells the paragraph collector emit a paragraph.

MaxEliaserAWS commented 3 years ago

Since GitHub won't let me comment on binary files, here is my justification for all changed unit test "expected.zip" files.

ttml2-prstn-length-root-container-relative-imsc11-5

The original for this test case is here: https://github.com/w3c/imsc-tests/blob/main/imsc1_1/ttml/lengthRootContainerRelative/lengthRootContainerRelative005.ttml And the reference rendering is here: https://github.com/w3c/imsc-tests/blob/main/imsc1_1/png/lengthRootContainerRelative005/0.000000.png

In order to make this one clear, I had to hand-edit both the "old" and "new" SVG output files. For both of them, all but the first line of text is vertically positioned far off the bottom of the screen. That may be a separate issue, but it's not the one I'm focused on here.

The input file for this test contains two lines of text separated by a manual linebreak (<br> element.) However, the first line of text is too wide to fit within the region, so this test case ought to display three lines total.

In the upstream "expected" result (bottom,) the line wrapping works as expected, but the manual linebreak is being ignored. In the "new" rendition (top,) the manual linebreak is having its intended effect.

There are still other issues here; the aforementioned vertical spacing, and the fact that the text shadow is being ignored, but this "new" result is still a better representation of the input file and closer to the reference rendering from W3C. One of our subsequent patches will address the missing shadow at least. ttml2-prstn-length-root-container-relative-imsc11-5_comparison

ttml2-prstn-text-shadow-imsc11-1

This is a very similar case to ttml2-prstn-length-root-container-relative-imsc11-5. I did the same hand-editing of the SVG files to make sure, but I will not recite an identical analysis here.

ttml2-prstn-ruby-align-imsc11-4

The original for this test is case is here: https://github.com/w3c/imsc-tests/blob/main/imsc1_1/ttml/rubyAlign/rubyAlign001.ttml And the reference rendering is here: https://github.com/w3c/imsc-tests/blob/main/imsc1_1/png/rubyAlign001/0.000000.png

This test case has multiple frames, but only the first frame renders differently after the fix has been applied.

The input file for this test contains a line of Japanese characters, followed by a manual linebreak (<br> element,) followed by a line of Latin characters. The "textAlign" setting is "center" for both lines.

In the upstream "expected" results (bottom,) the Japanese text is ever so slightly off-center, whereas in the "new" rendition (top,) the Japanese text is precisely centered. A red line has been added to make the alignment difference clear. Looking at the changed results in the debugging XML output format, the "old" output has two extra space characters after the Japanese text, which nudge it farther to the left than it should be. The combination of Japanese characters plus space characters is centered, with 405.248 pixels on either side of it, but because the space characters are invisible, the Japanese characters look off-center.

--- ttpx000001.xml  2021-10-01 15:52:46.192936294 -0700
+++ ttpx000001.xml.new  2021-10-01 15:52:46.197936427 -0700
@@ -9,7 +9,7 @@
             <block bpd="135" ipd="1024">
               <block bpd="90" ipd="1024">
                 <line bpd="90" ipd="1024">
-                  <fill ipd="405.248"/>
+                  <fill ipd="416"/>
                   <annotation bpd="30" ipd="192">
                     <fill ipd="36"/>
                     <glyphs ipd="24" text="ラ"/>
@@ -20,8 +20,7 @@
                     <fill ipd="36"/>
                   </annotation>
                   <glyphs ipd="192" text="利用許諾"/>
-                  <space ipd="21.504" text="\u0020\u0020"/>
-                  <fill ipd="405.248"/>
+                  <fill ipd="416"/>
                 </line>
               </block>
               <block bpd="45" ipd="1024">

The reference rendition from W3C also has the Japanese characters precisely centered, so the "new" rendition is a more correct representation of the source file, and closer to the reference rendition from W3C.

ttml2-prstn-ruby-align-imsc11-4_comparison

ttml2-prstn-ruby-imsc11-3

The original for this test case is here: https://github.com/w3c/imsc-tests/blob/main/imsc1_1/ttml/ruby/ruby003.ttml And the reference rendering is here: https://github.com/w3c/imsc-tests/blob/main/imsc1_1/png/ruby003/0.000000.png

The input file for this test contains three lines of text separated by manual breakpoints. There is first a line of Latin text, then a line of Japanese characters, then the third line is also Latin. In the upstream "expected" result (bottom,) the first Latin line and the Japanese line are displayed on the same Line, with the third Latin line still being displayed on its own line. This indicates that the <br> elements aren't being respected and automatic linebreaking is flowing the text instead. The "new" rendition (top) is a more correct representation of the source file and more closely resembles the reference rendering from W3C as well.

ttml2-prstn-ruby-imsc11-3_comparison

ttml2-prstn-ruby-position-outside

The source file for this test case is here: https://github.com/skynav/ttt/blob/master/w3c-tests/ttml2-tests/presentation/valid/ttml2-prstn-ruby-position-outside.xml

The input file for this test contains two identical lines of Japanese characters, with a manual break inserted between them. In the upstream "expected" result (bottom,) all characters are on a single line, indicating that the <br> element isn't being respected. The "new" rendition (top) is a more correct representation of the source file.

ttml2-prstn-ruby-position-outside_comparison

ttml2-prstn-text-emphasis-imsc11-1

The original ror this test case is here: https://github.com/w3c/imsc-tests/blob/main/imsc1_1/ttml/textEmphasis/textEmphasis001.ttml And the reference rendering is here: https://github.com/w3c/imsc-tests/blob/main/imsc1_1/png/textEmphasis001/0.000000.png

The input file for this test contains two identical lines of Japanese characters, with a manual break inserted between them. In the upstream "expected" result (bottom,) two characters from the bottom line are being moved to the top line, indicating that the <br> element isn't being respected and automatic linebreaking is flowing the text instead. The "new" rendition (top) is a more correct representation of the source file, and more closely resembles the reference rendering from W3C as well.

ttml2-prstn-text-emphasis-imsc11-1_comparison

ttml2-prstn-text-emphasis-imsc11-3

The original for this test case is here: https://github.com/w3c/imsc-tests/blob/main/imsc1_1/ttml/textEmphasis/textEmphasis003.ttml And the reference rendering is here: https://github.com/w3c/imsc-tests/blob/main/imsc1_1/png/textEmphasis003/0.000000.png

The input file for this test contains two identical columns of Japanese characters, with a manual break inserted between them. The is oriented vertically using the "tbrl" writingMode setting. In the upstream "expected" result (bottom,) two characters from the left column are being moved to the right column, indicating that the <br> element isn't being respected and automatic linebreaking is flowing the text instead. The "new" rendition (top) is a more correct representation of the source file, and more closely resembles the reference rendering from W3C as well.

ttml2-prstn-text-emphasis-imsc11-3_comparison

skynavga commented 3 years ago

Hi @MaxEliaserAWS and thank you for this PR and your in-depth explanation. We will take a close look at this at our first opportunity and will likely have comments or questions to follow. G.

MaxEliaserAWS commented 3 years ago

Just giving this PR a friendly poke :)

skynavga commented 3 years ago

@MaxEliaserAWS I've sent you an invite with Write Access. Please proceed with this PR. Welcome to the team.

MaxEliaserAWS commented 3 years ago

Awesome, I'll hit the "go" button as soon as I receive the invite. I will continue to seek review/feedback on my PRs.