sandflow / imscJS

JavaScript library for rendering IMSC Text and Image Profile documents to HTML5
BSD 2-Clause "Simplified" License

Improve performance for large input files #212

Closed: nigelmegitt closed 3 years ago

nigelmegitt commented 3 years ago

When an input file is large, it can produce both a large number of ISDs and a large number of regions. The ISD processing code currently iterates through all of the body content for every region, even though much of that content falls outside the desired ISD's interval. With some real-world files, these factors multiply and noticeably degrade performance.

This pull request modifies generateISD() to reduce the number of combinations: it pre-filters the regions and content, removing anything that falls outside the required interval, before iterating through the regions.
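To illustrate why pre-filtering helps, here is a rough sketch that only counts node visits in a simplified model. It is not the imscJS code: the flat node list, `makeDoc`, and both counting functions are hypothetical, and it assumes each content node carries resolved `begin`/`end` times and a single region id.

```javascript
// Hypothetical flat model: numNodes one-second cues spread over numRegions regions.
function makeDoc(numRegions, numNodes) {
  const regions = [];
  for (let r = 0; r < numRegions; r++) regions.push("r" + r);
  const nodes = [];
  for (let i = 0; i < numNodes; i++) {
    nodes.push({ begin: i, end: i + 1, region: regions[i % numRegions] });
  }
  return { regions, nodes };
}

// Without pre-filtering: every region scans every content node.
function visitsWithoutPrefilter(doc) {
  let visits = 0;
  for (let r = 0; r < doc.regions.length; r++) {
    for (let n = 0; n < doc.nodes.length; n++) visits++;
  }
  return visits;
}

// With pre-filtering: one pass keeps only nodes active at the offset,
// then only the regions those nodes use scan the surviving nodes.
function visitsWithPrefilter(doc, offset) {
  let visits = 0;
  const active = [];
  for (const node of doc.nodes) {
    visits++;
    if (node.begin <= offset && offset < node.end) active.push(node);
  }
  const activeRegions = new Set(active.map((node) => node.region));
  for (const region of activeRegions) {
    for (const node of active) visits++;
  }
  return visits;
}

const sampleDoc = makeDoc(50, 10000);
console.log(visitsWithoutPrefilter(sampleDoc));       // 500000
console.log(visitsWithPrefilter(sampleDoc, 1234.5));  // 10001
```

In this toy model the cost per ISD drops from regions × nodes to roughly one pass over the nodes, which is the effect described above. Real documents are trees rather than flat lists, so the actual saving will differ.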

Empirically, this has hugely improved the speed. We have not observed any ill-effects, nor thought of any that might arise.

palemieux commented 3 years ago

For a given temporal offset, the algorithm walks down the body element and:

(a) records which regions are active at the offset
(b) prunes elements that are not active at the offset

The algorithm then iterates only over the active regions and over the pruned body.
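A minimal sketch of that walk, under some assumptions: the element shape (`begin`, `end`, `regionID`, `contents`) and the function names `isActiveAt` and `pruneAndCollect` are hypothetical and are not taken from the imscJS source, and times are assumed to be already resolved to absolute offsets.

```javascript
// An element is active if the offset falls in its half-open interval [begin, end).
function isActiveAt(el, offset) {
  return el.begin <= offset && offset < el.end;
}

// Returns a pruned copy of el (or null if el is inactive at the offset) and
// adds every region referenced by an active element to activeRegions.
function pruneAndCollect(el, offset, inheritedRegion, activeRegions) {
  if (!isActiveAt(el, offset)) return null;
  // Assumption: an element without its own region inherits its parent's.
  const region = el.regionID || inheritedRegion;
  if (region) activeRegions.add(region);
  const contents = [];
  for (const child of el.contents || []) {
    const kept = pruneAndCollect(child, offset, region, activeRegions);
    if (kept !== null) contents.push(kept);
  }
  return Object.assign({}, el, { contents: contents });
}

const sampleBody = {
  begin: 0, end: Infinity, regionID: "", contents: [
    { begin: 0, end: 5, regionID: "top", contents: [] },
    { begin: 5, end: 10, regionID: "bottom", contents: [] }
  ]
};

const regionsAtSeven = new Set();
const prunedBody = pruneAndCollect(sampleBody, 7, "", regionsAtSeven);
console.log(prunedBody.contents.length);  // 1
console.log(Array.from(regionsAtSeven));  // [ 'bottom' ]
```

The per-region pass would then run only over `regionsAtSeven` and `prunedBody`, rather than over every region and the full body.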

@nigelmegitt Did I get this right?

nigelmegitt commented 3 years ago

> @nigelmegitt Did I get this right?

Yes, exactly right.

palemieux commented 3 years ago

@nigelmegitt The renders differ from the reference renders in the following cases:

"Overflow004-10.000000.png"
"Overflow005-10.000000.png"
"Padding001-10.000000.png"
"Padding002-10.000000.png"
"Padding003-15.000000.png"
"Padding004-20.000000.png"
"Padding006-10.000000.png"
"Padding007-10.000000.png"
"padding-four-values-001-10.000000.png"
"padding-one-value-001-10.000000.png"
"padding-three-values-001-10.000000.png"
"padding-two-values-001-10.000000.png"
"progressivelyDecodable1-0.000000.png"
"progressivelyDecodable1-5.000000.png"
"referenceFonts1-9.000000.png"
"ShowBackground001-12.000000.png"
"TextOutline005-10.000000.png"
"WrapOption001-10.000000.png"
"WrapOption002-10.000000.png"
"WrapOption003-10.000000.png"
"WrapOption004-10.000000.png"
"WrapOption005-10.000000.png"
"WritingMode001-10.000000.png"
"WritingMode002-10.000000.png"
"WritingMode003-10.000000.png"
"WritingMode004-10.000000.png"
"WritingMode005-10.000000.png"
"WritingMode006-10.000000.png"
"WritingMode007-10.000000.png"
"WritingMode008-10.000000.png"
"WritingMode009-10.000000.png"
"WritingMode010-1.000000.png"
"ActiveArea001-6.000000.png"
"aspectRatio1-0.000000.png"
"aspectRatio1-9.000000.png"
"aspectRatio2-0.000000.png"
"aspectRatio2-9.000000.png"
"aspectRatio5-0.000000.png"
"aspectRatio5-9.000000.png"
"BeginEnd002-0.000000.png"
"BeginEnd002-1.000000.png"
"BeginEnd002-2.000000.png"
"BeginEnd002-3.000000.png"
"BeginEnd002-4.000000.png"
"BeginEnd002-5.000000.png"
"BeginEnd002-6.000000.png"
"BeginEnd002-7.000000.png"
"BeginEnd002-8.000000.png"
"BeginEnd002-9.000000.png"
"BeginEnd002-10.000000.png"
"BeginEnd002-11.000000.png"
"BeginEnd002-20.000000.png"
"DisplayAlign001-10.000000.png"
"DisplayAlign002-10.000000.png"
"DisplayAlign003-10.000000.png"
"Extent001-0.000000.png"
"Extent001-10.000000.png"
"Extent002-0.000000.png"
"Extent002-10.000000.png"
"forcedDisplay1-0.000000.png"
"forcedDisplay1-9.000000.png"
"forcedDisplay1-forced-0.000000.png"
"forcedDisplay1-forced-9.000000.png"
"multiRowAlign1-0.000000.png"
"multiRowAlign1-9.000000.png"
"Opacity001-10.000000.png"
"Opacity002-10.000000.png"
"Opacity003-10.000000.png"
"Opacity004-0.000000.png"
"Opacity004-10.000000.png"
"Origin001-10.000000.png"
"Origin002-10.000000.png"
"Overflow001-10.000000.png"
"Overflow002-10.000000.png"
"Overflow003-10.000000.png"

Many of those seem related to regions with tts:showBackground="always" being pruned.

nigelmegitt commented 3 years ago

Thanks for the pointers @palemieux, I believe 9e87039 fixes all the failing tests. As well as unintentionally pruning regions with tts:showBackground="always", the code was also pruning regions that were referenced only by the body directly.
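The two cases above amount to a rule for which regions must survive the pre-filter. The following is only an illustrative sketch of that rule, not the contents of 9e87039: `selectRegions`, its parameters, and the region map shape are hypothetical, and any timing on the regions themselves is ignored.

```javascript
// regions: map of region id -> { showBackground: "always" | "whenActive" }
// referencedByContent: Set of region ids used by content active at the offset
// bodyRegionID: region referenced directly by the body element, or "" if none
function selectRegions(regions, referencedByContent, bodyRegionID) {
  const selected = [];
  for (const [id, region] of Object.entries(regions)) {
    const keep =
      referencedByContent.has(id) ||        // has active content
      id === bodyRegionID ||                // referenced only by the body
      region.showBackground === "always";   // background drawn even when empty
    if (keep) selected.push(id);
  }
  return selected;
}

const sampleRegions = {
  top: { showBackground: "whenActive" },
  bottom: { showBackground: "always" },
  side: { showBackground: "whenActive" },
  main: { showBackground: "whenActive" }
};

const keptRegions = selectRegions(sampleRegions, new Set(["top"]), "main");
console.log(keptRegions);  // [ 'top', 'bottom', 'main' ]
```

Here `side` is the only region dropped: it has no active content, is not referenced by the body, and does not always show its background.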