Closed skyline75489 closed 1 year ago
This should also help users who use non-English locales, for example by avoiding analysis entirely for the simple parts of text like:
版权所有 (C) Microsoft Corporation。保留所有权利。
/cc @miniksa for both sanity & technical check
I've done some experiments and found that the text complexity breakdown is not the same as run splitting. For example, with the following text:
版权所有 (C) Microsoft Corporation。保留所有权利。
The text complexity analysis reports the following (each a, b is a position, length pair):
The run analysis splits it into the following runs:
We might also need some sort of RLE implementation to find out whether a run is entirely simple, and then optimize the shaping process for that run.
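A minimal sketch of the RLE idea, assuming the complexity result can be expressed as one flag per UTF-16 code unit. The names `ComplexitySegment`, `Encode` and `IsRangeSimple` are hypothetical and are not taken from the renderer:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical type for illustration only; not the renderer's data structure.
struct ComplexitySegment
{
    std::size_t length; // number of code units covered by this segment
    bool isSimple;      // complexity result shared by those code units
};

// Run-length encode per-code-unit flags into consecutive segments.
std::vector<ComplexitySegment> Encode(const std::vector<bool>& simpleFlags)
{
    std::vector<ComplexitySegment> segments;
    for (const bool flag : simpleFlags)
    {
        if (!segments.empty() && segments.back().isSimple == flag)
        {
            ++segments.back().length; // extend the current segment
        }
        else
        {
            segments.push_back({ 1, flag }); // start a new segment
        }
    }
    return segments;
}

// Returns true when every code unit in [start, start + length) is simple.
// Returns false if the range extends past the encoded text.
bool IsRangeSimple(const std::vector<ComplexitySegment>& segments, std::size_t start, std::size_t length)
{
    if (length == 0)
    {
        return true;
    }
    const std::size_t end = start + length;
    std::size_t pos = 0;
    for (const auto& seg : segments)
    {
        const std::size_t segEnd = pos + seg.length;
        // A complex segment overlapping the queried range makes it non-simple.
        if (segEnd > start && pos < end && !seg.isSimple)
        {
            return false;
        }
        pos = segEnd;
        if (pos >= end)
        {
            return true;
        }
    }
    return false;
}
```

With the example text, the first run (4 complex code units followed by 2 simple ones) would encode to two segments, and a query over the whole run would report it as not entirely simple.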
I agree that we should make use of the additional analysis information to improve performance in this way.
I do think that we could just further split the Runs and give them an additional simple-or-not parameter (bool) during the initial _AnalyzeTextComplexity that is just picked up during _AnalyzeRuns to determine the full analysis or skip and again during _ShapeGlyphRuns to determine the quick-mapping or slow-mapping to glyphs. In lieu of the whole thing being simple, a Run would be simple or not.
I'm not quite sure why your example maps as it does. Are some of those characters UTF-16 surrogate pairs?
Those are just normal Chinese characters. Originally I thought text complexity analysis would split the text the same way as run splitting. Just want to add an example to show that it's not.
a Run would be simple or not
This is likely undetermined. In the example above:
“版权所有 (”
This is a Run. But according to text complexity, the first 4 characters are complex, the last 2 characters are simple. This is what frustrates me. We can't just simply know a Run is simple or not easily and optimize based on that.
Yeah but what I'm saying is that we can just call _SetCurrentRun and _SplitCurrentRun inside of _AnalyzeTextComplexity when we start listening to the length of the complexity and add the additional data.
So then you have a [0,4) complex run. [6,8) simple run. [8, 26) simple run. etc. etc.
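The splitting described above could look roughly like the following. This is a sketch under stated assumptions: `Run`, `Segment` and `SplitRunsByComplexity` are hypothetical stand-ins, and the renderer's actual run type and its _SetCurrentRun / _SplitCurrentRun helpers work differently (they operate on its own run list rather than on vectors).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; not the renderer's real types.
struct Segment
{
    std::size_t start;
    std::size_t length;
    bool isSimple; // result of the text complexity analysis for this range
};

struct Run
{
    std::size_t start;
    std::size_t length;
    bool isSimple; // the additional simple-or-not flag proposed in the thread
};

// Split each run at complexity boundaries so that every resulting run is
// either entirely simple or entirely complex.
// Segments are expected to be sorted by start so the output stays in text order.
std::vector<Run> SplitRunsByComplexity(const std::vector<Run>& runs, const std::vector<Segment>& segments)
{
    std::vector<Run> result;
    for (const auto& run : runs)
    {
        const std::size_t runEnd = run.start + run.length;
        for (const auto& seg : segments)
        {
            // Intersect the run with the complexity segment.
            const std::size_t begin = std::max(run.start, seg.start);
            const std::size_t end = std::min(runEnd, seg.start + seg.length);
            if (begin < end)
            {
                result.push_back({ begin, end - begin, seg.isSimple });
            }
        }
    }
    return result;
}
```

The nested loop is quadratic in the worst case, which keeps the sketch short; a single merged pass over both sorted lists would avoid that. Later stages could then branch on `isSimple` per run instead of on a flag for the whole text.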
Doesn't that bring more fragmentation into the process? Will it affect the line breaking and script analysis result? I need to dig more into this...
To your questions: oh probably. It's worth a try though to see if it just works. Sometimes the simple answer is "good enough". If it turns out to not be, we can refine further from there. Feel free to try/dig!
Can we reopen this? #9202 was reverted.
AtlasEngine does this!
Description of the new feature/enhancement
Inspired by https://github.com/microsoft/cascadia-code/issues/411, certain ASCII characters sometimes break the simplicity of the entire text, depending on the font being used. The current implementation skips dwrite analysis when the entire text is simple:
With Fira Code, for example, in most cases the optimization only applies to lines with 120 spaces, which is not good.
Proposed technical implementation details (optional)
GetTextComplexity can provide a breakdown report of the text, showing which specific ranges of the text are simple. We should be able to utilize it like this:
See #6695 for the introduction of text complexity analysis.
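As a rough illustration of the breakdown idea in the proposal, the loop below collects (position, length, simple) ranges by probing the remaining text repeatedly. To stay portable it does not call DirectWrite: `Probe` is a hypothetical callback standing in for a GetTextComplexity-style call, and `ComplexityRange`, `ProbeResult` and `AnalyzeComplexity` are names made up for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical types for illustration; not part of DirectWrite or the renderer.
struct ComplexityRange
{
    std::size_t pos;
    std::size_t length;
    bool isSimple;
};

struct ProbeResult
{
    bool isSimple;          // whether the leading span of the text is simple
    std::size_t lengthRead; // how many code units that span covers
};

// Stand-in for a GetTextComplexity-style call on the remaining text.
using Probe = std::function<ProbeResult(const wchar_t* text, std::size_t length)>;

// Walk the text, asking the probe about the remaining part each time,
// and record one range per answer.
std::vector<ComplexityRange> AnalyzeComplexity(const std::wstring& text, const Probe& probe)
{
    std::vector<ComplexityRange> ranges;
    std::size_t pos = 0;
    while (pos < text.size())
    {
        const std::size_t remaining = text.size() - pos;
        const ProbeResult r = probe(text.data() + pos, remaining);
        // Always advance by at least one code unit and never past the end,
        // so a misbehaving probe cannot cause an endless loop.
        const std::size_t advance = std::min(std::max<std::size_t>(r.lengthRead, 1), remaining);
        ranges.push_back({ pos, advance, r.isSimple });
        pos += advance;
    }
    return ranges;
}
```

In the real renderer the probe would presumably wrap IDWriteTextAnalyzer1::GetTextComplexity, using its isTextSimple and textLengthRead outputs; how that call reports complex spans for a given font should be checked against the DirectWrite documentation rather than taken from this sketch.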