About `ex` and `em` units

RuixiZhang42 commented 2 years ago

You mentioned here that

Regarding ex and em this is more challenging and costly as

it would need to hook into font selection

anyhow there is a problem of principle if those units end up to be less than 1pt.

However, I don’t think these are actual constraints. The observations above are based on explicit conversion ratios from the TeX/pdfTeX sources. But ex and em are simply handled the same way as internal dimensions, which means:

1. There is no need to hard-code any inverse of conversion ratio (well, it would be impossible anyway, because ex and em are font-dependent).
2. The barrier of “at least 1pt” is lifted.

\documentclass{article}
\begin{document}
Recall that the `\verb|1.3|' in `\verb|1.3\dimen0|' is internally represented as
$n+f/2^{16}$, where $n=1$ is the integer part and
$f=\hbox{round}(0.3\times2^{16})=\lfloor0.3\times2^{16}+1/2\rfloor=19661$.
So the input \verb|1.3| `equals' $(2^{16}+19661)/2^{16}=85197/2^{16}$.
\[
\dimen0=606021sp % 13Q or 3.25mm in Japanese typography
\verb|\dimen0=606021sp|
\Rightarrow
\verb|1.3\dimen0|
=\number\dimexpr1.3\dimen0\relax\,\hbox{sp}
=\Bigl\lfloor606021\,\hbox{sp}\times\frac{85197}{2^{16}}\Bigr\rfloor,
\]
whereas $1.3\times606021=787827.3$ (rounds to 787827\,sp, which is 2\,sp short).

Now, let us try \verb|\font\1=cmr10 at 606021sp| and inspect
\verb|1em| and \verb|1.3em|:
\font\1=cmr10 at 606021sp
\[
\hbox{\1\verb|1em| is \number\dimexpr1em\relax\,sp (serves as internal dimension)}
\]
and
\[
\hbox{\1\verb|1.3em| is \number\dimexpr1.3em\relax\,sp}
=\Bigl\lfloor606022\,\hbox{sp}\times\frac{85197}{2^{16}}\Bigr\rfloor.
\]

How about \verb|1ex| and \verb|1.3ex| for \verb|\font\2=cmr10 at 1212042sp|?
\font\2=cmr10 at 1212042sp
\[
\hbox{\2\verb|1ex| is \number\dimexpr1ex\relax\,sp (serves as internal dimension)}
\]
and
\[
\hbox{\2\verb|1.3ex| is \number\dimexpr1.3ex\relax\,sp}
=\Bigl\lfloor521851\,\hbox{sp}\times\frac{85197}{2^{16}}\Bigr\rfloor.
\]
\end{document}

RuixiZhang42 commented 2 years ago

Scanning em or ex unit and calculating internal scaled integer is described in Section 455 of tex.web:

@ @<Scan for \(u)units that are internal dimensions...@>=
save_cur_val:=cur_val;
@<Get the next non-blank non-call...@>;
if (cur_cmd<min_internal)or(cur_cmd>max_internal) then back_input
else  begin if mu then
    begin scan_something_internal(mu_val,false); @<Coerce glue...@>;
    if cur_val_level<>mu_val then mu_error;
    end
  else scan_something_internal(dimen_val,false);
  v:=cur_val; goto found;
  end;
if mu then goto not_found;
if scan_keyword("em") then v:=(@<The em width for |cur_font|@>)
@.em@>
else if scan_keyword("ex") then v:=(@<The x-height for |cur_font|@>)
@.ex@>
else goto not_found;
@<Scan an optional space@>;
found:cur_val:=nx_plus_y(save_cur_val,v,xn_over_d(v,f,@'200000));
goto attach_sign;
not_found:

The line cur_val:=nx_plus_y(save_cur_val,v,xn_over_d(v,f,@'200000)) takes <n+f/65536><internal dimen> and turns it into n<internal dimen>+⌊<internal dimen>*f/65536⌋, which is the same as ⌊<internal dimen>*(n+f/65536)⌋.

It does not matter how TeX gets the em or ex internal dimension (the v variable in the above code). It suffices to use \dimexpr1em\relax and \dimexpr1ex\relax, and see how the input compares with these last two dimensions.
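
As a worked instance of the formula above (a sketch only, taking v = 606022 sp, the em that the first example shows for cmr10 at 606021sp, and the input 1.3em):

```latex
\begin{align*}
n &= 1, \qquad f = \lfloor 0.3\times 2^{16} + 1/2 \rfloor = 19661, \qquad v = 606022,\\
\mathtt{nx\_plus\_y}\bigl(n, v, \mathtt{xn\_over\_d}(v, f, 2^{16})\bigr)
  &= 1\times 606022 + \Bigl\lfloor \frac{606022\times 19661}{65536} \Bigr\rfloor\\
  &= 606022 + 181808 = 787830
   = \Bigl\lfloor 606022\times\frac{85197}{2^{16}} \Bigr\rfloor .
\end{align*}
```

The two forms agree because n times v is already an integer, so it can be moved in or out of the floor freely.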

RuixiZhang42 commented 2 years ago

Seems easy enough to program:

\documentclass{article}
\makeatletter
% Using \numexpr\dimexpr...\relax*<num>/<den>\relax sp is overly convoluted
% We are already inside a \dimexpr...\relax (the outermost layer),
% so why not just say (...)*<num>/<den>?
\newcommand\converttoem[1]
  {\strip@pt\dimexpr(#1)*65536/\dimexpr1em\relax\relax em}
\newcommand\converttoex[1]
  {\strip@pt\dimexpr(#1)*65536/\dimexpr1ex\relax\relax ex}
\makeatother
\begin{document}

1.30001em (\number\dimexpr 1.30001emsp)

\verb|\converttoem{1.30001em}| gives \converttoem{1.30001em}
(\number\dimexpr\converttoem{1.30001em}sp)

1.30001ex (\number\dimexpr 1.30001exsp)

\verb|\converttoex{1.30001ex}| gives \converttoex{1.30001ex}
(\number\dimexpr\converttoex{1.30001ex}sp)

\huge

1.30001em (\number\dimexpr 1.30001emsp)

\verb|\converttoem{1.30001em}| gives \converttoem{1.30001em}
(\number\dimexpr\converttoem{1.30001em}sp)

1.30001ex (\number\dimexpr 1.30001exsp)

\verb|\converttoex{1.30001ex}| gives \converttoex{1.30001ex}
(\number\dimexpr\converttoex{1.30001ex}sp)

\tiny

1.30001em (\number\dimexpr 1.30001emsp)

\verb|\converttoem{1.30001em}| gives \converttoem{1.30001em}
(\number\dimexpr\converttoem{1.30001em}sp)

1.30001ex (\number\dimexpr 1.30001exsp)

\verb|\converttoex{1.30001ex}| gives \converttoex{1.30001ex}
(\number\dimexpr\converttoex{1.30001ex}sp)

\end{document}

The decimal number 1.30001 is deliberately chosen (since the functions \converttoem and \converttoex should convert it to 1.3). But, as you can see in the example above, the internal sp units agree.

RuixiZhang42 commented 2 years ago

Wait… this is not as simple as I thought… \strip@pt\dimexpr(#1)*65536/\dimexpr<internal dimen>\relax\relax <internal dimen> does not always work.

Counterexample: Say 1em=3sp, and user inputs 1.5em. Then 1.5em is turned into internal floor(1.5*3)sp=4sp. Then \dimexpr4sp*65536/\dimexpr1em\relax\relax produces a scaled integer of round(4*65536/3)=87381 (or 1.33333pt). But then 1.33333em will be floor(87381/65536*3)sp=floor(3.9999847412109375)sp=3sp, which is not the user input 4sp.

jfbu commented 2 years ago

@RuixiZhang42 for some reason I did not receive a notification that you opened this issue, but by luck you commented on the latex3 issue about the bp unit, so I became aware of it now. (I need to check the repo configuration.) Thanks for your thoughts. I currently don't have much time available for TeX matters, so please excuse me if my response takes a while.

jfbu commented 2 years ago

@RuixiZhang42 I have no idea why I was not notified when this issue was created; my settings look OK and my mail client did not put it into spam. Please ping me next time to be sure... I might not have become aware of this for months were it not for the interaction on the latex3 repo...

jfbu commented 2 years ago

@RuixiZhang42

Regarding ex and em this is more challenging and costly as

it would need to hook into font selection

anyhow there is a problem of principle if those units end up to be less than 1pt.

However, I don’t think these are actual constraints. The observations above are based on explicit conversion ratios from the TeX/pdfTeX sources. But ex and em are simply handled the same way as internal dimensions, which means:
1. There is no need to hard-code any inverse of conversion ratio (well, it would be impossible anyway, because `ex` and `em` are font-dependent).

My quote about hooking into font selection might have been motivated by considerations I have forgotten now. I agree one can naturally always dynamically recover 1ex and 1em either from a \dimexpr or from the suitable \fontdimen.

2. The barrier of “at least 1pt” is lifted.

I agree that tex.web does as you say: an internal dimension such as em used as a unit is matched to a factor with denominator 65536. As such, if this factor is >1 it can be handled exactly as done in the texdimens code for the case of the bp unit. We could also test whether the factor is >2, which admits a slightly simpler approach.

But if the factor is <1 then "questions of principle" arise, as I said in my quote. Here again I do not remember exactly what I was referring to; I think I simply meant that the map is many-to-one, so we cannot guarantee a closed loop, but I had in mind not the decimal user input but the integer N it has already been converted to. [A few paragraphs below I seem to have remembered my "questions of principle".]

Anyway, taking the notation from the top of the comments in texdimens.tex, we have some U equal to the truncation of N phi (assume U positive), where phi is some factor which here I will take <1. So it is a fact that all U's are possible and there is always at least one N. How to find it? The math is simply that U <= N phi < U + 1, or equivalently U psi <= N < (U+1) psi, where psi = 1/phi > 1 (for our case, which is the opposite of the one handled so far by texdimens).

Generally speaking, the least integer N at least equal to some x is ceil(x), and the maximal integer strictly less than some y is ceil(y)-1. (I have currently forgotten what the Anglo-Saxon convention is for the ceil() function on the negative domain, so I assume here that we are handling positive numbers.) These statements also hold if x or y are integers. So the condition on N is that it should be at least ceil(U * psi) and at most ceil((U+1) * psi) - 1.

No wonder then that round(U * psi) will not always work: if it rounds strictly down, we are doomed.
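
To make the admissible range concrete, here is the earlier counterexample (1em = 3sp, U = 4) worked through, reading the notation as phi = 3/65536 and psi = 65536/3:

```latex
\[
\lceil U\psi \rceil
  = \Bigl\lceil \frac{4\times 65536}{3} \Bigr\rceil = 87382
\;\le\; N \;\le\;
\Bigl\lceil \frac{5\times 65536}{3} \Bigr\rceil - 1 = 109226,
\qquad
\mathrm{round}(U\psi) = 87381 < 87382 .
\]
```

So the rounded value 87381 falls just below the admissible range, which is exactly why 1.33333em came back as 3sp rather than 4sp.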

What about the M = round((U+0.5)*psi) approach, will it work? (psi = 1/phi > 1).

Yes it will:

Let t = U * psi, v = (U + 0.5) * psi and w = (U + 1) * psi. Then M is at distance at most 0.5 from v, but this v is at distance 0.5 * psi > 0.5 from both t and w, so in fact t < M < w and our integer M satisfies the constraints. But it is not necessarily the same as the N we started with, although we probably don't care much about this.
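
Applied to the earlier counterexample (1em = 3sp, U = 4, so psi = 65536/3), the midpoint choice can be checked by hand; this is only an illustrative computation:

```latex
\begin{align*}
t &= \tfrac{4\times 65536}{3} \approx 87381.33, \qquad
w = \tfrac{5\times 65536}{3} \approx 109226.67,\\
M &= \mathrm{round}\Bigl(\tfrac{4.5\times 65536}{3}\Bigr) = 98304, \qquad t < M < w,\\
\tfrac{M}{65536} &= 1.5, \qquad
1.5\,\mathrm{em} \mapsto \lfloor 1.5\times 3 \rfloor\,\mathrm{sp} = 4\,\mathrm{sp} = U .
\end{align*}
```

Here M happens to reproduce the original input 1.5em, but in general it only needs to land somewhere between t and w.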

Ah now I probably remember one of my "questions of principle".

At this stage, if say 1em is f sp with f < 65536, we are trying to express U sp as (N/65536) em, proposing to use N = round((2*U+1) * 32768 / f). Of course if f is smaller than 32768 we risk arithmetic overflow in numexpr. And in fact the same holds even if 32768 <= f < 65536, because from N we still want a decimal D such that N = round(D * 65536), which we might naively want to obtain via \the\dimexpr N sp, recovering D in front of pt. So we must not trigger dimension overflow there, i.e. we must have N < 2**30, although we could have as input all U's such that 2U + 1 < 2**31. But the ratio 32768/f is >1/2 if f<65536, so N can reach or exceed 2**30 for large U.

Say for example f=1024 (so 1em = 1024sp); then we simply want to do the computation (2U+1)*32 and find a decimal D such that (2U+1)*32 = round(65536 D). To avoid overflow in the multiplication by 32, we first need to split 2U+1 as an integer multiple of 65536/32 = 2048 plus a remainder, obtaining 2U+1 = k*2048 + R. Then we obtain the decimal D as k plus a fractional decimal E such that round(65536 E) = R * 32, which we get from E pt = \the\dimexpr 32 * R sp (and 32 * R sp < 1pt).

For general f, we first need to do the Euclidean division 2U + 1 = k * 2f + R. Then the number N = round((2*U+1) * 32768 / f) = 65536 * k + round(R * 32768 / f), and we represent it by the pair of k and B = round(R * 32768 / f), which is feasible without overflow. Indeed R * 32768 / f is at most 65536 - 32768/f. As f<65536 by hypothesis, 32768/f > 0.5 and we are certain that B<65536. Then the decimal D is the integer k plus the decimal E obtained from \the\dimexpr B sp, which will give something < 1pt.

Doing the Euclidean division expandably and without arithmetic overflow needs extra care, because numexpr gives us the rounded quotient <(2U+1)/2f>, whereas we prefer truncation: otherwise we risk overflow when getting the remainder, and we don't want to have to adjust a negative remainder. So what we need to do is the rounded division <(2U + 1 - f)/(2f)> via numexpr. This does the trick.
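
A worked instance of the f=1024 case, with the input U = 12345 chosen purely for illustration (it does not come from the thread):

```latex
\begin{align*}
2U+1 &= 24691 = 12\times 2048 + 115, \qquad k = 12, \quad R = 115,\\
32\,R &= 3680\ \mathrm{sp} = 0.05615\ \mathrm{pt}
  \quad\text{(the decimal that \texttt{\string\the} prints for 3680sp)},\\
D &= 12.05615, \qquad
12.05615\,\mathrm{em} \mapsto 12\times 1024
  + \Bigl\lfloor \frac{1024\times 3680}{65536} \Bigr\rfloor
  = 12288 + 57 = 12345 .
\end{align*}
```

The fractional part contributes 57.5 before truncation, i.e. it sits in the middle of the admissible interval rather than at its edge.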

Algorithm:

1. Get f from 1em = f sp and apply the following iff f<65536 (else the algorithm is like the one in texdimens for bp).
2. Given a dimension U sp, compute via numexpr the integer k equal to the numexpr expression (2*U+1 - f)/(2*f), i.e. this gives the truncation to an integer of (2U+1)/2f. Notice that if U=0 this gives zero and not some annoying -1, as the tie at -0.5 of the negative fraction is impossible.
3. Apart from that, we also need to compute via numexpr B = (R * 32768)/f, where R has been evaluated as 2U + 1 - k * 2 * f. We are certain that 0 <= B < 65536.
4. We compute \the\dimexpr B sp, which gives a fractional decimal E<1, and concatenate the integer k with the fractional part E.
5. Finally, we add the code for negative inputs.
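
The steps above can be sketched as follows. This is a minimal, non-expandable illustration and not the texdimens implementation: the names `\sketchtoem`, `\sketchresult`, `\roundtrip`, `\tinyfont` and the `\sk@...` helpers are hypothetical, negative inputs are not handled, and it assumes e-TeX and a current font with 1em < 1pt. The product is written as `(R)*32768/f` without parentheses around the multiplication, since e-TeX evaluates `x*y/z` with a double-precision intermediate.

```latex
\documentclass{article}
\makeatletter
% Hypothetical helper names; assumes U >= 0 and f < 65536.
\newcount\sk@U \newcount\sk@f \newcount\sk@k \newcount\sk@B
% \sk@frac turns "0.ddddd" followed by catcode-12 "pt" into ".ddddd"
\begingroup
\catcode`P=12 \catcode`T=12
\lowercase{\endgroup
  \def\sk@frac 0.#1PT{.#1}}
\newcommand\sketchtoem[1]{%
  \sk@U=\dimexpr#1\relax          % U: the input in sp
  \sk@f=\dimexpr1em\relax         % f: 1em of the current font in sp
  % rounded division of 2U+1-f by 2f, i.e. truncated (2U+1)/(2f)
  \sk@k=\numexpr(2*\sk@U+1-\sk@f)/(2*\sk@f)\relax
  % B = round(R*32768/f) with R = 2U+1-2kf
  \sk@B=\numexpr(2*\sk@U+1-2*\sk@k*\sk@f)*32768/\sk@f\relax
  % concatenate k with the fractional digits of B sp
  \edef\sketchresult{\the\sk@k
    \expandafter\sk@frac\the\dimexpr\sk@B sp\relax em}}
\makeatother
\begin{document}
\font\tinyfont=cmr10 at 1024sp % a font whose em is far below 1pt
\begingroup
  \tinyfont
  \sketchtoem{12345sp}%
  \xdef\sketchresult{\sketchresult}%
  \xdef\roundtrip{\number\dimexpr\sketchresult\relax}%
\endgroup
\verb|12345sp| is written as \texttt{\sketchresult},
which TeX reads back as \roundtrip\,sp.
\end{document}
```

The conversion and the read-back are both done while `\tinyfont` is current, because the result only has meaning relative to that font's em; the values are then exported so they can be typeset in the normal font.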

~~As I have been writing this following my meandering thoughts on the moment, I may well have embarked into in an overly complicated journey. I will need to give it some careful second reading.~~

All clear. This is what must be done for handling "units" which are <1pt.

jfbu commented 2 years ago

In the math of my previous comment there is a peculiar aspect: the proof of B<65536 shows that in fact the smaller the f, the better the upper bound for B, hence for the fractional digits we will actually end up with. For example if f=4 we get B <= 65536 - 32768/4 = 57344, and 57344sp is 0.875pt. In other words, if 1em=4sp we can get all dimensions from decimal input k.ddddd em using only ddddd<=87500. Thinking about it, we see that this leaves a gap of less than 0.125 * 4 sp, i.e. 0.5 sp, below multiples of four, so by rounding we do indeed get all possible dimensions modulo four, be it 0, 1, 2 or 3. We don't need the 87500<=ddddd<=99999 input range (recall that \the\dimexpr outputs at most 5 fractional decimal digits), because k.ddddd would then give (after multiplication by 4 and rounding) multiples of 4 sp, which are obtained anyhow without need of fractional digits in the input.
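
For f=4 the four possible remainders can be tabulated explicitly. This is a sketch following the algorithm of the previous comment, writing U = 4k + r so that 2U+1 = 8k + R with R = 2r+1:

```latex
\[
\begin{array}{c|c|c|c|c}
r = U \bmod 4 & R = 2r+1 & B = R\times 32768/4 & E\ (\mathrm{pt})
  & \lfloor 4\times 65536\,E / 65536 \rfloor \\ \hline
0 & 1 & 8192  & 0.125 & 0 \\
1 & 3 & 24576 & 0.375 & 1 \\
2 & 5 & 40960 & 0.625 & 2 \\
3 & 7 & 57344 & 0.875 & 3
\end{array}
\]
```

Each fractional part reads back as exactly r sp on top of the 4k sp contributed by the integer part, so the four decimals .125, .375, .625 and .875 already cover every residue.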

jfbu commented 2 years ago

closed in 68a65028151726d47da5b56535c13ed0eed291e7 more precisely 7a3c79d94ea39079febf71af11538c8e82f7a63f

I had done a dedicated branch and was planning to open a pull-request here, but forgot I was visiting the web interface without having logged in so never could create the PR and ended up merging the branch in master locally and pushing it here...

jfbu / texdimens

About `ex` and `em` units #2