Incorrect optical size in generated database

zhouyan commented 7 years ago

I recently noticed that the generated database, by default, record the optical size in sp. However, it seems that the sizes are calculated incorrectly. (Or maybe I misunderstood the behavior of luaotfload)

(I don't have any configuration file, so everything is in its default setting)

Consider the entry for Latin Modern Roman, in particular the line,

{ 652911.58156912, 718202.73972603, 620266.00249066, 843 },

within the r table (regular). This line corresponding to LMRoman10-Regular, The reported optical size through otfinfo is

design size 10 pt, size range (9.5 pt, 11 pt], subfamily ID 1, subfamily name Regular

If I understand correctly, these values are PostScript points, that is, the sp value shall be,

11pt => 11 / 72 * 72.27 * 65536 => 723599.36

While in the table, the corresponding value 718202.73972603 is actually,

11 / 72.27 * 72 * 65536

In particular, this cause problems when selecting font when the size is specified according to bp instead of pt. For example,

\documentclass{article}

\def\FontName{
  \directlua{
    local i = font.current()
    local f = font.getfont(i)
    local n = f.fullname
    tex.sprint(n)
  }
}

\usepackage{fontspec}
\setmainfont{Latin Modern Roman}

\begin{document}

\fontsize{10.99995}{11}\selectfont % slightly smaller than 11pt
\FontName
\fontsize{11.04124}{11}\selectfont % slightly smaller than 11bp
\FontName

\end{document}

The output is screen shot 2017-02-01 at 23 52 49

(I recall that luaotfload used to select font by regarding the optical size as an Closed-Open interval, while OpenType use an Open-Closed interval, and thus I used font size very slightly smaller than the upper bound of the intended use)

In the above example, for 11bp, the 10 point fonts shall be used, but the 12 point one is used instead.

After patching the database by using the values 723599.36 etc, the expected output is obtained

screen shot 2017-02-01 at 23 53 36

eroux commented 7 years ago

I think this is a duplicate of https://github.com/lualatex/luaotfload/issues/389

zhouyan commented 7 years ago

I think it is an issue introduced by that and related patch.

The designsize as returned by the LuaTeX font loader is merely Size in point x 65536, regardless wether this Size in point is PS or US point.

For example, LMRoman10-Regular has a design size 655360 and Arno Pro Regular (12 point) has 786432. So if this Size in point means PS point, as in most OpenType fonts, instead of US point, then the correct sp value shall be v / 72 * 72.27 instead of v / 72.27 * 72, which does not make any sense to me. The same goes for the dd option, (Line 1050--1054 in luaotfload-database.lua, in the GitHub version)

In the particular case of LMRoman, I believe the pt option is probably more appropriate. Knuth most likely designed the original CM with pt instead of bp in mind. But the default bp option, which is the default, is more appropriate for most other fonts.

zhouyan commented 7 years ago

If I unsderstand correctly, The current formula in the source is for converting value in pt to value in bp and dd, while what was really needed was the other way around.

eroux commented 7 years ago

Oh ok, sorry... maybe you could build a pull request?

phi-gamma commented 7 years ago

2017-02-01 (Wednesday), notifications@github.com (Yan Zhou):

If I unsderstand correctly, The current formula in the source is for converting value in pt to value in bp and dd, while what was really needed was the other way around.

You’re right, I messed up the conversions with the refactoring :/ Looks like we’ll be having a bugfix release soon.

Thanks for double-checking! Philipp

phi-gamma commented 7 years ago

2017-02-01 (Wednesday), notifications@github.com (Yan Zhou):

(I recall that luaotfload used to select font by regarding the optical size as an Closed-Open interval, while OpenType use an Open-Closed interval, and thus I used font size very slightly smaller than the upper bound of the intended use)

This would be a bug in any case. The present issue aside, do you think the behavior you were observing was a consequence of treating the design sizes in the font as if they were Knuth points instead of DTP points?

zhouyan commented 7 years ago

I think the Open-Close interval issue was fixed a long time ago. I was just been cautious.

I think treating the design size either as US point or PS point is OK. It will only make a difference in marginal case. The difference at 20pt is less than 0.1 point. however, if we need to make a difference, and converted it the wrong way, basically we squared the ratio and the difference become slighted bigger. However, that difference is still very small (would be a lot worse for didot point)

It is only an issue when the size in question happened to be on the boundary, in which case any margin or error will cause a change of font. However, even in that case I still don't think it is a big problem. For example, say a design size is 18 point and upper bound is 24 point, and the next range is design size 30 point (lower bound 24 point). It is purely academia to argue around 24 point which shall be better.

The only real effect is for fonts such as LM, Arno, which has a few very narrow intervals (top -bottom < 2pt). In this case, it does make a difference when switch between a 10 point font and a 12 point font.

Personally, at text size, I usually use font at its design size, which avoids all these problems. For example use Arno at either 10point or 12 point. And if I want 11point, I switch to Minion.

zhouyan commented 7 years ago

I am afraid my last pull request did not really fix the problem. I think I misunderstood the source.

Before I make an even bigger mess (very sorry about that), please correct me if I understood the following correctly,

The design_size_dimension function (on Line 1045) is only used when loading a font.
The get_size_info function (on Line 1356) is what used for generating the database.
The configuration for choose among pt, dd, bp only affects fonts loading instead of database generating.

If this understanding is correct, then original design_size_dimension was correct. That is, we do want to convert from TeX point to DTP point and then multiple the value by 2^16. That is the value shall be shrieked by a ratio 7200 / 7227. My fix in the pull request was incorrect.

Instead, the real problem is that in get_size_info, there is also a conversion from pt to bp. Which is redundant. That is, we end up with a double conversion. After removing the following

            design_size         = (design_size         * 7227) / 7200
            design_range_top    = (design_range_top    * 7227) / 7200
            design_range_bottom = (design_range_bottom * 7227) / 7200

inside get_size_info, and restore design_size_dimension to its state before the pull request. Everything seems to work correct.

The following is an example with more rigorous test of the marginal case,

\documentclass{article}
\usepackage{luaotfload}

\font\lme = {name:Latin Modern Roman} at 723599.36sp % exact 11bp
\font\lmh = {name:Latin Modern Roman} at 723599.86sp % exact 0.5sp bigger
\font\lmm = {name:Latin Modern Roman} at 723599sp    % less than 1sp smaller
\font\lmp = {name:Latin Modern Roman} at 723600sp    % less than 1sp larger

\def\FontName{\directlua{tex.sprint(font.getfont(font.current()).fullname)}}

\begin{document}

\noindent\lme\FontName\par
\noindent\lmh\FontName\par
\noindent\lmm\FontName\par
\noindent\lmp\FontName\par

\end{document}

screen shot 2017-02-02 at 15 22 54

And I get the expected output, that is the first three choose LM10 and the last choose LM12. (Note that, the second one also choose LM10, while the asked size is bigger than 11bp is that, sp is the smallest TeX unit, and it is truncated [instead of rounded] to integer.)

Another test with Arno pro, which has a boundary at 10.9, which in bp is an exact integer in sp.

\documentclass{article}
\usepackage{luaotfload}

% bounds (10, 10.9]
\font\lme = {name:Arno Pro} at 717021sp % exact 10.9bp
\font\lmm = {name:Arno Pro} at 717020sp % exact 1sp smaller
\font\lmp = {name:Arno Pro} at 717022sp % exact 1sp bigger

\def\FontName{\directlua{tex.sprint(font.getfont(font.current()).fullname)}}

\begin{document}

\noindent\lme\FontName\par
\noindent\lmm\FontName\par
\noindent\lmp\FontName\par

\end{document}

Also give correct output. screen shot 2017-02-02 at 15 32 46

If this time my understanding is correct. I will create a new pull request to fix my mess (sorry again)

eroux commented 7 years ago

Not directly related: did you test the Open-Closed vs Closed-Open interval?

zhouyan commented 7 years ago

Yes, the Arno example has one exactly at the upper bound and it behaves correctly. That is, a Open-Closed interval.

However, it is still a rounded number. It is difficult to find a case where the bp value will be exactly at the boundary and exactly an integer in sp.

eroux commented 7 years ago

Well, as long as the conversion is the same in both cases:

when luaotfload reading the font, x bp is y sp
when the user requires font at x bp, luaotfload looks for th variant as y sp (and not y-1 or y+1)

then everything's fine! Thank you for inspecting

zhouyan commented 7 years ago

Regarding the Open-Closed interval, I artificially patched the generated database such that Arno-Subhead shall be (14, 100] and Arno-Display be (100, 1000]. 100bp is exactly 6578176sp, and loading at that size the Subhead version is correctly chosen. So there's no issue at all there.

I will do some more testing and create new pull request after some more testing.

zhouyan commented 7 years ago

I created three separated PR. Each are related but I feel yet independent enough to allow merging only some of them

Below are test results when #400 and #401 are applied, For all three configurations

test-bp.pdf test-dd.pdf test-pt.pdf

The source that created these results are,

#!/usr/bin/env perl

for (qw(pt bp dd)) {
    open my $conf, ">", "luaotfload.conf";
    print $conf "[db]\n";
    print $conf "    designsize-dimen = $_\n";

    open my $tex, ">", "test-$_.tex";
    print $tex "\\input{test}\n";
    print $tex "\\begin{document}\n";
    print $tex "\\testsize{$_}\n";
    print $tex "\\end{document}\n";
    system "lualatex test-$_.tex";
}

\documentclass{article}

\usepackage{luaotfload}
\usepackage{graphics}
\usepackage[scale=0.85,a5paper,landscape]{geometry}

\pagestyle{empty}
\parindent=0pt

\def\testsize#1{%
  \leavevmode%
  \hbox to 1.2in{Asked size\hfil\strut}%
  \hbox to 1.2in{Actual size\hfil\strut}%
  \hbox to 1.2in{Check\hfil\strut}%
  \hbox to 1.2in{Optical range\hfil\strut}%
  Font name
  \vskip\baselineskip
  \testpoint{#1}{5}
  \testpoint{#1}{5.5}
  \testpoint{#1}{6}
  \testpoint{#1}{6.5}
  \testpoint{#1}{7}
  \testpoint{#1}{7.5}
  \testpoint{#1}{8}
  \testpoint{#1}{8.5}
  \testpoint{#1}{9}
  \testpoint{#1}{9.5}
  \testpoint{#1}{10}
  \testpoint{#1}{11}
  \testpoint{#1}{12}
  \testpoint{#1}{14}
  \testpoint{#1}{17.2}
  \testpoint{#1}{24}
}

\def\testpoint#1#2{%
  \begingroup%

  % Load the font at size #2 in unit #1
  \font\testfont = {name:Latin Modern Roman} at #2#1%
  \testfont%

  % Get the actual size in pt
  \edef\sizept{%
    \directlua{
      local s = font.getfont(font.current()).size / 65536
      s = math.round(s * 1000) / 1000
      tex.sprint(s)
    }%
  }

  % Get the actual size in bp
  \edef\sizebp{%
    \directlua{
      local s = font.getfont(font.current()).size / 65536 * 7200 / 7227
      s = math.round(s * 1000) / 1000
      tex.sprint(s)
    }%
  }

  % Get the actual size in bp
  \edef\sizedd{%
    \directlua{
      local s = font.getfont(font.current()).size / 65536 * 1157 / 1238
      s = math.round(s * 1000) / 1000
      tex.sprint(s)
    }%
  }

  % Get the lower bound of the optical range
  \edef\minsize{%
    \directlua{
      local s = font.getfont(font.current()).shared.rawdata.metadata.minsize
      tex.sprint(s / 10)
    }%
  }

  % Get the upper bound of the optical range
  \edef\maxsize{%
    \directlua{
      local s = font.getfont(font.current()).shared.rawdata.metadata.maxsize
      tex.sprint(s / 10)
    }%
  }

  \edef\inrangept{%
    \directlua{
      local s = font.getfont(font.current()).size
      local l = font.getfont(font.current()).shared.rawdata.metadata.minsize
      local u = font.getfont(font.current()).shared.rawdata.metadata.maxsize
      l = l * 65536 / 10
      u = u * 65536 / 10
      if s > l and s <= u then
        tex.sprint("Pass")
      else
        tex.sprint("Fail")
      end
    }%
  }

  \edef\inrangebp{%
    \directlua{
      local s = font.getfont(font.current()).size * 7200 / 7227
      local l = font.getfont(font.current()).shared.rawdata.metadata.minsize
      local u = font.getfont(font.current()).shared.rawdata.metadata.maxsize
      l = l * 65536 / 10
      u = u * 65536 / 10
      if s > l and s <= u then
        tex.sprint("Pass")
      else
        tex.sprint("Fail")
      end
    }%
  }

  \edef\inrangedd{%
    \directlua{
      local s = font.getfont(font.current()).size * 1157 / 1238
      local l = font.getfont(font.current()).shared.rawdata.metadata.minsize
      local u = font.getfont(font.current()).shared.rawdata.metadata.maxsize
      l = l * 65536 / 10
      u = u * 65536 / 10
      if s > l and s <= u then
        tex.sprint("Pass")
      else
        tex.sprint("Fail")
      end
    }%
  }

  % Show the asked size, actual size, the optical range, and the font name
  \leavevmode%
  \begingroup%
    \normalfont\ttfamily
    \hbox to 1.2in{#2{}#1\hfil\strut}%
    \hbox to 1.2in{\csname size#1\endcsname{}#1\hfil\strut}%
    \hbox to 1.2in{\csname inrange#1\endcsname\hfil\strut}%
    \hbox to 1.2in{(\minsize, \maxsize]\hfil\strut}%
  \endgroup%
  \resizebox{!}{7pt}{%
    \directlua{tex.sprint(font.getfont(font.current()).fullname)}%
  }\par%

  \endgroup%
}

phi-gamma commented 7 years ago

2017-02-01 (Wednesday), notifications@github.com (Yan Zhou):

The design_size_dimension function (on Line 1045) is only used when loading a font.

The get_size_info function (on Line 1356) is what used for generating the database.

The configuration for choose among pt, dd, bp only affects fonts loading instead of database generating.

That is accurate. In the first revision I stored the values retrieved from the size table as-is. Since font definition deals in TeX scaled points this would mean one extra conversion sp→decipoints per request.

The current design, at least before I messed it up, eliminates that step by storing the values in sp in the index. Then unless the user configures a different conversion, we can use the input directly (just one no-op funcall overhead). When a pt or dd conversion is chosen, the value needs scaling.

If this understanding is correct, then original design_size_dimension was correct. That is, we do want to convert from TeX point to DTP point and then multiple the value by 2^16. That is the value shall be shrieked by a ratio 7200 /

My fix in the pull request was incorrect.

It’s wrong differently but that’s my fault ;) May I ask you to review this fix? https://github.com/phi-gamma/luaotfload/commit/529578f58c8221f7fe3deaceecec3a4cf3ff8f2d

The following is an example with more rigorous test of the marginal case,

Thanks for coming up with a test. I will have a closer look at it and maybe add it to the semi-official test repo: https://bitbucket.org/phg/lua-la-tex-tests

zhouyan commented 7 years ago

May I ask you to review this fix? phi-gamma@529578f

I think it is correct for bp and pt, but incorrect for dd. I found that one's head can easily get confused when thinking of these ratios only in head. So I wrote down the logic below, please review, (also see #400)

test.pdf

phi-gamma commented 7 years ago

2017-02-02 (Thursday), notifications@github.com (Yan Zhou):

May I ask you to review this fix? phi-gamma@529578f

I think it is correct for bp and pt, but incorrect for dd. I found that one's head can easily get confused when thinking of these ratios only in head. So I wrote down the logic below, please review, (also see #400)

test.pdf

Thanks for taking the time to write it up. I took the liberty of adding that in full as a comment: https://github.com/phi-gamma/luaotfload/commit/6971d4a0340aad6736ef6bd7bd16361d156142dd

zhouyan commented 7 years ago

No problem at all.

I am working on a slightly improved test that take into account of the rounding error. The default behavior, bp, though the most sensible one, will introduce a rounding errors either in the database, or the asked size when loading font

I think yours is better then #400 in the sense that, #400 will introduce rounding for both, even though both will be within 1ULP. In your fix, there's no rounding error whatsoever for asked size, while the values in database will have a potentially larger error due to that they are written in textual form. However, the lack of error for the asked size, I believe shall make things more predictable and easier to reason with.

I will update a new test once your fix and #401 are merged.

zhouyan commented 7 years ago

I worked out a new test in plain TeX (depends on nothing but luaotfload itself) and the perl script (attached).

testsize.zip

It includes test settings for most optical fonts out there, excluding those with optical sizing in design but not in OpenType features, such as some hoefler & co fonts.

Free fonts: Latin Modern family, EB Garamond (need a separate download, the one included in TeXLive only have 12 points design)

Commercial fonts: Adobe optical fonts and Minion Math

Full list

Adobe Jenson Pro
Arno Pro
Brioso Pro
Chaparral Pro
Cronos Pro
EB Garamond
Garamond Premier Pro
Kepler Std
Latin Modern Mono
Latin Modern Roman
Latin Modern Sans
Minion Math
Minion Pro
Sanvito Pro
Utopia Std
Warnock Pro

Note that without manual patching of the database (see #363, an issue that I don't believe can ever be fixed entirely given the mess of font foundries' naming schemes and ID fields, etc. And frankly, I don't think worth anyone's energy to try. I merely patch the database manually after luaotfload-tool -u), many tests will fail.

Without any intervention, the following faces will fail the tests,

Cronos Pro
EB Garamond
Latin Modern Mono
Minion Math
Sanvito Pro

Sanvito and EB Garamond fail mainly because bugs in font. There are gaps between ranges in their reported values.

Others fail the tests because of #363, which is unrelated (those pass the tests may select incorrect width and weights, also #363, again unrelated to issues here, and the founders are to blame).

In the new test, no conversion of dimension is done in Lua side. Instead, the returned design size, actual loaded size, etc., are taken literally and comparisons are done in TeX instead in Lua. The basic logic is that, if one ask for a size in TeX, and get a range selected correctly in the eye of TeX, then it should be considered to be OK, even if in real there might be some rounding errors within Lua that can make a difference in marginal case.

zhouyan commented 7 years ago

I just tried phi-gamma@7308f84b with the new test file, and everything works out well.

phi-gamma commented 7 years ago

2017-02-03 (Friday), notifications@github.com (Yan Zhou):

I worked out a new test script (attached).

testsize.zip

Thanks. I’m currently deep into two other issues but I’ll get back to the material.

Note that without manual patching of the database (see #363, an issue that I don't believe can ever be fixed entirely given the mess of font foundries' naming schemes and ID fields, etc.), many tests will fail.

I spent some time on that a while ago and it’s even worse. The same font would end up having different strings in the name table for different versions. For instance, I was writing an extensive test for the files of Arno Pro that I have along the likes of the Garamond test:

https://bitbucket.org/phg/lua-la-tex-tests/src/3ef71447d31a884b2b904a11563a1b18e017ddce/lua/tla-names-1-adobe-garamond.lua

Turns out the files, despite being the Adobe originals, are an older version that on account of a weird bug has distorted (non-standard) values in the size table. I believe Hans added a hack for that case in the meantime but hours of testing were simply for nothing …

phi-gamma commented 7 years ago

2017-02-03 (Friday), notifications@github.com (Yan Zhou):

I just tried phi-gamma@7308f84b with the new test file, and everything works out well.

Great news. Does that include the subject of rounding errors?

zhouyan commented 7 years ago

In a sense, yes. Say one load font at "x bp", and the numerical values of the ranges is (a, b]. Then we compare if the asked size fall in (a bp, b bp] using TeX's \ifdim instead of doing any calculation in Lua side. So the test will not introduce any error and would be the same as how people use the luaotfload or fontspec, or latex \fontsize. Thus there should be any false positive or negative test results, from a user perspective.

On Feb 4, 2017, at 04:37, Philipp Gesang notifications@github.com wrote:

2017-02-03 (Friday), notifications@github.com (Yan Zhou):

I just tried phi-gamma@7308f84b with the new test file, and everything works out well.

Great news. Does that include the subject of rounding errors?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

zhouyan commented 7 years ago

Exactly. After raised that issue myself, I spent considerable time trying to put together a PR. And it came to nothing. In the end, I simply alter the generated database manually (actually through a script). It give perfect results also allow me to use some non-optical fonts in a similar fashion. For example, semi-bold for small size.

The only downside is that if luaotfload changes the format of the database, things need to be reworked a bit. But let's just consider it an opportunity to testing the changes (that's how I started this issue in the first place)

On Feb 4, 2017, at 04:35, Philipp Gesang notifications@github.com wrote:

2017-02-03 (Friday), notifications@github.com (Yan Zhou):

I worked out a new test script (attached).

testsize.zip

Thanks. I’m currently deep into two other issues but I’ll get back to the material.

Note that without manual patching of the database (see #363, an issue that I don't believe can ever be fixed entirely given the mess of font foundries' naming schemes and ID fields, etc.), many tests will fail.

I spent some time on that a while ago and it’s even worse. The same font would end up having different strings in the name table for different versions. For instance, I was writing an extensive test for the files of Arno Pro that I have along the likes of the Garamond test:

https://bitbucket.org/phg/lua-la-tex-tests/src/3ef71447d31a884b2b904a11563a1b18e017ddce/lua/tla-names-1-adobe-garamond.lua

Turns out the files, despite being the Adobe originals, are an older version that on account of a weird bug has distorted (non-standard) values in the size table. I believe Hans added a hack for that case in the meantime but hours of testing were simply for nothing …

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

zhouyan commented 7 years ago

I wasn't quite able to express what I aimed in the test before. Basically, I think if user ask for at 10bp, then the font with, say (8, 10] shall be chosen instead of (10, 20]. It shouldn't matter if in TeX 10bp is really 10.00000000001bp or 9.9999999999bp.

zhouyan commented 7 years ago

Just a simple example of how I patch the database and an idea I am thinking about for properly test all edge cases of interest.

kpse.set_program_name 'luatex'

require 'lualibs'

names = load(gzip.load('luaotfload-names.lua.gz'), 't')()

families_sys = names.families.system.otf
files_sys = names.files.bare.system.otf

factor = 2^16 * 7227 / 7200

families_sys['ebgaramond'] = {
  r = {
    {
      8 * factor, 9.5 * factor, 0 * factor,
      files_sys['EBGaramond08-Regular']
    },
    {
      12 * factor, 9.5 * factor, 120 * factor,
      files_sys['EBGaramond12-Regular']
    },
  },
}

data = table.serialize(names, true)
gzip.save('luaotfload-names.lua.gz', data)
local f = io.open('luaotfload-names.luc', 'wb')
local s = load(data)
f:write(string.dump(s, true))
f:close()

By altering the database manually, we can artificially create situations of ranges that we want to test, without looking for a specific font that happened to have the edge case of interest.

It would be better if there's a way to dynamically place a hook such that it can be altered dynamically after the database is loaded, instead of having to go through the generate-patch-test cycle.

u-fischer commented 5 years ago

I'm closing the issue. @zhouyan if you think that there are open questions here, please open a new issue at https://github.com/u-fischer/luaotfload/issues.

lualatex / luaotfload

Incorrect optical size in generated database #398