silnrsi / palaso-python

Payap Linguistic Institute Computing Unit python packages
MIT License

hb-shape finds differences that cmptxtrender misses #8

Open devosb opened 2 years ago

devosb commented 2 years ago

In making changes to the OpenType code for Badami I noticed that cmptxtrender, running from smith, sometimes fails to find differences, yet hb-shape and Firefox (reading FTML test data, which has now been turned into a text.txt file for this GH issue) show that there are differences.

Running cmptxtrender outside of smith exercises the bug. The command line I used was

cmptxtrender -p -k -e ot -e ot -s "knd2" -l "" -t "text.txt" -L test -L standard -o "text.html" --copy fonts_ot --strip "test.ttf" "reference.ttf"

The fonts named above were produced by building the project as it currently is (that is, with lookup RightSubMove commented out) and renaming Badami-Regular.ttf to reference.ttf.

Uncommenting the mentioned lookup line produced differences that cmptxtrender/smith found. Re-commenting this line and commenting out the next two lines (lookup blwmBelow...) produced a font (test.ttf) that showed differences which hb-shape and Firefox reported, but cmptxtrender did not.

devosb commented 1 year ago

The issue was found on Ubuntu Focal (20.04) with Python 3.8. Now that I have switched to Ubuntu Jammy (22.04) with Python 3.10, I cannot reproduce this issue.

bobh0303 commented 1 year ago

Re-opening this because it is still a problem and is making our regression testing untrustworthy. Specifically, we cannot trust that a results/tests/test directory containing only 0-length files is an indicator of no regression.
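The check being described can be sketched in a few lines of Python. This is only an illustration of the "all result files are empty" test, not code taken from smith or cmptxtrender; the function name `nonempty_results` is hypothetical, and the directory name is the one mentioned in this comment.

```python
from pathlib import Path


def nonempty_results(results_dir):
    """Return the sorted names of files in results_dir that are not zero-length.

    An empty list means every result file is empty, which is the condition
    this issue says cannot be trusted as proof of "no regression".
    """
    return sorted(
        p.name
        for p in Path(results_dir).iterdir()
        if p.is_file() and p.stat().st_size > 0
    )
```

Calling `nonempty_results("results/tests/test")` after a test run lists the files that report differences. As the rest of this comment shows, an empty list here should be cross-checked against hb-shape before it is relied on.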

In Lateef, for example, I commented out the entire medi feature (in gsub.feax) so no medial form glyphs are rendered. After running smith test, the results/tests/test directory is full of zero-length files, yet the ALsorted ftml file shows differences between the built and reference files: [image]

hb-shape shows the regression also:

builder@smith-focal:/smith/font-lateef/results$ hb-shape --font-file Lateef-Regular.ttf -u 0639,0639,0639
[uni0639.fina=2+763|uni0639=1+782|uni0639.init=0+763]
builder@smith-focal:/smith/font-lateef/results$ hb-shape --font-file ../references/Lateef-Regular.ttf -u 0639,0639,0639
[uni0639.fina=2+763|uni0639.medi=1+709|uni0639.init=0+763]

(Note the presence of uni0639.medi in the reference font vs uni0639 in the built font.)
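A comparison like this one can be scripted so that it does not depend on reading the output by eye. The sketch below parses hb-shape's default text output (the `[glyph=cluster+advance|...]` form shown above) and reports the positions where glyph name or cluster differ. The function names are hypothetical, and advances and offsets are deliberately ignored, so this is a narrower comparison than a full rendering check.

```python
from itertools import zip_longest


def parse_hb_shape_line(line):
    """Split one line of default hb-shape text output into (glyph, cluster) pairs."""
    glyphs = []
    for item in line.strip().strip("[]").split("|"):
        if not item:
            continue  # empty output such as "[]"
        name, _, rest = item.rpartition("=")
        # The cluster is the run of digits right after "="; any offset
        # ("@x,y") and the advance ("+n") follow it and are ignored here.
        digits = ""
        for ch in rest:
            if not ch.isdigit():
                break
            digits += ch
        glyphs.append((name, int(digits)))
    return glyphs


def glyph_differences(test_line, reference_line):
    """Return (position, test_glyph, reference_glyph) for each mismatch."""
    test = parse_hb_shape_line(test_line)
    reference = parse_hb_shape_line(reference_line)
    return [
        (i, t, r)
        for i, (t, r) in enumerate(zip_longest(test, reference))
        if t != r
    ]


# The two output lines quoted above (built font, then reference font).
built = "[uni0639.fina=2+763|uni0639=1+782|uni0639.init=0+763]"
reference = "[uni0639.fina=2+763|uni0639.medi=1+709|uni0639.init=0+763]"
print(glyph_differences(built, reference))
# prints [(1, ('uni0639', 1), ('uni0639.medi', 1))]
```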

devosb commented 1 year ago

On my Ubuntu Jammy system, smith test found differences after commenting out the feature described above. Below is the output of the first difference in the ALsorted ftml file. [image: medi] Interestingly, I only see one difference with U+0639, while the comment above shows two differences. The second difference in this file was for a different codepoint.

bobh0303 commented 1 year ago

U+0639 2.3 should also have been a difference. If not, that is either a different bug or the same bug manifesting in some cases in Jammy.

devosb commented 1 year ago

Using a modified version of cmptxtrender that reports on all the test data, not just showing results that are different, for the second line of test data (which is the line with U+0639): [image: line2]

bobh0303 commented 1 year ago

Using a modified version of cmptxtrender that reports on all the test data, not just showing results that are different, for the second line of test data (which is the line with U+0639)

This may be a different bug, but things that are incorrect in the standard (right-hand) columns:

In the test column, 2.4 has the same problem as in the standard column but, because in the test font the medi feature is disabled, for 2.3 we have:

devosb commented 1 year ago

The Arabic test data comes from the first test in ALsorted-auto.ftml. A text file with this first test as actual characters is in debug.txt

When running hb-shape on my Ubuntu Jammy system (which has HarfBuzz 2.7.4; I don't think the output has changed in later versions of HarfBuzz), the output from this text file is

[uni0639=0+782]
[space=0+0|uni0639.init=0+763]
[space=1+0|uni0639.medi=1+709|space=0+0]
[uni0639.fina=1+763|space=0+0]
[uni0639.fina=2+763|uni0639.medi=1+709|uni0639.init=0+763]

Note that there are 5 lines of input and 5 lines of output; they correspond to the 5 rows in the cmptxtrender output above. The command line used was

hb-shape --font-file ~/.fonts/Lateef-Regular.ttf --text-file debug.txt

Note how the list of shaped glyphs differs between hb-shape and cmptxtrender. The shaping needs to be correct before any issues with comparing the output are addressed. If you need to see all the output from cmptxtrender (even if the results are the same) you can add, as I did, logme = True just above the conditional that usually only outputs differences.
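For cross-checking against hb-shape over a whole text file, the command above can be wrapped so that the built and reference fonts are shaped with the same input and compared row by row. This is a rough sketch, not part of cmptxtrender: the function names are hypothetical, only the `--font-file` and `--text-file` options shown in this thread are used, and hb-shape is assumed to be on PATH.

```python
import subprocess


def shape_text_file(font_path, text_path):
    """Run hb-shape on text_path with font_path and return its output lines."""
    result = subprocess.run(
        ["hb-shape", "--font-file", font_path, "--text-file", text_path],
        capture_output=True,
        text=True,
        check=True,  # raise if hb-shape exits with an error
    )
    return result.stdout.splitlines()


def differing_lines(test_output, reference_output):
    """Return (line_number, test_line, reference_line) for each mismatch.

    Line numbers are 1-based so they line up with the rows of test data.
    """
    if len(test_output) != len(reference_output):
        raise ValueError("outputs have different numbers of lines")
    return [
        (n, t, r)
        for n, (t, r) in enumerate(zip(test_output, reference_output), start=1)
        if t != r
    ]
```

With the built and reference fonts at whatever paths apply locally, `differing_lines(shape_text_file(built_font, "debug.txt"), shape_text_file(reference_font, "debug.txt"))` gives the rows where hb-shape sees a difference, which can then be compared with the rows cmptxtrender reports.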