robotools / extractor

Tools for extracting data from font binaries into UFO objects.
MIT License
52 stars 16 forks source link

extractor.exceptions.ExtractorError: There was an error reading the OTF file. #75

Open jansindl3r opened 1 month ago

jansindl3r commented 1 month ago

I get an error when extracting some fonts, like this one "JunigardenSerif_PERSONAL_USE_ONLY.otf"

https://www.dafont.com/de/junigarden-swash.font

I use this library in a plugin, when it's installed locally it works fine, but when my plugin is installed from pypi. It gives an error. I installed your library as editable and see that baseAnchor can sometimes be None.

Here is my plugin https://github.com/no-design-foundry/filters-rasterizer

    for baseRecordIndex, baseRecord in enumerate(subtable.BaseArray.BaseRecord if subtableIsType4 else subtable.Mark2Array.Mark2Record):
        baseAnchor = baseRecord.BaseAnchor[0] if subtableIsType4 else baseRecord.Mark2Anchor[0]
        anchors["baseAnchors"].update({baseCoverage[baseRecordIndex]: {"x": baseAnchor.XCoordinate, "y": baseAnchor.YCoordinate}})

Here is a proof when I print baseAnchor

<fontTools.ttLib.tables.otTables.Anchor object at 0x11052ae10>
<fontTools.ttLib.tables.otTables.Anchor object at 0x11052af90>
<fontTools.ttLib.tables.otTables.Anchor object at 0x11052ded0>
<fontTools.ttLib.tables.otTables.Anchor object at 0x11052e090>
None

then of course this error comes up

Traceback (most recent call last):
  File "/Users/js/Desktop/debug_david/extractor/Lib/extractor/__init__.py", line 61, in extractUFO
    func(
  File "/Users/js/Desktop/debug_david/extractor/Lib/extractor/formats/opentype.py", line 72, in extractFontFromOpenType
    extractAnchors(source, destination)
  File "/Users/js/Desktop/debug_david/extractor/Lib/extractor/formats/opentype.py", line 1106, in extractAnchors
    anchorGroups = _gatherAnchorDataFromLookups(gpos, scriptOrder)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/js/Desktop/debug_david/extractor/Lib/extractor/formats/opentype.py", line 1132, in _gatherAnchorDataFromLookups
    anchorsForThisLookup = _gatherAnchorsForLookup(gpos, lookupIndex)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/js/Desktop/debug_david/extractor/Lib/extractor/formats/opentype.py", line 1159, in _gatherAnchorsForLookup
    subtableAnchors = _handleAnchorLookupType4Format1(subtable)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/js/Desktop/debug_david/extractor/Lib/extractor/formats/opentype.py", line 1185, in _handleAnchorLookupType4Format1
    anchors["baseAnchors"].update({baseCoverage[baseRecordIndex]: {"x": baseAnchor.XCoordinate, "y": baseAnchor.YCoordinate}})
                                                                        ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'XCoordinate'
Traceback (most recent call last):
  File "/Users/js/Desktop/debug_david/extractor/Lib/extractor/__init__.py", line 61, in extractUFO
    func(
  File "/Users/js/Desktop/debug_david/extractor/Lib/extractor/formats/opentype.py", line 72, in extractFontFromOpenType
    extractAnchors(source, destination)
  File "/Users/js/Desktop/debug_david/extractor/Lib/extractor/formats/opentype.py", line 1106, in extractAnchors
    anchorGroups = _gatherAnchorDataFromLookups(gpos, scriptOrder)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/js/Desktop/debug_david/extractor/Lib/extractor/formats/opentype.py", line 1132, in _gatherAnchorDataFromLookups
    anchorsForThisLookup = _gatherAnchorsForLookup(gpos, lookupIndex)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/js/Desktop/debug_david/extractor/Lib/extractor/formats/opentype.py", line 1159, in _gatherAnchorsForLookup
    subtableAnchors = _handleAnchorLookupType4Format1(subtable)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/js/Desktop/debug_david/extractor/Lib/extractor/formats/opentype.py", line 1185, in _handleAnchorLookupType4Format1
    anchors["baseAnchors"].update({baseCoverage[baseRecordIndex]: {"x": baseAnchor.XCoordinate, "y": baseAnchor.YCoordinate}})
                                                                        ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'XCoordinate'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/js/Desktop/debug_david/.venv/bin/ndf_rasterizer", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/js/Desktop/debug_david/.venv/lib/python3.11/site-packages/rasterizer/rasterizer.py", line 310, in main
    extractUFO(input_file, ufo)
  File "/Users/js/Desktop/debug_david/extractor/Lib/extractor/__init__.py", line 75, in extractUFO
    raise ExtractorError(
extractor.exceptions.ExtractorError: There was an error reading the OTF file.
jansindl3r commented 1 month ago

and of course solves the issue, that that seems wrong? If you think it's a solve, can you do it or shall i PR it?

       if baseAnchor is not None:
            anchors["baseAnchors"].update({baseCoverage[baseRecordIndex]: {"x": baseAnchor.XCoordinate, "y": baseAnchor.YCoordinate}})
jenskutilek commented 1 month ago

Caveat: I’m not an expert in how mark positioning works ... but:

A BaseAnchor being None seems to be a valid case. In a ttx dump this is expressed width emtpy="1" like in this example:

            <BaseRecord index="3">
              <BaseAnchor index="0" Format="1">
                <XCoordinate value="138"/>
                <YCoordinate value="-40"/>
              </BaseAnchor>
              <BaseAnchor index="1" empty="1"/>
            </BaseRecord>

But while your solution would fix that correctly, it seems that extractor isn't handling all the intricacies of mark positioning well. E.g. the method you quoted only ever uses the first BaseAnchor of a BaseRecord, while certain marks would reference the other anchors of a BaseRecord via the Class value of their MarkRecord. So some anchors are lost, respectively marks are attached to a wrong anchor in the extracted file.

jansindl3r commented 1 month ago

Thanks for your reply! May I ask when would one want to have empty anchors? I only found this issue in not so well crafted fonts so I somehow assumed it's a mistake

what about something like this?

    for baseRecordIndex, baseRecord in enumerate(subtable.BaseArray.BaseRecord if subtableIsType4 else subtable.Mark2Array.Mark2Record):
        baseRecord = baseRecord.BaseAnchor if subtableIsType4 else baseRecord.Mark2Anchor
        for baseAnchor in baseRecord:
            if baseAnchor:
                anchors["baseAnchors"].update({baseCoverage[baseRecordIndex]: {"x": baseAnchor.XCoordinate, "y": baseAnchor.YCoordinate}})
rimas-kudelis commented 1 month ago

Caveat: I’m not an expert in how mark positioning works ... but:

A BaseAnchor being None seems to be a valid case. In a ttx dump this is expressed width emtpy="1" like in this example:

            <BaseRecord index="3">
              <BaseAnchor index="0" Format="1">
                <XCoordinate value="138"/>
                <YCoordinate value="-40"/>
              </BaseAnchor>
              <BaseAnchor index="1" empty="1"/>
            </BaseRecord>

But while your solution would fix that correctly, it seems that extractor isn't handling all the intricacies of mark positioning well. E.g. the method you quoted only ever uses the first BaseAnchor of a BaseRecord, while certain marks would reference the other anchors of a BaseRecord via the Class value of their MarkRecord. So some anchors are lost, respectively marks are attached to a wrong anchor in the extracted file.

Ahh, so you can have more than one class of marks in the same lookup, and attach them to different anchors on each base, or even not attach some at all.

Like in this case, same lookup is used for attaching marks above (like gravecomb, which is tagged as class 1 in lookup 2) and below (like dotbelowcomb, which is tagged as class 0 in the same lookup). And then some glyphs, like franc have attachment points for both classes of combining marks, while some other glyphs only have on of these points, but not both (like uni02F3 (MODIFIER LETTER LOW RING) only has the attachment point for marks below, tagged as class 0).

This definitely hadn't occurred in the font I'm tinkering with and was testing on, so yeah, I implemented the extraction incorrectly.

I guess I'll try fixing this properly now that I have a font to test on.

rimas-kudelis commented 1 month ago

@jansindl3r, could you check if my change linked to above works as intended?

I extracted JunigardenSwash successfully with it, and see one mark on uni02F3 and two marks on franc in the resulting UFO, but I guess it would be better if you checked as well.

If this works, I'll make a PR and it can be merged whenever maintainers have the time for it.

rimas-kudelis commented 1 month ago

Actually, I'm pretty certain that the patch works. I checked omacronbelow, which has five anchors in the OTF, and same five anchors are also visible in the UFO file. Still would love feedback from somebody else though.

jansindl3r commented 1 month ago

Thanks for the work! I checked with other font and there is still issue. I can't really share this font, but I will try to find another that causes this error

anchors["baseAnchors"].update({baseCoverage[baseRecordIndex]: {"x": baseAnchor.XCoordinate, "y": baseAnchor.YCoordinate}})
                                                                        ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'XCoordinate'
jansindl3r commented 1 month ago

sorry! I forgot to change branch. All good, works well, thanks!!

rimas-kudelis commented 1 month ago

Oh! I was going to have another look, now I'm glad I won't have to. :D

typemytype commented 2 weeks ago

Ive got users reporting, is this resolved?

 >> File "/Applications/RoboFont_4_official.app/Contents/Resources/lib/python3.12/extractor/formats/opentype.py", line 1170, in _gatherAnchorsForLookup
12/11/2024 14:19:28 >   OUTPUT > ROBOFONT >> File "/Applications/RoboFont_4_official.app/Contents/Resources/lib/python3.12/extractor/formats/opentype.py", line 1195, in _handleAnchorLookupType4Format1
12/11/2024 14:19:28 >   OUTPUT > ROBOFONT >> AttributeError: 'NoneType' object has no attribute 'XCoordinate'

I dont see any test added to extractor please add test as soon as possible @rimas-kudelis @jansindl3r

@benkiel next time, no merging without any added tests

jansindl3r commented 2 weeks ago

it was for the fonts I had problem with. It seems weird as in the current version line 1195 doesn't have _handleAnchorLookupType4Format1

typemytype commented 2 weeks ago

its inside the RoboFont4.5 release

test are needed! mistakes can happen but tests prevent this from happening, please add them

rimas-kudelis commented 2 weeks ago

Even if I had written a test when initially implementing anchor extraction, it would not have prevented this from happening because I wouldn't have written one correctly because I didn't wasn't aware of a font that would have failed such test (if I was, I would have implemented current logic from the get go).

Also, while I do understand that tests are a good thing, I can't honestly promise that I'll get to writing this missing test anytime soon (or indeed ever), sorry.

typemytype commented 2 weeks ago

euh dont understand...

those anchors comes from mark to base or mark to mark features, you just write those features, compile to binary and test against them...

Maybe we should revert this PR until tests are added.

rimas-kudelis commented 2 weeks ago

The testsuite of this package is currently very minimal (8 assertions in total), so if you go down the path of removing all code that is not properly tested, I'm afraid you may end up with very little code remaining here.

If you know how to quickly build a proper test for this feature, I would applaud you it if you did that. For me, it would likely take much more time, because I'm new to Python, Tox, and even fonts in general, and I'd rather spend that time polishing the font I wanted this feature for (and even that activity is on pause at the moment).

jansindl3r commented 2 weeks ago

I think that Rimas did a good job improving the tool (Thanks!), enabling it to process many more fonts than before. Previously, it would fail on almost every font I tested. Only a few, well-crafted fonts made it through. The one I shared here was just first font from dafont that I randomly tried to see if it fails.

rimas-kudelis commented 2 weeks ago

I think that Rimas did a good job improving the tool (Thanks!),

To be fair, I was also the one who broke the tool in the first place.