Font attributes incorrect even when font is properly identified (`is_italic`, `is_serif`, etc.)

The blocks output format includes various font attributes at the word level, including `is_italic` and `is_serif`. These do not appear to work: they seem to always return false, even when the Legacy model is used and font identification worked correctly.

For example, when running recognition with the Legacy engine on the image below, the font is correctly recognized as an italic serif font (`Times_New_Roman_Italic`). Despite this, the `is_italic` and `is_serif` attributes are both false.

[Image: italic_example_1]
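To make the mismatch easier to spot in a result, here is a minimal sketch of a checker. The helper name `findItalicMismatches` is hypothetical and not part of Tesseract.js. It assumes the blocks output nests as blocks, paragraphs, lines, words, and that an italic font has "Italic" somewhere in its `font_name`. The sample data is hand-written to mirror the behavior described above and is not real recognition output.

```javascript
// Hypothetical helper (not part of Tesseract.js). Walks an assumed
// blocks -> paragraphs -> lines -> words structure and returns every word
// whose font_name mentions "italic" while is_italic is falsy.
function findItalicMismatches(blocks) {
  const mismatches = [];
  for (const block of blocks || []) {
    for (const paragraph of block.paragraphs || []) {
      for (const line of paragraph.lines || []) {
        for (const word of line.words || []) {
          // Assumption: an italic font has "italic" in its name.
          const nameSaysItalic = /italic/i.test(word.font_name || "");
          if (nameSaysItalic && !word.is_italic) {
            mismatches.push({
              text: word.text,
              font_name: word.font_name,
              is_italic: word.is_italic,
              is_serif: word.is_serif,
            });
          }
        }
      }
    }
  }
  return mismatches;
}

// Hand-written sample mirroring the reported behavior (not real output).
const sampleBlocks = [
  {
    paragraphs: [
      {
        lines: [
          {
            words: [
              // Font identified as italic, but the flag is false: a mismatch.
              { text: "Hello", font_name: "Times_New_Roman_Italic", is_italic: false, is_serif: false },
              // Non-italic font with is_italic false: consistent, not reported.
              { text: "world", font_name: "Arial", is_italic: false, is_serif: false },
            ],
          },
        ],
      },
    ],
  },
];

console.log(findItalicMismatches(sampleBlocks));
```

A non-empty result for an image whose font was identified correctly would point to the attribute problem described in this issue, not to a font identification failure.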

If this is an issue with Tesseract.js/Tesseract.js-core, we should fix it. If it is an issue on the Tesseract side, where the information is always incorrect, these attributes should be removed from our output (in the next major version) to avoid confusion.

Note that this is distinct from general accuracy issues with Tesseract font recognition, and from the fact that font recognition only runs on Legacy, both of which are outside the scope of this repo. This issue is specific to cases where Tesseract correctly identifies the font but still returns the wrong font attributes.

naptha/tesseract.js, issue #907