sambitdash / PDFIO.jl

PDF Reader Library for Native Julia.

Fix for font unicode character map where glyph is of the format /uniXXXX #102

Closed dannywinrow closed 2 years ago

dannywinrow commented 2 years ago

fixes #101

sambitdash commented 2 years ago

@dannywinrow,

Thank you for your submission.

Unfortunately, your submission is not compliant with the PDF specification. While creators can use any logic to name dictionary keys, the reader cannot rely on that logic. We will review the file to see whether it contains other font information that we have overlooked in the implementation.

dannywinrow commented 2 years ago

Hi @sambitdash, I think you are mistaken. What has not been implemented is the full AGL specification, which you can find here: https://github.com/adobe-type-tools/agl-specification

You are currently only comparing against the AGL list, which misses two other forms: one where the glyph name is uni followed by uppercase hex digits in groups of four, and another where it is u followed by four to six uppercase hex digits. These should be mapped to their Unicode values.
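
To make the two forms concrete, here is a minimal sketch in Python. It is not PDFIO.jl code, and the function names are hypothetical. It assumes my reading of section 2 of the AGL specification: `uni` takes complete groups of four uppercase hex digits with surrogates excluded, and `u` takes four to six uppercase hex digits up to 10FFFF. It also assumes the leading `/` of the PDF name has already been removed.

```python
import re

# Uppercase hex only: 'uni' takes whole groups of four, 'u' takes four to six digits.
_UNI_RE = re.compile(r"uni((?:[0-9A-F]{4})+)")
_U_RE = re.compile(r"u([0-9A-F]{4,6})")


def _is_scalar(value):
    """True for Unicode scalar values (code points that are not surrogates)."""
    return value <= 0x10FFFF and not (0xD800 <= value <= 0xDFFF)


def decode_unicode_component(component):
    """Return the text for a 'uniXXXX...' or 'uXXXX' component, else None."""
    m = _UNI_RE.fullmatch(component)
    if m:
        digits = m.group(1)
        # Each group of four digits is one scalar value.
        values = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        if all(_is_scalar(v) for v in values):
            return "".join(chr(v) for v in values)
    m = _U_RE.fullmatch(component)
    if m:
        value = int(m.group(1), 16)
        if _is_scalar(value):
            return chr(value)
    return None
```

Note that the `uni` form can return more than one character, for example for a name such as `uni00660069`.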

Look at section 2 and you will find this:

"Otherwise, if the component is of the form ‘uni’ (U+0075, U+006E, and U+0069) followed by a sequence of uppercase hexadecimal digits (0–9 and A–F, meaning U+0030 through U+0039 and U+0041 through U+0046), if the length of that sequence is a multiple of four, and if each group of four digits represents a value in the ranges 0000 through D7FF or E000 through FFFF, then interpret each as a Unicode scalar value and map the component to the string made of those scalar values."

sambitdash commented 2 years ago

@dannywinrow,

Thanks for finding the complete AGL specification. If you are comfortable implementing the complete specification with test cases, feel free to resubmit the PR with all the details. I will be happy to merge it.

dannywinrow commented 2 years ago

It would take significantly more work to include the full AGL specification, since a glyph name can map to a string of characters as well as to a single Unicode character. However, the dictionary currently used for extracting text maps a CosName to a Char.
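
A rough sketch of why a single-character table falls short, again in Python rather than PDFIO.jl code. It assumes the name-level steps of the AGL specification as I understand them: drop everything from the first period onward, split the rest on underscores, map each component, and concatenate the results. `AGL_SAMPLE`, `lookup_component`, `glyph_name_to_text` and `fits_single_char` are hypothetical names, and the sample table is a tiny stand-in for the real glyph list.

```python
# Tiny stand-in for the Adobe Glyph List; the real list is far larger.
AGL_SAMPLE = {"A": "A", "f": "f", "i": "i", "eacute": "\u00e9"}


def lookup_component(component):
    """Hypothetical component mapper: table lookup only, "" if unknown."""
    return AGL_SAMPLE.get(component, "")


def glyph_name_to_text(name, map_component=lookup_component):
    """Map a whole glyph name to a string, which may hold several characters."""
    # Drop everything from the first period onward (e.g. a ".swash" suffix).
    base = name.split(".", 1)[0]
    # Split into underscore-separated components, map each, and concatenate.
    return "".join(map_component(c) for c in base.split("_"))


def fits_single_char(text):
    """True only when the result could be stored in a one-character table."""
    return len(text) == 1
```

With this sketch, `glyph_name_to_text("f_i")` gives a two-character string, so `fits_single_char` returns `False` for it. The lookup table would therefore need string values rather than single characters.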

As I'm not an experienced programmer I don't think I'd be able to achieve this. I have added support in my own fork for single character /uni and /u glyphs and will just leave the issue open for now.

This pull request will not satisfy what is needed for the full AGL specification and so I am closing it. Thanks.