ssimms / pdfapi2

Create, modify, and examine PDF files in Perl

illustrate utf8 pitfalls #20

Open westmj opened 4 years ago

westmj commented 4 years ago

corefonts do not contain many utf8 characters, e.g. accents: á é í; ES: ñ Ñ ¿; etc. TrueType fonts contain these, but encoding can be tricky.

use utf8; # is helpful in sending characters to PDF

coveralls commented 4 years ago

Coverage Status

Coverage decreased (-0.003%) to 56.839% when pulling af8809e207bbd28328e7d41d7f1d7ddeda2c6fac on westmj:mwjwest03 into fcc73b15b2e1b837a42689e1294be6868107e8b8 on ssimms:master.

PhilterPaper commented 4 years ago

I'm not sure what you're getting at by saying "corefonts do not contain many utf8 characters". You need to understand that core fonts, no matter what the physical font file is (even .ttf), are limited to a single byte encoding. By default, this is Latin-1 (or, extended to Windows-1252). You can't specify 'utf8' as the encoding for loading corefonts -- it's limited to 224 characters at most, and text must be specified as single byte in the chosen encoding. Naturally, UTF-8 encoding (ttfont method) has a much larger space for characters. If you want to use something beyond the basic Latin-1ish set, you have to use ttfont instead of corefont. psfont suffers from the same single-byte limitation.
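The single-byte limit described above can be checked without PDF::API2 at all. The following is a minimal sketch using only the core Encode module; the helper name `fits_single_byte` is hypothetical and chosen for illustration. It asks whether a character has a one-byte representation in Latin-1 (ISO-8859-1) or Windows-1252. Note that the characters listed in the opening comment (á é í ñ Ñ ¿) are all part of Latin-1, whereas € exists only in the Windows-1252 extension and ł in neither.

```perl
use strict;
use warnings;
use utf8;                      # string literals in this file are written in UTF-8
use Encode qw(encode);

# Hypothetical helper: returns 1 if $char maps to exactly one byte in $encoding, else 0.
sub fits_single_byte {
    my ($char, $encoding) = @_;
    my $copy  = $char;         # encode() may modify its input when a CHECK flag is set
    my $bytes = eval { encode($encoding, $copy, Encode::FB_CROAK) };
    return (defined $bytes && length($bytes) == 1) ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';
for my $char (qw(á é í ñ Ñ ¿ € ł)) {
    printf "%s  latin1=%d  cp1252=%d\n", $char,
        fits_single_byte($char, 'iso-8859-1'),
        fits_single_byte($char, 'cp1252');
}
```

A character that reports 0 for the encoding you load a core font with is one that needs ttfont instead.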

By the way, you're misusing the "Pull Request" system here. Such an example of code illustrating a problem should be attached to a problem ticket. PRs are for code changes that you feel should be incorporated into the product code.

westmj commented 4 years ago

Thanks for your comments. I am new at this.

Sorry, I cannot point to the source, but I recall seeing code somewhere that used utf8 with a corefont. I understand that it does not work; my comment was trying to point that out.

On 10/27/19, Phil Perry notifications@github.com wrote:

> I'm not sure what you're getting at by saying "corefonts do not contain many utf8 characters". You need to understand that core fonts, no matter what the physical font file is (even .ttf), are limited to a single byte encoding. By default, this is Latin-1 (or, extended to Windows-1252). You can't specify 'utf8' as the encoding for loading corefonts -- it's limited to 224 characters at most, and text must be specified as single byte in the chosen encoding. Naturally, UTF-8 encoding (ttfont method) has a much larger space for characters. If you want to use something beyond the basic Latin-1ish set, you have to use ttfont instead of corefont. psfont suffers from the same single-byte limitation.

Thanks! This seems like a misunderstanding that I and others have.

> By the way, you're misusing the "Pull Request" system here. Such an example of code illustrating a problem should be attached to a problem ticket. PRs are for code changes that you feel should be incorporated into the product code.

Thanks for the advice. Nothing was really broken; the API2 code works. It just was not apparent to me how to get many characters to appear. I am only trying to improve the documentation. I did not understand that this was inappropriate.

This example code documents a "gotcha" that took me a long time to figure out, and that I saw others struggling with as I searched for answers. Perhaps the concept(s) belong in the pod?

If you think the submission will help others navigate the double whammy of "use utf8;" and selecting a suitable font, I hope you will incorporate it in the examples, put the ideas in the pod, or otherwise provide help of some sort.

Thanks for your work on PDF::API2. An amazing module.

-- Mike West mwjwest@gmail.com

PhilterPaper commented 4 years ago

Well, anyway you now know better when to open a ticket and attach a sample program, and when a PR is appropriate.

By the way, the primary ticket system for PDF::API2 is on CPAN (RT). If you don't have a CPAN account, you can open a ticket by emailing bug-PDF-API2 [at] rt.cpan.org with the desired subject line. To add a comment to an existing ticket, email the same address with the subject line [rt.cpan.org #NNNNNN], where NNNNNN is the assigned ticket number (e.g., 130722). Note the single space between org and #, and the [ ] around the whole subject. Nothing else. If you don't follow this format carefully, you will end up creating a new bug report! HTML and other markup do not work with RT. I'm not sure if you can attach a file when using the email interface.
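Since a malformed subject line opens a new ticket instead of adding a comment, it may be safer to generate the subject than to type it. A small sketch, assuming only the format described above; the helper name `rt_reply_subject` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: builds the subject line for replying to an existing RT ticket.
# The format (brackets, one space before '#', nothing else) is taken from the comment above.
sub rt_reply_subject {
    my ($ticket) = @_;
    die "ticket number must be digits only\n" unless $ticket =~ /\A[0-9]+\z/;
    return "[rt.cpan.org #$ticket]";
}

print rt_reply_subject(130722), "\n";   # prints: [rt.cpan.org #130722]
```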

I agree that the documentation in PDF::API2 is weak and haphazard; I have tried to improve upon it in PDF::Builder (a fork of API2). I look at API2 bugs primarily to find things to fix in Builder (I don't fix things in API2 -- that's handled by Steve). The bottom line seems to be that you should use TTF or OTF (ttfont) fonts if you need more than the standard Latin-1 character set. If Latin-1 or other single byte encodings are OK for you, you can use corefont or psfont if you want. Good luck!
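To make the bottom line concrete, here is a hedged sketch that puts both loaders on one page. It uses the PDF::API2 calls `corefont` (with the `-encode` option), `ttfont`, `text`, `font`, `translate`, and `saveas`. The helper name `write_font_demo`, the output file name, and the font path passed on the command line are assumptions for illustration, and whether € and ł actually render depends on the glyphs present in the TrueType file you supply.

```perl
use strict;
use warnings;
use utf8;    # string literals in this file are decoded as UTF-8

# Hypothetical helper: writes a one-page PDF comparing corefont and ttfont.
# $ttf_path must point to a TrueType/OpenType file on your system (an assumption).
sub write_font_demo {
    my ($ttf_path, $out_path) = @_;
    require PDF::API2;          # loaded lazily so this file compiles without the module

    my $pdf  = PDF::API2->new();
    my $page = $pdf->page();
    my $text = $page->text();

    # Core font: a single-byte encoding is chosen at load time.
    my $core = $pdf->corefont('Helvetica', -encode => 'latin1');
    $text->font($core, 14);
    $text->translate(72, 720);
    $text->text('corefont: á é í ñ Ñ ¿');        # all within Latin-1

    # TrueType font: not restricted to a single-byte character set.
    my $ttf = $pdf->ttfont($ttf_path);
    $text->font($ttf, 14);
    $text->translate(72, 690);
    $text->text('ttfont: á é í ñ Ñ ¿ € ł');      # includes characters outside Latin-1

    $pdf->saveas($out_path);
    return $out_path;
}

# Runs only when a font path is given, e.g.: perl font_demo.pl /path/to/font.ttf
write_font_demo($ARGV[0], 'font-demo.pdf') if @ARGV;
```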

westmj commented 4 years ago

Rookie Perl question... How can I use PDF::Table in PDF::Builder?

Your advice about how to file tickets for PDF::API2 makes sense. I just filed an "issue" for PDF::Table that you might be interested in: https://github.com/kamenov/PDF-Table/issues/50 . I mention it because it would be great if PDF::Builder could make tables, since PDF::Table seems to have a bug that makes it unusable for my primary need.

Is this a suitable way to contact you? I am at m w j west at that popular web mail provider...

Ciao.

Thanks.

Mike West mwjwest@gmail.com