mandiberg / printwikipedia

Helvetica not embedded errors #4

Open mandiberg opened 11 years ago

mandiberg commented 11 years ago

The PDF output contains text styled with the default Helvetica font, but Helvetica is not embedded in the PDF. This causes errors when uploading to Lulu.com. Either those sections need to be restyled to use the Cardo font (which is embedded), or Helvetica needs to be properly embedded to make the errors go away.

There are two known patterns:

  1. The whitespace after the <hr> at the end of articles. This whitespace is constructed by two lines of code. In PdfPageWrapper.java, currently on line 90: mct.addElement(new Phrase("\n")); and in WikiHtmlConverter.java, currently on line 22: output = headerToUppercase(output) + "<hr width='100%'/>"; I have confirmed that removing line 90 and removing the appended hr string on line 22 removes these Helvetica sections after the end of articles. My attempts to fix this have been unsuccessful. I tried setting the extra space after the articles via setSpacingBefore and setSpacingAfter (lines 85 and 86), but that has no effect. I tried assigning a font to the "\n" string via _wikiFontSelector.getTitleFontSelector().process (lines 92-98), but that didn't work either. I also tried concatenating the <hr> into the article string that gets parsed by convertHtml2Pdf, but that didn't work either.
  2. The square bullet that follows the list of the contents of longer articles. I have not attempted to fix this problem.

How to view the unembedded fonts:

You can see the font information for a PDF document with Adobe Acrobat Pro. Open the document properties via File > Properties and select the Fonts tab (the third one from the left). This lists all embedded and unembedded fonts.

I can locate the unembedded fonts with Adobe Acrobat Pro. It works better in v10, though I am using v9. Open Advanced > Preflight, click the Options pulldown in the upper right, and select Create New Preflight Profile. Give your profile a name, select Fonts in the left panel and Font is not embedded in the main panel, then click Save and OK. With the PDF open, click Analyze in the Preflight window, and it will report the number of incidents and the pages they occur on. In v10, it will actually show you the incidents with paragraph granularity.

mandiberg commented 8 years ago

It seems that this is a known issue with iText. Maybe we don't have the right kind of Helvetica, or haven't attached it correctly.

Our Helvetica isn't TTF; it is Type 1. A TTF version is here: https://github.com/netshade/achievement_unlocked/tree/master/fonts But when we tried that, we got an error telling us we can't embed fonts we don't have the license for.

Confirmation that all fonts need to be embedded is here, with code (in VB) that may be helpful: http://stackoverflow.com/questions/19315917/all-the-fonts-must-be-embedded-this-one-isnt-helvetica

Here is an example of someone struggling with maybe the same problem: http://stackoverflow.com/questions/12093236/how-to-get-rid-of-helvetica-in-itext-xmlworker

Also, these may provide routes to analyze the documents programmatically: http://stackoverflow.com/questions/3631152/how-can-you-find-a-problem-with-a-programmatically-generated-pdf
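
As a rough starting point for that kind of programmatic check, here is a minimal sketch that uses only the Java standard library and no PDF library. It is not taken from the linked answers. It treats the PDF as Latin-1 text, collects the indirect objects, and reports every font dictionary whose /FontDescriptor is missing or has no /FontFile, /FontFile2 or /FontFile3 entry. The class name UnembeddedFontScan and the method findUnembedded are made up for illustration. It also assumes that font dictionaries are stored as plain, uncompressed objects and that descriptors are indirect references; fonts inside compressed object streams will be missed, so Acrobat's Preflight remains the authoritative check.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of printwikipedia or iText.
final class UnembeddedFontScan {
    // One indirect object: "<number> <generation> obj ... endobj".
    private static final Pattern OBJECT =
            Pattern.compile("(\\d+)\\s+\\d+\\s+obj\\b(.*?)endobj", Pattern.DOTALL);
    // Matches "/Type /Font" but not "/Type /FontDescriptor".
    private static final Pattern FONT_TYPE = Pattern.compile("/Type\\s*/Font\\b");
    // Composite (Type0) fonts keep their descriptor on a descendant CIDFont.
    private static final Pattern COMPOSITE = Pattern.compile("/Subtype\\s*/Type0\\b");
    private static final Pattern BASE_FONT =
            Pattern.compile("/BaseFont\\s*/([^\\s/<>\\[\\]()]+)");
    private static final Pattern DESCRIPTOR_REF =
            Pattern.compile("/FontDescriptor\\s+(\\d+)\\s+\\d+\\s+R");

    /** Returns the /BaseFont names of fonts that appear to lack an embedded font file. */
    static List<String> findUnembedded(String pdfText) {
        // Index every indirect object by its object number, in file order.
        Map<String, String> objects = new LinkedHashMap<>();
        Matcher objectMatcher = OBJECT.matcher(pdfText);
        while (objectMatcher.find()) {
            objects.put(objectMatcher.group(1), objectMatcher.group(2));
        }

        Set<String> missing = new LinkedHashSet<>();
        for (String body : objects.values()) {
            if (!FONT_TYPE.matcher(body).find() || COMPOSITE.matcher(body).find()) {
                continue; // not a font dictionary, or a Type0 wrapper
            }
            Matcher name = BASE_FONT.matcher(body);
            if (!name.find()) {
                continue; // e.g. Type3 fonts, which are defined inside the PDF itself
            }
            Matcher ref = DESCRIPTOR_REF.matcher(body);
            String descriptor = ref.find() ? objects.get(ref.group(1)) : null;
            // No descriptor, or a descriptor without any /FontFile* key: not embedded.
            if (descriptor == null || !descriptor.contains("/FontFile")) {
                missing.add(name.group(1));
            }
        }
        return new ArrayList<>(missing);
    }

    public static void main(String[] args) throws Exception {
        // Latin-1 maps each byte to one char, so binary stream data does not break decoding.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String text = new String(bytes, StandardCharsets.ISO_8859_1);
        for (String fontName : findUnembedded(text)) {
            System.out.println("Not embedded: " + fontName);
        }
    }
}
```

Running it with the path to a generated PDF as the only argument prints one line per suspect font name. It does not report page numbers, so it can confirm whether Helvetica is still present after a change but not where it is used.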

And maybe Helvetica is a special case: "I did some digging into source code and it seems that iText explicitly ignores BaseFont.EMBEDDED flag for certain fonts and Helvetica is one of them." http://stackoverflow.com/questions/2019607/how-to-embed-helvetica-font-in-pdf-using-itext/6109447#6109447
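
If the "certain fonts" in that quote are the 14 standard Type 1 fonts named in the PDF specification (an assumption on my part, the answer does not list them), then a small guard like the sketch below could flag any font name that is likely to end up unembedded before it is requested by name. The class Standard14Fonts and its method are hypothetical and use only the Java standard library; nothing here calls iText.

```java
import java.util.Set;

// Hypothetical guard, not part of printwikipedia or iText.
final class Standard14Fonts {
    // The 14 standard Type 1 font names from the PDF specification.
    private static final Set<String> NAMES = Set.of(
            "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
            "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
            "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
            "Symbol", "ZapfDingbats");

    /** True if the name is one of the standard 14, which may be written without an embedded font file. */
    static boolean isStandard14(String fontName) {
        return NAMES.contains(fontName);
    }
}
```

Both known patterns above (the whitespace after the <hr> and the square bullet) would be consistent with a default font from this list being used wherever no explicit font is assigned, though I have not confirmed that for the bullet.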