vandeseer / easytable

Small table drawing library built upon Apache PDFBox
MIT License

U+202C Unicode error #152

Closed: SkylerWittman closed this issue 6 months ago.

SkylerWittman commented 1 year ago

When building a PDF document, we get an error for some of our customer datasets, but not all of them, and I cannot figure out what exactly triggers it. We pull a Java List of customer names, emails, and phone numbers and then create a table row for each element in that list.

From what I've understood through Googling, U+202C (POP DIRECTIONAL FORMATTING) is an invisible control character that ends a directional embedding or override, i.e. it is used to control the reading direction of text (think right-to-left for Arabic, etc.). However, at no point in our code do we change the text direction; we use the basic configuration.
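Because the error only appears for some datasets, the character is presumably embedded in the customer data itself (for example in text pasted from another application). The following is a minimal diagnostic sketch, using only the JDK, that lists the code points in a string that the default Helvetica/WinAnsiEncoding setup will probably reject; running it over each name, email, and phone number would show which records are affected. It assumes the JDK's `windows-1252` charset as a stand-in for PDFBox's `WinAnsiEncoding` (the two are close but not identical), and the class name `UnsupportedCharFinder` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

// Diagnostic sketch: report characters that Helvetica with WinAnsiEncoding is
// likely to reject. ASSUMPTION: the JDK's "windows-1252" charset approximates
// PDFBox's WinAnsiEncoding; results are a hint, not a guarantee.
final class UnsupportedCharFinder {

    private UnsupportedCharFinder() {}

    /** Returns one "U+XXXX NAME" entry per code point that probably cannot be drawn. */
    static List<String> findUnsupported(String text) {
        CharsetEncoder winAnsi = Charset.forName("windows-1252").newEncoder();
        List<String> problems = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            // Invisible format characters (such as U+202C) and control characters
            // (such as line feeds) also tend to be rejected by the standard fonts.
            boolean invisible = Character.getType(cp) == Character.FORMAT
                    || Character.isISOControl(cp);
            boolean encodable = winAnsi.canEncode(new String(Character.toChars(cp)));
            if (invisible || !encodable) {
                problems.add(String.format("U+%04X %s", cp, Character.getName(cp)));
            }
        });
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical customer value: a phone number wrapped in bidi controls.
        String phone = "\u202A+1 555 0100\u202C";
        System.out.println(findUnsupported(phone));
    }
}
```

An empty list means the value should be safe to draw with the standard fonts; anything else names the offending characters, which makes the affected customer records easy to spot in a log.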

java.lang.IllegalArgumentException: U+202C ('afii61573') is not available in the font Helvetica, encoding: WinAnsiEncoding
    at org.apache.pdfbox.pdmodel.font.PDType1Font.encode(PDType1Font.java:428)
    at org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:333)
    at org.apache.pdfbox.pdmodel.PDPageContentStream.showTextInternal(PDPageContentStream.java:514)
    at org.apache.pdfbox.pdmodel.PDPageContentStream.showText(PDPageContentStream.java:476)
    at org.vandeseer.easytable.drawing.DrawingUtil.drawText(DrawingUtil.java:20)
    at org.vandeseer.easytable.drawing.cell.TextCellDrawer.drawText(TextCellDrawer.java:106)
    at org.vandeseer.easytable.drawing.cell.TextCellDrawer.drawContent(TextCellDrawer.java:57)
    at org.vandeseer.easytable.TableDrawer.lambda$new$0(TableDrawer.java:69)
    at org.vandeseer.easytable.TableDrawer.drawRow(TableDrawer.java:200)
    at org.vandeseer.easytable.TableDrawer.drawWithFunction(TableDrawer.java:183)
    at org.vandeseer.easytable.TableDrawer.lambda$drawPage$1(TableDrawer.java:91)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at org.vandeseer.easytable.TableDrawer.drawPage(TableDrawer.java:90)
    at org.vandeseer.easytable.TableDrawer.draw(TableDrawer.java:157)
// THIS IS WHERE OUR INTERNAL CODE STARTS:
    at com.croogloo.metadata.CGCrewList.generateCrewListPDF(CGCrewList.java:344)
    at com.croogloo.webAPI.PersonAPI_V2.generateCrewListPDF(PersonAPI_V2.java:348)

Here is the line of code that yields the error:

TableDrawer.builder().table(table).startX(50.f).endY(50.f).build().draw(() -> document,
          () -> new PDPage(PDRectangle.A4), 25.f);

I would appreciate any help figuring out how to resolve this error. Thank you so much!

vandeseer commented 1 year ago

Hi @SkylerWittman,

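The problem is not in the library itself but in the font being used: you would get exactly the same error using PDFBox alone, without easytable. Helvetica is one of the standard 14 PDF fonts, and with WinAnsiEncoding it can only show a limited character set (essentially Windows-1252).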

I would recommend either filtering out special characters like that one programmatically or asking around (on Stack Overflow, for instance) about which font to use.
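Filtering can be as simple as one regular expression. The following is a minimal sketch, assuming it is acceptable to drop every character in the Unicode "Other, format" (`Cf`) category. That category includes U+202C and the other bidi embedding/override controls (U+202A to U+202E), the zero-width characters U+200B to U+200F, the byte order mark U+FEFF, and also the soft hyphen U+00AD. These characters are normally invisible, so removing them typically does not change the visible text of left-to-right data such as names, emails, and phone numbers. The class and method names are hypothetical; the helper would be applied to each value before it is put into a cell.

```java
// Sketch: strip invisible Unicode format characters (general category Cf)
// before the text is handed to the table. Names here are hypothetical.
final class TextSanitizer {

    private TextSanitizer() {}

    static String stripFormatChars(String text) {
        // \p{Cf} matches any code point in the Unicode "Other, format" category,
        // e.g. U+202A..U+202E (bidi controls) and U+200B..U+200F (zero-width chars).
        return text == null ? null : text.replaceAll("\\p{Cf}", "");
    }

    public static void main(String[] args) {
        String raw = "\u202A+1 555 0100\u202C";
        String clean = stripFormatChars(raw);
        System.out.println(clean + " (" + raw.length() + " -> " + clean.length() + " chars)");
    }
}
```

This only deals with invisible characters; visible ones that Helvetica lacks (Greek letters, for example) would still fail and need either a different font or a different fallback.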

Hope this helps,
Stefan

SkylerWittman commented 1 year ago

Hi @vandeseer,

Thank you for your answer. I hadn't thought about stripping the character out. I don't know whether that would have any negative repercussions (then again, it doesn't work at all right now, so it can't get much worse!).

I tried switching to the Times font and got the same problem, so I will try the character-stripping approach.

Cheers!

alexanderthorn commented 10 months ago

Hi @SkylerWittman, I have the same problem with Greek symbols such as "delta" and "gamma", and with superscript and subscript digits (⁰ ¹ ² ³ ⁴ ⁵ ⁶ ⁷ ⁸ ⁹ ₀ ₁ ₂ ₃ ₄ ₅ ₆ ₇ ₈ ₉). But, as @vandeseer said, it's not an issue with the library; I suppose it's a limitation of PDFBox.

Did you find a solution, a workaround, or any suggestions? Thanks in advance. Bye!
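For visible characters such as Greek letters, stripping would lose information. The more complete fix is usually to embed a TrueType font that actually contains those glyphs, for example by loading one with PDFBox's `PDType0Font.load(document, fontFile)` and using it for the table instead of Helvetica (which font file to ship is up to you). Where changing the font is not an option, the following is a lossy, JDK-only fallback sketch: it drops invisible format characters, keeps anything the standard fonts can already show (such as `é` or `²`), folds other compatibility characters with Unicode NFKC normalization (so `⁴` becomes `4` and `₂` becomes `2`), and replaces whatever is still unsupported with `?`. As above, it assumes the JDK's `windows-1252` charset approximates `WinAnsiEncoding`, and the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.text.Normalizer;

// Lossy fallback sketch for drawing arbitrary text with a standard-14 font.
// ASSUMPTION: the JDK's "windows-1252" charset stands in for WinAnsiEncoding.
final class WinAnsiFallback {

    private WinAnsiFallback() {}

    static String toWinAnsiSafe(String text) {
        CharsetEncoder winAnsi = Charset.forName("windows-1252").newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (Character.getType(cp) == Character.FORMAT) {
                return; // invisible bidi or zero-width control: drop it
            }
            String ch = new String(Character.toChars(cp));
            if (winAnsi.canEncode(ch)) {
                out.append(ch); // already drawable, keep unchanged
                return;
            }
            // Compatibility folding, e.g. superscript four -> "4", subscript two -> "2".
            String folded = Normalizer.normalize(ch, Normalizer.Form.NFKC);
            out.append(winAnsi.canEncode(folded) ? folded : "?");
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toWinAnsiSafe("x\u2074 + H\u2082O, \u0394T = 5\u00B0C\u202C"));
    }
}
```

Folding character by character keeps characters such as `²` intact instead of normalizing the whole string; the `?` placeholder makes any remaining loss visible in the PDF rather than silently dropping data.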

SkylerWittman commented 10 months ago

Hey @alexanderthorn, I stopped working on this project, so sadly I have no updates. I'll leave the issue open in case you or someone else solves it.

Good luck