jsreport / jsreport-pdf-utils

jsreport extension providing pdf operations like merge or concatenation
MIT License

Error: EOL expected but not found #28

Open LubomirIgonda1 opened 4 years ago

LubomirIgonda1 commented 4 years ago

Hi, I tried to append a PDF but got an "EOL expected but not found" error. I found the same issue at https://github.com/rkusa/pdfjs/issues/109, which is the package you forked. I can't provide the PDF file for now, but I would like to help solve this issue. If you need some debug data, let me know.

I think the issue starts in the parse function in https://github.com/jsreport/pdfjs/blob/master/lib/object/xref.js, where the skipEOL function is called.

If I comment out the error in https://github.com/jsreport/pdfjs/blob/master/lib/parser/lexer.js on line 56

this._error('EOL expected but not found')

the PDF is appended successfully, but on about 1/3 of the appended page the colors are transformed into something like negative colors.
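
For context, the PDF specification allows a carriage return, a line feed, or a CR LF pair as an end-of-line marker. A minimal sketch of such a check is below; `skipEOL` here is a hypothetical standalone helper written for illustration, not the code from lib/parser/lexer.js.

```javascript
// Hypothetical helper, not the library's implementation.
// Returns the position just after the EOL marker found at `pos`,
// or throws if no EOL marker starts there.
function skipEOL (bytes, pos) {
  const CR = 0x0d
  const LF = 0x0a

  if (bytes[pos] === CR) {
    // CR alone or CR LF both count as one EOL marker
    return bytes[pos + 1] === LF ? pos + 2 : pos + 1
  }

  if (bytes[pos] === LF) {
    return pos + 1
  }

  throw new Error('EOL expected but not found')
}
```

Running a check like this over the bytes around the failing offset may help to see which byte the parser meets where it expects the line ending.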

ENV:

node: 12.18.3
"jsreport-assets": "1.6.0",
"jsreport-chrome-pdf": "1.8.0",
"jsreport-core": "2.9.0",
"jsreport-ejs": "2.2.0",
"jsreport-pdf-utils": "1.8.0",
"jsreport-static-pdf": "0.4.0",

My code looks like this:

jsConfig.template.pdfOperations = [{
    type: 'append',
    template: {
        engine: 'none',
        recipe: 'static-pdf',
        staticPdf: {
            pdfAsset: {
                content: bufferPDFFile,
                encoding: 'binary'
            }
        }
    }
}]

pofider commented 4 years ago

Hi, thank you for the bug report. It would really help to have a particular pdf. Maybe you can email it confidentially to me?

In general, the current PDF parser doesn't support linearized PDFs and can have issues with a variety of PDF-specific aspects. It always works with PDFs produced by jsreport-chrome-pdf, but external PDFs are sometimes problematic. Improving this area is in our backlog, but bigger improvements like support for linearized PDFs won't be a priority this year, I am afraid.

LubomirIgonda1 commented 4 years ago

I am working on getting a test PDF for debugging purposes, but I need some time. I checked the PDF and it is not linearized, so this must be a different case. I'd also like to ask whether you have or know of a list of known PDF-specific aspects which cause issues during the appending process. Thanks.
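
One rough way to make such a check from Node.js is sketched below. It relies on the rule in the PDF specification that a linearized file keeps its linearization parameter dictionary, including the `/Linearized` key, within the first 1024 bytes. `looksLinearized` is a hypothetical helper name, and this is a heuristic rather than a full parse.

```javascript
// Heuristic sketch: looks for the /Linearized key near the start of the file.
// It does not parse the PDF, so treat the result as a hint only.
function looksLinearized (pdfBytes) {
  // latin1 maps every byte to one character, so binary data is not mangled
  const head = pdfBytes.subarray(0, 1024).toString('latin1')
  return /\/Linearized\b/.test(head)
}
```

It can be called with the same buffer that is passed to the append operation, for example the result of `fs.readFileSync` on the PDF file.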

pofider commented 4 years ago

Unfortunately, I don't have a list of things that break the parser. I just remember that it has happened a few times in the past. It's rare that someone uses their own existing PDF, so we don't hit this often. Nothing in the past months, if I remember right.

LubomirIgonda1 commented 4 years ago

I sent you the test PDF by email. If you need more information or some help, let me know.

pofider commented 4 years ago

Thank you for the PDF. I was able to replicate the problem. Unfortunately, I wasn't able to figure out how to fix it after some hours of investigation. I have no clue why the colors got transformed. We have different priorities now and I cannot invest more into this at the moment, unfortunately. I will hopefully get to it later, but not soon, I am afraid.

LubomirIgonda1 commented 4 years ago

Thank you for your effort. Do you know at least why the error ('EOL expected but not found') was triggered?

LubomirIgonda1 commented 3 years ago

@pofider Hi, I finally figured out where the issue is, but I am not able to fix it because I do not fully understand why this part of the code was added on top of the original code. Here is a diff link for the core package that you are using: https://github.com/rkusa/pdfjs/compare/2.4.2...jsreport:master

The problematic code is the toString function in the file lib/object/string.js. When I changed the code, the PDF merged correctly. I figured out that the specific issue is the conversion of the string to a utf16le buffer.

If I can somehow help fix this issue, let me know.

pofider commented 3 years ago

Thank you. I don't know off the top of my head why the change was made, but I will check on it and come back.

pofider commented 3 years ago

I tried to check the error and it seems to be gone in the latest jsreport@2.11.0 (jsreport-pdf-utils@1.10.1). Could you try it with the latest version?

LubomirIgonda1 commented 3 years ago

@pofider Hi, I updated all jsreport packages to the newest version and yes, the "EOL expected but not found" error disappeared, but the issue with the wrong colors after the PDF merge is still there. More specifically, the string transformation to utf16le does not work properly in some cases.

Here is the result after the PDF merge: [image]

pofider commented 3 years ago

Yes, I see the same. I will have to read through the PDF spec, because I don't fully understand what I am doing.

I took the code for serializing strings from here: https://github.com/foliojs/pdfkit/blob/e641a785082b80c0f88e04ddcab04e3c726ea6b4/lib/object.js#L60

I did that because the simple ASCII-based toString breaks values with national characters. This can be replicated if you use PDF meta and put something like "čřšččšžýz" there.

I think I fixed it now in this commit. The idea is that I do the extra serializing work for Unicode values only when the value comes from user input and not from the existing PDF. The colors in your PDF don't change afterward. Unfortunately, this rests on a really fragile understanding of the format. So if you study and understand the string values in the PDF spec, please share.
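
The idea can be sketched roughly as follows. This is only an illustration under assumptions, not the code from the commit: `encodeUserTextString` and `serializeString` are hypothetical names, strings parsed from an existing PDF are assumed to be held as one-character-per-byte (latin1) strings, and the hex string syntax is used so that no escaping of parentheses or backslashes is needed.

```javascript
// Hypothetical sketch, not the actual jsreport/pdfjs code.

// Encodes a value typed by the user. Pure ASCII stays one byte per character;
// anything else becomes UTF-16BE with a leading byte order mark (FE FF),
// which is how PDF text strings carry Unicode.
function encodeUserTextString (text) {
  const isAscii = /^[\x00-\x7f]*$/.test(text)
  if (isAscii) {
    return Buffer.from(text, 'latin1')
  }

  // Node.js only offers utf16le, so encode little-endian and swap byte pairs
  const buf = Buffer.from('\ufeff' + text, 'utf16le')
  buf.swap16()
  return buf
}

// Serializes a string object as a PDF hex string.
// Values coming from an existing PDF are written back byte for byte
// (assumption: they are stored as latin1 strings after parsing).
function serializeString (value, fromUserInput) {
  const bytes = fromUserInput
    ? encodeUserTextString(value)
    : Buffer.from(value, 'latin1')

  return '<' + bytes.toString('hex') + '>'
}
```

The point of the split is that bytes read from an existing PDF may be binary data rather than text, so re-encoding them as UTF-16 would change their meaning.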

LubomirIgonda1 commented 3 years ago

I can confirm that your solution works great, but unfortunately I don't have any deeper knowledge of the PDF format, so I probably won't be able to help you any further. But if you can point me to some materials, I could study this topic.

pofider commented 3 years ago

We will likely release it in v3 beta and wait for potential complaints, but that will take some months till we get there...

There are some basic articles about the PDF format, but in the end you need to read the PDF spec anyway: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf

LubomirIgonda1 commented 3 years ago

Hi, I tried to use it in the full context with encryption, but the issue with the colors shows up again. I set the password like this:

pdfPassword: {
    active: true,
    password: '123'
}

If I turn off encryption, it works fine, so I think the issue is probably in encryptFn in:

jsreport-pdfjs\lib\object\string.js

but I wasn't able to figure out what exactly the issue is in this case.

If you need more information, let me know.

pofider commented 3 years ago

I see. I tried to check it out, but unfortunately, I have no clue what is wrong. The security code was taken from an external source, and I am not familiar with it at all.

The only possible approach here seems to be to replicate the problem using the library the code was taken from, and to ask its maintainers for help. Unfortunately, I am not sure if or when I will get time for this.