Closed shm007g closed 3 years ago
Hi @shm007g, I appreciate your interest in the library. This is not a bug in the library; the error occurs because the PDF is not correctly formed according to the PDF specification. You can try repairing the PDF with Ghostscript and then using the repaired file. To do so, run the following command:
gs -o output.pdf -sDEVICE=pdfwrite input.pdf
I repaired the PDF and you can download it from here.
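For anyone repeating the repair on their own files, the command above can be wrapped in a small function that also checks the result. This is only a sketch: `repair_pdf` is a hypothetical helper name, it assumes `gs` is on the `PATH`, and a non-empty output file is a basic sanity check, not proof that all text survived the rewrite.

```shell
# repair_pdf SRC DST
# Hypothetical helper: rewrites SRC through Ghostscript's pdfwrite device
# using the same command shown above, then checks that DST is non-empty.
repair_pdf() {
    src=$1
    dst=$2
    if [ ! -f "$src" ]; then
        echo "input not found: $src" >&2
        return 2
    fi
    # Ghostscript may print font substitution warnings here and still succeed.
    if ! gs -o "$dst" -sDEVICE=pdfwrite "$src"; then
        echo "gs failed on: $src" >&2
        return 1
    fi
    # Succeed only if an output file was written and is not empty.
    [ -s "$dst" ]
}
```

Usage would look like `repair_pdf input.pdf output.pdf`, followed by running the text extraction on `output.pdf` and comparing it against the original document by eye.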
Thanks very much! Can you tell me how you repaired it? I get a "Can't find CID font" error and have not been able to fix it for a day.
gs -o output.pdf -sDEVICE=pdfwrite input.pdf
Page 1
Can't find CID font "����".
Attempting to substitute CID font /Adobe-GB1 for /����, see doc/Use.htm#CIDFontSubstitution.
The substitute CID font "Adobe-GB1" is not provided either. attempting to use fallback CIDFont.See doc/Use.htm#CIDFontSubstitution.
Loading a TT font from /usr/local/Cellar/ghostscript/9.53.3_1/share/ghostscript/9.53.3/Resource/CIDFSubst/DroidSansFallback.ttf to emulate a CID font Adobe-GB1 ... Done.
Can't find CID font "SimSun".
Attempting to substitute CID font /Adobe-GB1 for /SimSun, see doc/Use.htm#CIDFontSubstitution.
Yes, I received the same message, but I was still able to extract all the text, so I think the output file should be good to go. Did you find any issues with it?
Some tokens are lost or changed to invalid ones. I also tried a merge command like
gs -q -sDEVICE=pdfwrite -dBATCH -sOUTPUTFILE="${line%.*}_mod.pdf" -dNOPAUSE "${line}"
It has the same problem, but without these font errors.
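The `${line}` variable suggests the command runs inside a loop over a list of file names. A minimal sketch of such a loop is below, assuming a text file with one PDF path per line. The names `mod_name` and `repair_list` are hypothetical, and the sketch uses the documented `-sOutputFile=` spelling of the output option.

```shell
# mod_name PATH
# Prints the output name used above: text after the last dot is replaced
# by "_mod.pdf" (assumes the file name itself contains a dot).
mod_name() {
    printf '%s\n' "${1%.*}_mod.pdf"
}

# repair_list LISTFILE
# Hypothetical helper: LISTFILE holds one PDF path per line.
repair_list() {
    while IFS= read -r line; do
        # Skip blank lines in the list.
        [ -n "$line" ] || continue
        out=$(mod_name "$line")
        # Quote both paths so names with spaces survive; redirect stdin so
        # gs cannot consume the remaining lines of the list.
        gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
            -sOutputFile="$out" "$line" </dev/null \
            || echo "failed: $line" >&2
    done < "$1"
}
```

For example, `repair_list files.txt` would write `20000101_mod.pdf` next to `20000101.pdf` if that path appears in the list. The rewrite itself is unchanged, so any lost or altered tokens would still need to be checked in the output.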
Describe the bug
Opening a regular PDF file fails with a "no root object" error.
Code to reproduce the problem
PDF file
20000101.pdf
Expected behavior
Get the real text data from the PDF file.
Actual behavior
An error occurs.
Screenshots
Environment