Open nate-bush opened 4 years ago
I can replicate this issue with the newest version of pdfminer.six. I tried cleaning the PDF with mutool and running the code again, but it made no difference.
Hi, I'm using pdfminer.six-20200726.
I have another question regarding cases where the font type is "unknown". As I understand it, if an LTChar has the font type "unknown", it will have a negligible height alongside a proper width. Is there any way to recover the character heights as well? Why isn't this implemented already?
Thank you!
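For anyone trying to confirm the symptom described above, here is a minimal sketch that lists characters whose box height is tiny compared to their width. It relies on `extract_pages` and `LTChar` from pdfminer.six; the helper names (`iter_chars`, `suspicious_chars`, `report`) and the `min_ratio=0.2` threshold are hypothetical choices for illustration, not part of the library, and the threshold may need tuning for a given PDF.

```python
def iter_chars(layout_obj):
    """Recursively yield every LTChar found below a pdfminer.six layout object."""
    from pdfminer.layout import LTChar  # lazy import; requires pdfminer.six

    if isinstance(layout_obj, LTChar):
        yield layout_obj
    elif hasattr(layout_obj, "__iter__"):
        # Containers (pages, text boxes, text lines, figures) are iterable.
        for child in layout_obj:
            yield from iter_chars(child)


def suspicious_chars(chars, min_ratio=0.2):
    """Return (text, fontname, width, height) for chars with a near-zero height.

    min_ratio is an assumed heuristic: a char is flagged when its height is
    less than min_ratio times its width.
    """
    flagged = []
    for ch in chars:
        if ch.width > 0 and ch.height < min_ratio * ch.width:
            flagged.append((ch.get_text(), ch.fontname, ch.width, ch.height))
    return flagged


def report(pdf_path):
    """Print the flagged characters of each page in the given PDF."""
    from pdfminer.high_level import extract_pages  # requires pdfminer.six

    for page in extract_pages(pdf_path):
        for row in suspicious_chars(iter_chars(page)):
            print(page.pageid, *row)
```

Calling `report("unknown.pdf")` on an affected file should print the flagged characters together with their font names, which may help narrow down which fonts are involved. It only detects the problem; it does not recover the correct heights.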
@rinczefi Are you able to share your PDF? Sounds like we'd need to work out why the font is unknown
...
unknown.pdf This is the PDF I'm stuck with right now. Thanks in advance.
@jstockwin Is there any progress on this issue yet?
@rinczefi Apologies for not responding sooner. Unfortunately I am currently quite busy and so have not had much spare time for open source stuff. It's on my list of things to get around to, but no guarantees, I'm afraid.
@rinczefi were you able to solve this issue?
No, I was not.
@jstockwin @pietermarsman Are there any updates on this issue, or at least on what its root cause is? It would be wonderful if you could shed some light on the root cause. Thanks in advance.
Bug report
Description: The height of character boxes is incorrect for some fonts. I removed the other fonts and graphical items from the PDF to isolate the problematic character boxes.
Steps to reproduce:
pdf_with_boxes.png
to see the boxes.
chinese_chars_with_incorrect_char_boxes.pdf