iGrog closed this issue.
My guess would be that an unbalanced set of `q` and `Q` commands in the document stream is causing this. But I've been wrong before! @iGrog, can you please send the offending PDF to bhuisman at greywyvern dot com? I'd appreciate a look. Thanks.
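The `q` and `Q` operators save and restore the graphics state, so in a well-formed content stream they pair up like brackets. Below is a minimal sketch of checking a decoded content stream for that imbalance. `checkGraphicsStateBalance()` is a hypothetical helper, not part of pdfparser, and its whitespace-only tokenizer is a simplifying assumption.

```php
<?php
// Hypothetical helper, not part of pdfparser: reports whether the `q` (save
// graphics state) and `Q` (restore graphics state) operators in a decoded
// content stream pair up. The tokenizer is deliberately naive: it splits on
// whitespace only and does not skip strings or inline image data.
function checkGraphicsStateBalance(string $content): array
{
    $depth = 0;             // number of saved states not yet restored
    $unmatchedRestores = 0; // `Q` operators that had no saved state to restore

    foreach (preg_split('/\s+/', $content, -1, PREG_SPLIT_NO_EMPTY) as $token) {
        if ($token === 'q') {
            $depth++;
        } elseif ($token === 'Q') {
            if ($depth === 0) {
                $unmatchedRestores++;
            } else {
                $depth--;
            }
        }
    }

    return [
        'unclosed_saves' => $depth,
        'unmatched_restores' => $unmatchedRestores,
    ];
}
```

For example, `checkGraphicsStateBalance("q 1 0 0 1 10 10 cm Q Q")` reports one unmatched restore. A restore with nothing saved is one way code that pops a state stack could end up holding `null`, since PHP's `array_pop()` returns `null` for an empty array.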
@GreyWyvern Thanks. I've sent the PDF to your email.
Thanks! It turns out this PDF has an inline image object, which is fouling up the parser in `formatContent()`. The parser removes strings, but it should be removing these inline images too. I'll work on a solution for this.
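Inline images are written directly into the content stream as `BI` (image dictionary), `ID` (raw image data), `EI`. The raw data can contain bytes that look like operators or string delimiters, which is why it can confuse a tokenizer. Below is a minimal sketch of stripping such objects in the same spirit as removing strings. `stripInlineImages()` is a hypothetical name, not pdfparser's actual fix. The regex assumes that the closing `EI` has whitespace on both sides.

```php
<?php
// Hypothetical helper, not pdfparser's actual implementation: removes inline
// image objects (BI <dictionary> ID <data> EI) from a decoded content stream
// so their raw bytes are never tokenized as operators or strings.
// Assumptions: BI starts at the beginning of the stream or after whitespace,
// and the image ends at the first EI that has whitespace before it and
// whitespace (or the end of the stream) after it.
function stripInlineImages(string $content): string
{
    // The /s modifier lets `.` match newlines inside the binary image data.
    $pattern = '/(?<!\S)BI\s.*?\sID\s.*?\sEI(?=\s|$)/s';

    // preg_replace() returns null on failure (e.g. backtrack limit hit);
    // fall back to the unmodified content in that case.
    return preg_replace($pattern, '', $content) ?? $content;
}
```

Running it on `q 10 0 0 10 0 0 cm BI /W 2 /H 2 /CS /G /BPC 8 ID <data> EI Q` leaves `q 10 0 0 10 0 0 cm  Q`, so the surrounding operators survive. Scanning for `EI` is a heuristic. A sturdier approach would use the image dictionary to work out where the data ends, where that is possible.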
@iGrog, can you verify that the code from #693 resolves your issues? I've been using the "fixed" code for several weeks now without any problems, so I'd like to take it out of draft. Thanks!
@GreyWyvern I've tested parsing dozens of PDF files, and all of them succeeded (including the ones that used to crash with the NRE). Looks like it's working :) Thank you!
Description:
An exception was thrown in `PDFObject.php` on line 795:

![image](https://github.com/smalot/pdfparser/assets/842588/7d08d74b-bd3f-4beb-ada3-25fcb360c0f8)

`Trying to access array offset on value of type null`

At that point, `$current_position_cm` is null.
PDF input
I'm not allowed to share the PDF publicly, but I can share it privately.
Expected output & actual output
Expected output: the extracted text.
Actual output: an exception was thrown.
Code