kurdtpage opened 7 years ago
+1
+1
This happens to me with PDF files generated by MS Word. A dirty fix is to change the threshold in this block of the `getText()` method of the `Object` class:
```php
if ($current_position_tm['y'] !== false) {
    $delta = abs(floatval($y) - floatval($current_position_tm['y']));
    // Vertical jump large enough to be treated as a new line
    if ($delta > 10) {
        $text .= "\n";
    }
}
```
After some debugging, I found that `$delta` was sometimes 0 and sometimes greater than 7, so changing the test to `$delta > 7` correctly adds the newlines.
I suspect this depends on the specific fonts, so the right threshold may vary and this is not a permanent fix, but it might help you when converting Word-generated PDFs.
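To see the effect of the threshold outside the library, here is a minimal standalone sketch of the same test. It is not pdfparser code: `joinFragments()` and the `['y' => ..., 'text' => ...]` fragment shape are hypothetical, and the y-values are invented to mimic the "0 within a line, more than 7 between lines" deltas described in the comment.

```php
<?php
// Hypothetical helper, NOT part of pdfparser's API.
// Each fragment is ['y' => vertical position, 'text' => string].
function joinFragments(array $fragments, float $threshold = 7.0): string
{
    $text = '';
    $previousY = null;
    foreach ($fragments as $fragment) {
        if ($previousY !== null) {
            // Same test as the patched getText(): a vertical jump larger
            // than the threshold is treated as a line break.
            $delta = abs((float) $fragment['y'] - (float) $previousY);
            if ($delta > $threshold) {
                $text .= "\n";
            }
        }
        $text .= $fragment['text'];
        $previousY = $fragment['y'];
    }
    return $text;
}

// Invented positions: delta 0 inside a line, delta 8 between lines.
$fragments = [
    ['y' => 700.0, 'text' => 'First '],
    ['y' => 700.0, 'text' => 'line'],
    ['y' => 692.0, 'text' => 'Second line'],
];

var_dump(joinFragments($fragments, 10.0)); // original threshold: no newline
var_dump(joinFragments($fragments, 7.0));  // lowered threshold: newline inserted
```

With the original threshold of 10, the delta of 8 is ignored and the two lines run together; with 7 it produces a line break. Since the real gap depends on font size and line spacing, a fixed number will not suit every document.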
The output of pdfparser is a string containing one long line of text. There are no line endings (CR/`\r`, LF/`\n`, etc.), even where the PDF clearly has line breaks.