When characters are rendered off the page, don't include them in the extracted text.
Ideally the page boundary would be the CropBox rather than the MediaBox, but I don't have easy access to that in PageLayout, and some upcoming refactors will make it easier to achieve. This is a good start.
I don't have a sample PDF to use in an integration test, so I've added a pending spec.
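
For illustration, here is a minimal sketch of the filtering idea, not the actual PageLayout code. The `Char` struct and the `chars_on_page` helper are hypothetical names made up for this example. It assumes each character carries an `x`/`y` origin in PDF points and that the MediaBox is a four-element `[llx, lly, urx, ury]` array.

```ruby
# Hypothetical stand-in for a positioned character; not the library's own class.
Char = Struct.new(:x, :y, :text)

# Keep only the characters whose origin falls inside the given box.
# media_box is assumed to be [llx, lly, urx, ury] in PDF points.
def chars_on_page(chars, media_box)
  llx, lly, urx, ury = media_box
  # Normalise the corners in case they are given in the opposite order.
  min_x, max_x = [llx, urx].minmax
  min_y, max_y = [lly, ury].minmax
  chars.select do |c|
    c.x.between?(min_x, max_x) && c.y.between?(min_y, max_y)
  end
end

chars = [
  Char.new(72, 700, "A"),   # inside a US Letter page
  Char.new(-50, 700, "B"),  # left of the page
  Char.new(300, 900, "C"),  # above the page
]
visible = chars_on_page(chars, [0, 0, 612, 792])
puts visible.map(&:text).join
```

This sketch only checks each character's origin point, so a glyph that straddles the page edge is kept or dropped based on where it starts. Swapping in the CropBox later would only change the box passed in.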