j0k3r / graby

Graby helps you extract article content from web pages
MIT License

tests: Fix pdf test after pdfparser 2.8.0 bump #348

Closed jtojnar closed 4 months ago

jtojnar commented 4 months ago

Before the 2.8.0 release, pdfparser inserted a stray space after roughly the 51st character of each line, incorrectly breaking some words:

Verdana : Nullam hendrerit ante sed risus luctus el ementum. Morbi consectetur <br />\n
et diam sed dignissim. Sed a erat metus. Mauris a u ltrices velit. Aenean laoreet <br />\n
lectus nisi, tincidunt auctor nunc dictum at. Pelle ntesque at enim ac arcu mattis <br />\n
pellentesque et et lectus. Pellentesque in augue ip sum. Vivamus sapien lorem, <br />\n
semper auctor ligula sit amet, aliquam imperdiet mi . Maecenas in neque in tellus <br />\n
sagittis feugiat ac non dolor. Ut adipiscing erat a c tortor fringilla, in lobortis orci <br />\n
gravida. Praesent vulputate neque ac nibh elementum  tempor. Etiam tincidunt <br />\n
aliquam libero, ut faucibus justo sodales sed. Aene an aliquam sodales nulla, vel <br />\n
mollis leo blandit at. Morbi vulputate tincidunt ve nenatis.
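
To make the pattern in the sample above easier to see, here is a minimal Python sketch that mimics the symptom. It is an illustration only, not pdfparser's code: the function name `insert_stray_space` and the offset of 51 are assumptions, the latter obtained by counting characters in the sample lines.

```python
# Illustration only: mimics the symptom seen in the old output.
# This is NOT how pdfparser is implemented.

# Assumption: offset counted by hand from the sample lines above.
STRAY_SPACE_OFFSET = 51


def insert_stray_space(line: str, offset: int = STRAY_SPACE_OFFSET) -> str:
    """Insert a single space after `offset` characters of `line`."""
    if len(line) <= offset:
        # Lines no longer than the offset are left unchanged.
        return line
    return line[:offset] + " " + line[offset:]


clean = "Verdana : Nullam hendrerit ante sed risus luctus elementum. Morbi consectetur"
print(insert_stray_space(clean))
```

Applied to the clean text, this splits "elementum" into "el ementum", matching the first line of the old output. Where the offset falls on an existing space, the result is a double space, as in "elementum  tempor".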

Updating pdfparser fixed this, so let’s update the test. I have bisected the fix to https://github.com/smalot/pdfparser/commit/feaf39e73744953a0eabd9026ebe436d22e5f6ac

Bumping smalot/pdfparser to ≥ 2.2.0 adds an ext-iconv requirement.