Fix text carry-over between pages. Fix tests.

timClicks / slate

The simplest way to extract text from PDFs in Python

http://timmcnamara.co.nz/

GNU General Public License v3.0


Closed jasco closed 9 years ago

jasco commented 9 years ago

Fixed the tests by including the newlines from the extracted documents.
Added a test illustrating the merge bug from commit 43614f2716fb6020d31bfde64712636b084c417b, in which an uncleared buffer carries text over from one page to the next.
Added clearing of the buffer, which also addresses issue #21 in the prior codebase.
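
For readers unfamiliar with this class of bug, here is a minimal sketch of how an uncleared buffer can leak text between pages. It is not slate's actual code: `extract_pages`, the `clear_buffer` flag, and the fragment lists are hypothetical stand-ins for a real PDF parser, and only the standard library's `io.StringIO` is used.

```python
import io


def extract_pages(pages, clear_buffer=True):
    """Collect the text of each page through one shared buffer.

    `pages` is a list of lists of text fragments, a hypothetical stand-in
    for whatever a real PDF parser would emit per page.
    """
    buffer = io.StringIO()
    results = []
    for fragments in pages:
        for fragment in fragments:
            buffer.write(fragment)
        results.append(buffer.getvalue())
        if clear_buffer:
            # Reset the shared buffer so the next page starts empty.
            buffer.seek(0)
            buffer.truncate(0)
    return results


pages = [["first page\n"], ["second page\n"]]

# Without clearing, the second result still contains the first page's text.
buggy = extract_pages(pages, clear_buffer=False)

# With clearing, each result holds only its own page.
fixed = extract_pages(pages, clear_buffer=True)
```

A regression test for this kind of bug would typically assert that the text of page two does not start with the text of page one.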

timClicks commented 9 years ago

Nice work. Thanks for this!