timClicks / slate

The simplest way to extract text from PDFs in Python
http://timmcnamara.co.nz/
GNU General Public License v3.0
428 stars 139 forks

Fix text carry-over between pages. Fix tests. #23

Closed. jasco closed this 9 years ago

jasco commented 9 years ago
  1. Fixed the tests by including the newlines from the extracted documents
  2. Added a test to illustrate the merge bug from commit 43614f2716fb6020d31bfde64712636b084c417b, in which an uncleared buffer carries text over between pages
  3. Added clearing of the buffer, which also addresses issue #21 in the prior codebase
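
The carry-over in item 2 can be reproduced in isolation. The sketch below is not slate's code: `extract_pages` and its `clear_buffer` flag are hypothetical names, and a plain `io.StringIO` stands in for whatever buffer the extractor shares across pages. It only shows why a buffer that is not reset leaks earlier pages into later ones.

```python
import io


def extract_pages(pages, clear_buffer=True):
    """Collect per-page text through one shared buffer.

    Hypothetical helper for illustration only; `pages` is a list of
    strings standing in for the text of each PDF page.
    """
    buffer = io.StringIO()
    results = []
    for page_text in pages:
        buffer.write(page_text)
        # Whatever is in the buffer is taken as this page's text.
        results.append(buffer.getvalue())
        if clear_buffer:
            # Rewind and empty the shared buffer so the next page starts clean.
            buffer.seek(0)
            buffer.truncate(0)
    return results


pages = ["first page\n", "second page\n"]
buggy = extract_pages(pages, clear_buffer=False)
fixed = extract_pages(pages, clear_buffer=True)
```

With `clear_buffer=False`, the second entry of `buggy` contains the first page's text followed by the second page's. With the buffer cleared, `fixed` matches the input pages, newlines included.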
timClicks commented 9 years ago

Nice work. Thanks for this!