py-pdf / benchmarks

Benchmarking PDF libraries
BSD 3-Clause "New" or "Revised" License

Add text extraction benchmark for paragraph recognition #2

Open MartinThoma opened 2 years ago

MartinThoma commented 2 years ago

The current text extraction benchmark says nothing about how well newline characters, and therefore paragraph boundaries, are recognized. We need a new benchmark for that.
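
One possible way to score newline recognition, sketched here purely as an illustration: record where each newline falls relative to the non-whitespace characters before it, then compare the positions in the extracted text against a hand-written ground truth using an F1 score. The function names `newline_positions` and `newline_f1` are hypothetical and not part of this repository, and the sketch assumes the non-whitespace characters themselves were extracted correctly, so that positions in both texts line up.

```python
def newline_positions(text: str) -> set:
    """Return the positions of line breaks in `text`.

    A position is the number of non-whitespace characters that precede
    the newline, so differences in spaces or indentation do not matter.
    Leading and trailing newlines are ignored.
    """
    positions = set()
    count = 0  # non-whitespace characters seen so far
    for ch in text:
        if ch == "\n":
            if count > 0:
                positions.add(count)
        elif not ch.isspace():
            count += 1
    # A newline after the last visible character carries no information.
    positions.discard(count)
    return positions


def newline_f1(expected: str, extracted: str) -> float:
    """F1 score of the newline positions in `extracted` against `expected`."""
    exp = newline_positions(expected)
    got = newline_positions(extracted)
    if not exp and not got:
        return 1.0  # nothing to find, nothing wrongly reported
    true_positives = len(exp & got)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(got)
    recall = true_positives / len(exp)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    expected = "First paragraph.\nSecond paragraph.\nThird."
    extracted = "First paragraph. Second paragraph.\nThird."
    print(round(newline_f1(expected, extracted), 3))
```

In this example the extracted text misses one of the two expected line breaks, so precision is 1.0 and recall is 0.5. A real benchmark would also need to decide how to treat hyphenation, blank lines between paragraphs, and character-level extraction errors, none of which this sketch handles.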