Add text extraction benchmark for paragraph recognition

py-pdf / benchmarks

Benchmarking PDF libraries

BSD 3-Clause "New" or "Revised" License


Open MartinThoma opened 2 years ago

MartinThoma commented 2 years ago

The current text extraction benchmark says nothing about how well newline characters, and therefore paragraph boundaries, are recognized. We need a new benchmark for that.
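One possible way to score this, offered only as a rough sketch and not as the benchmark's actual design: treat every newline in the ground-truth text as an expected break, locate each break by the number of non-whitespace characters that precede it, and compute an F1 score against the breaks found in the extracted text. The function names `break_positions` and `newline_f1` are hypothetical, and the approach assumes the extracted text contains the same non-whitespace characters as the ground truth (dropped or added characters would shift the positions).

```python
def break_positions(text: str) -> set:
    """Return line-break positions, counted in non-whitespace characters seen so far.

    Counting only non-whitespace characters makes the positions insensitive
    to differences in spaces, tabs, or repeated blank lines.
    """
    positions = set()
    count = 0
    for ch in text:
        if ch == "\n":
            if count > 0:  # ignore newlines before any content
                positions.add(count)
        elif not ch.isspace():
            count += 1
    # A newline after the last character does not separate two pieces of content.
    positions.discard(count)
    return positions


def newline_f1(expected: str, extracted: str) -> float:
    """F1 score of the line breaks in `extracted` against those in `expected`."""
    truth = break_positions(expected)
    found = break_positions(extracted)
    if not truth and not found:
        return 1.0  # nothing to find, nothing wrongly reported
    hits = len(truth & found)
    if hits == 0:
        return 0.0
    precision = hits / len(found)
    recall = hits / len(truth)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    expected = "First paragraph.\nSecond paragraph.\nThird."
    extracted = "First paragraph. Second paragraph.\nThird."
    print(round(newline_f1(expected, extracted), 3))
```

In this example the extracted text recovers one of the two expected breaks and reports no spurious ones, so precision is 1.0 and recall is 0.5. A score like this could be reported next to the existing text-quality score, so a library that extracts the right characters but merges paragraphs is penalized separately.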