robbi5 / kleineanfragen

Collects kleine Anfragen (written parliamentary questions) from parliamentary documentation systems (Parlamentsdokumentationssysteme) to make them easy to search and link to.
https://kleineanfragen.de
MIT License

Improve table recognition #96

Open · robbi5 opened this issue 8 years ago

robbi5 commented 8 years ago

Currently, table recognition is a simple check for some keywords in app/jobs/contains_table_job.rb.

We could improve this by looking for obvious table-like patterns, such as:

November 2013 43.104
Dezember 2013 30.419
Januar 2014 29.218
Februar 2014 15.598

(from https://kleineanfragen.de/berlin/17/14442)
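A minimal sketch of the pattern idea, in Ruby like the rest of the app. It assumes the job already has the document's extracted plain text. `MONTHS`, `MONTH_ROW` and `month_series_table?` are hypothetical names, not existing code, and the thresholds (at least three consecutive rows of a German month name, a four-digit year and a number using "." as thousands separator) are guesses tuned to the example above.

```ruby
# Hypothetical heuristic, not existing code in the repo.
MONTHS = %w[
  Januar Februar März April Mai Juni Juli August
  September Oktober November Dezember
].freeze

# One row like "November 2013 43.104": month name, four-digit year, then a
# number that may use "." as thousands separator and "," as decimal separator.
MONTH_ROW = /\A\s*(?:#{MONTHS.join('|')})\s+\d{4}\s+\d+(?:\.\d{3})*(?:,\d+)?\s*\z/

# Returns true if at least `min_rows` consecutive non-blank lines match MONTH_ROW.
def month_series_table?(text, min_rows: 3)
  run = 0
  text.each_line do |line|
    # Extracted PDF text often has blank lines between rows, so skip them
    # without breaking the current run.
    next if line.strip.empty?

    if MONTH_ROW.match?(line)
      run += 1
      return true if run >= min_rows
    else
      run = 0 # any other line ends the run
    end
  end
  false
end
```

Such a check could be OR-ed with the existing keyword check in app/jobs/contains_table_job.rb, so a paper counts as containing a table if either heuristic fires. Other column patterns (district names followed by numbers, year columns) could be added the same way.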

Additionally, we could use the table recognition from Tabula: tabula-extractor / tabula-java.
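For the Tabula route, a rough sketch that shells out to the tabula-java command-line jar from Ruby (tabula-extractor is the older JRuby wrapper; this sketch does not use it). Assumptions: Java and a tabula-java "jar-with-dependencies" are installed, the hypothetical `TABULA_JAR` environment variable points to it, and the JSON output is an array of tables whose "data" key holds rows of cells with a "text" field. All method names are hypothetical.

```ruby
require "json"
require "open3"

# Hypothetical location of the tabula-java jar; adjust to your setup.
TABULA_JAR = ENV.fetch("TABULA_JAR", "tabula.jar")

# --pages all scans every page, --guess lets Tabula detect table areas,
# --format JSON requests machine-readable output on stdout.
def tabula_command(pdf_path, jar: TABULA_JAR)
  ["java", "-jar", jar, "--pages", "all", "--guess", "--format", "JSON", pdf_path]
end

# Assumed output shape: [{ "data" => [[{ "text" => "..." }, ...], ...] }, ...]
# Returns one array of rows (arrays of stripped cell strings) per table.
def tables_from_tabula_json(json)
  JSON.parse(json).map do |table|
    table.fetch("data").map { |row| row.map { |cell| cell.fetch("text").to_s.strip } }
  end
end

# Runs Tabula on the PDF and returns true if it reports at least one table
# with two or more rows and some non-empty cell.
def tabula_finds_table?(pdf_path)
  stdout, _stderr, status = Open3.capture3(*tabula_command(pdf_path))
  return false unless status.success?

  tables_from_tabula_json(stdout).any? do |rows|
    rows.size >= 2 && rows.flatten.any? { |text| !text.empty? }
  end
rescue Errno::ENOENT, JSON::ParserError, KeyError
  false # java missing or unexpected output: treat as "no table found"
end
```

Starting a JVM per document is slow, so this would probably fit better as a second pass over papers that the cheap text heuristics did not already flag.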