robbi5 opened this issue 8 years ago.
Some papers have broken, unsearchable text because some of their pages should have been rotated before the text was extracted.
Example: https://kleineanfragen.de/schleswig-holstein/18/406
Extracted text: https://kleineanfragen.de/schleswig-holstein/18/406-gremienmitgliedschaften-der-regierungsmitglieder-und-staatssekretaere.txt
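Excerpt from the extracted text (it should read "Fragen 1, 3 und 4: Gremien im Sinne d", i.e. "Questions 1, 3 and 4: Committees within the meaning of"):
Fr ag e n 1 ,3 u n d 4 : G re m ie n im S in n e d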
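To find other affected papers, one rough heuristic (an assumption, not something the project currently does) is to measure how many words in the extracted text are only one or two letters long, since the broken output above splits every word into such fragments. The function names `short_token_ratio` and `looks_garbled` and the 0.5 threshold are hypothetical and untuned; normal German text also contains short words like "im" or "zu", so the threshold would need checking against real papers.

```python
import re

# Matches runs of Unicode letters (including umlauts), excluding digits and "_".
LETTER_RUN = re.compile(r"[^\W\d_]+")


def short_token_ratio(text: str) -> float:
    """Return the fraction of letter-only tokens that are 1-2 characters long."""
    tokens = LETTER_RUN.findall(text)
    if not tokens:
        return 0.0
    short = sum(1 for token in tokens if len(token) <= 2)
    return short / len(tokens)


def looks_garbled(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical check: flag text whose words are mostly 1-2 letter fragments.

    The default threshold of 0.5 is an untuned assumption.
    """
    return short_token_ratio(text) >= threshold


if __name__ == "__main__":
    print(looks_garbled("Fr ag e n 1 ,3 u n d 4 : G re m ie n im S in n e d"))  # True
    print(looks_garbled("Fragen 1, 3 und 4: Gremien im Sinne d"))  # False
```

Papers flagged this way could then be queued for re-extraction with their pages rotated, though how that rotation is done is outside this sketch.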
Related Apache Tika bug: https://issues.apache.org/jira/browse/TIKA-723 ("Rotated text isn't extracted correctly from PDFs")