zx8754 opened 7 years ago
I can reproduce this. I wonder whether extract_tables() is getting confused by the header lines. It would be nice if this worked automatically, since the PDF is indeed pretty clean. My guess is that this is an upstream issue (https://github.com/tabulapdf/tabula-java/), but I'd be happy to be wrong.
I just wanted to note that you could set the area argument of extract_tables(). I know that's not ideal, but it's better than selecting the areas interactively for every page.
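A minimal sketch of that workaround in R. It assumes, as with tabula-java's --area option, that tabulizer's area argument takes c(top, left, bottom, right) in PDF points (1/72 inch) measured from the top-left corner of the page, and that a length-1 list is applied to every requested page. inches_to_area() is a hypothetical helper, not part of tabulizer; it only converts a rectangle measured in inches (e.g. read off a PDF viewer) into that form. The extract_tables() call is left commented out because it needs tabulizer, a working Java setup and the PDF.

```r
# Hypothetical helper (not part of tabulizer): turn a rectangle measured in
# inches into the list form assumed for extract_tables()'s `area` argument,
# i.e. c(top, left, bottom, right) in PDF points (72 points per inch).
inches_to_area <- function(top, left, bottom, right) {
  pts_per_inch <- 72
  # A single-element list; assumed to be reused for every page requested.
  list(c(top, left, bottom, right) * pts_per_inch)
}

# Usage sketch (not run here):
# library(tabulizer)
# f <- "http://databank.worldbank.org/data/download/GDP.pdf"
# tabs <- extract_tables(f, pages = 1,
#                        area = inches_to_area(1, 0.5, 10, 8),
#                        guess = FALSE)  # use the given area instead of detection
```

Fixing the area with guess = FALSE sidesteps automatic table detection, which is the step that seems to be tripped up by the header lines.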
A new version of Tabula was released last week, which apparently includes a number of fixes and improvements. It will take a bit of work to integrate, but once I have the new version working I will revisit this to see whether it solves the problem.
@leeper @zx8754
Something similar is happening to me. I can read the tables in the PDF file, but only the header values are read, not the table contents.
Any suggestions on how to solve this?
This is the PDF: https://www.qc.cuny.edu/About/Research/Documents/Fact_Book_2014-2015_Final.pdf
My problem is that extract_tables() does not extract the table on page 86, while it works normally on the other pages.
Any solution?
I have the PDF below, which seems to have "clean" tables, but extract_tables() gives me an empty list: http://databank.worldbank.org/data/download/GDP.pdf
I tried extract_areas(), which works fine.
Any pointers on why extract_tables() wouldn't work? Am I missing some arguments?