pyexcel / pyexcel-pdfr

Read tables in pdf files using camelot for pyexcel community

pdftables and pdfminer #3

Closed: jayvdb closed this issue 4 years ago

jayvdb commented 5 years ago

The dependency on pdftables and pdfminer is problematic, as both are Python 2 only. pdftables.six and pdfminer.six provide Python 3 support; however, the bigger problem is that pdftables was supplied by ScraperWiki, and they deleted their repo. The tests rely on test data that has since vanished, and the package contains some very junky modules that reference specific filenames, such as experimental code referring to missing test data (see pdftables/runtables.py and pdftables/TableFinder.py).

Anyway, here are the openSUSE Build Service packages:

  1. https://build.opensuse.org/package/show/home:jayvdb:pyexcel/python-pdftables.six
  2. https://build.opensuse.org/package/show/home:jayvdb:pyexcel/python-pdfminer.six
  3. https://build.opensuse.org/package/show/home:jayvdb:pyexcel/python-pyexcel-pdfr
chfw commented 4 years ago

Switching to camelot-py now. Please have a go!