okfn / messytables

Tools for parsing messy tabular data. This is now superseded by https://github.com/frictionlessdata/tabulator-py
http://messytables.readthedocs.io/

Remove PDF functionality #186

Closed davidread closed 5 years ago

davidread commented 5 years ago

The optional dependency pdftables has been unmaintained for 6 years.

This functionality - extracting tables from PDFs - seems like a 'nice to have' extra for messytables.

There is an issue with pdftables importing newer versions of numpy (see https://github.com/ckan/ckanext-xloader/issues/79). I'm not interested in looking into the details, and no doubt there will be other problems. A library with no commits in 6 years is not one we should use.

StevenMaude commented 5 years ago

For some background context, pdftables was released as an open source library by ScraperWiki. The open source version is unmaintained, as mentioned by @davidread. The open source pdftables library is most likely Python 2 only unless someone fixes it up to work with Python 3 (and Python 2 is end of life soon). This may not be much work, but it is still work.

I can't merge this PR as I am no longer a collaborator on this repository. However, I had a quick skim over the diff and it looks reasonable to me.

NB: If there is demand and enthusiasm to reintroduce PDF table extraction as a feature, there are other open source alternatives that are currently active, such as Camelot.

davidread commented 5 years ago

Many thanks for explaining @StevenMaude. That's good enough for a merge I think.