nlpforhealthcare18 / nlpforhealthcare18.github.io

This repo is to support collaborative code development for exploratory NLP analysis for health and social care research.

fix columns from pdf scraper #15

Open bailey-r opened 6 years ago

bailey-r commented 6 years ago

pdfminer appears to read text column by column (not row by row), so some text ends up out of place. E.g. for "was the service safe?", the rating appears further down the string instead of next to the question.

Looking into a fix with the tabula-py library: https://github.com/chezou/tabula-py
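To illustrate the column-order problem described above, here is a minimal sketch of one possible workaround that does not depend on any PDF library: if each text fragment comes with its page coordinates, fragments can be regrouped into rows by their vertical position. The function name `group_into_rows`, the `(x, y, text)` tuple layout, the `y_tolerance` parameter, and the sample values are all assumptions made up for this example. They are not part of pdfminer or tabula-py, and this is not necessarily the fix that will be adopted here.

```python
def group_into_rows(fragments, y_tolerance=2.0):
    """Regroup (x, y, text) fragments into rows, top to bottom, left to right.

    Hypothetical helper: assumes PDF-style coordinates where y grows upward,
    and that fragments on the same visual row have y values within y_tolerance.
    """
    rows = []  # each entry: (y of first fragment in the row, [(x, text), ...])
    # Sort by descending y so the top of the page comes first.
    for x, y, text in sorted(fragments, key=lambda f: -f[1]):
        if rows and abs(rows[-1][0] - y) <= y_tolerance:
            rows[-1][1].append((x, text))   # same row as the previous fragment
        else:
            rows.append((y, [(x, text)]))   # start a new row
    # Within each row, order cells left to right by x.
    return [[text for _, text in sorted(cells, key=lambda c: c[0])]
            for _, cells in rows]


# Made-up fragments in column order, as a column-wise reader would emit them:
# all questions first, then all ratings.
fragments = [
    (50, 700.0, "Was the service safe?"),
    (50, 680.0, "Was the service effective?"),
    (300, 700.5, "Good"),
    (300, 680.0, "Requires improvement"),
]

for row in group_into_rows(fragments):
    print(" | ".join(row))
```

With this regrouping, each rating ends up on the same row as its question. The tolerance is needed because fragments on one visual line rarely share exactly the same y value.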