wri-dssg-omdena / policy-data-analyzer

Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.

Extract text from Cristina's documents #2

Closed: dfhssilva closed this issue 3 years ago

dfhssilva commented 3 years ago
  1. Read PDF files directly from OneDrive or from a zip file
  2. Use OCR where needed to extract text from image-only (scanned) PDF files
  3. Optional: Consolidate the extracted text into a single file (or database) for later reading
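
A rough sketch of how the three steps could fit together for the zip-file case, not the implementation used in this repo. The names `extract_zip`, `save_texts`, `pypdf_extract` and the `MIN_CHARS` threshold are hypothetical. The default extractor assumes the third-party `pypdf` package is installed, and the OCR step is left as a caller-supplied function (for example one built on an OCR library of your choice), since the issue does not name a tool. OneDrive access is not covered here.

```python
import io
import json
import zipfile
from typing import Callable, Dict, Optional

# Assumed threshold: below this many characters, a PDF is treated as image-only.
MIN_CHARS = 20


def pypdf_extract(data: bytes) -> str:
    """Extract the embedded text layer of a PDF (assumes `pypdf` is installed)."""
    from pypdf import PdfReader  # imported lazily so the rest works without it

    reader = PdfReader(io.BytesIO(data))
    # extract_text() can yield nothing for image-only pages, hence the `or ""`.
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def extract_zip(
    zip_source,
    extract_text: Callable[[bytes], str] = pypdf_extract,
    ocr: Optional[Callable[[bytes], str]] = None,
    min_chars: int = MIN_CHARS,
) -> Dict[str, str]:
    """Step 1 and 2: read every PDF in a zip, falling back to OCR when needed.

    `zip_source` is a path or a binary file-like object.
    `ocr` is a hypothetical caller-supplied function: PDF bytes in, text out.
    """
    texts: Dict[str, str] = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".pdf"):
                continue  # skip non-PDF members
            data = archive.read(name)
            text = extract_text(data)
            # Almost no text layer suggests a scanned document: try OCR instead.
            if ocr is not None and len(text.strip()) < min_chars:
                text = ocr(data)
            texts[name] = text
    return texts


def save_texts(texts: Dict[str, str], out_path: str) -> None:
    """Step 3 (optional): store all extracted text in a single JSON file."""
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(texts, fh, ensure_ascii=False, indent=2)
```

Passing the extractor and the OCR step in as functions keeps the zip handling independent of any particular PDF or OCR library, so either can be swapped later. A single JSON file keyed by file name is only one option for step 3; a database would serve the same purpose.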