wri-dssg-omdena / policy-data-analyzer

Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.

Identify best set of keywords and search terms to find relevant documents from Ecolex #1

Closed thefirebanks closed 3 years ago

thefirebanks commented 3 years ago

After the text has been extracted from the policy PDFs:

  1. Find word embeddings suitable for Spanish documents, or any type of Spanish language model
  2. Use keyword analysis/topic modeling to gather insights from the text and improve further searches
  3. If possible, come up with a "similarity" or "distance" metric between documents, so that relevant documents can be separated from non-relevant ones more easily
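
One possible starting point for the similarity metric in step 3 is sketched below. This is only an assumption about how it could be done, not something decided in this issue: it uses plain TF-IDF weights and cosine similarity rather than the word embeddings from step 1, and the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`, `rank_by_similarity`) are hypothetical and do not come from the repository.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters, including Spanish accented characters.
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so terms present in every document keep a small positive weight.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(seed, candidates):
    """Sort candidate documents by similarity to a known-relevant seed document."""
    vectors = tfidf_vectors([seed] + candidates)
    seed_vec = vectors[0]
    scored = [
        (cosine_similarity(seed_vec, vec), doc)
        for vec, doc in zip(vectors[1:], candidates)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_by_similarity(seed_text, candidate_texts)` with the extracted text of a document already known to be relevant returns `(score, text)` pairs, most similar first. A cutoff score for dropping non-relevant documents would have to be tuned on labelled examples, and a Spanish stop-word list would probably improve the ranking.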
thefirebanks commented 3 years ago

Not needed anymore.