slovak-egov / CRZ-scraper

Web scraping and filtering code for the Slovak contract database, crz.gov.sk. The code downloads the XML databases, builds a CSV database of contracts, filters them, downloads the contract files, and extracts and cleans up tables with MD rates.

Extraction of data encoded in natural language #4

Open mtihanyi opened 2 years ago

mtihanyi commented 2 years ago

The processed files are contracts. Their structure, wording, and even individual paragraphs vary significantly, so it would be interesting to use AI-based natural language understanding to extract additional data from the contracts, such as: