opensemanticsearch / open-semantic-etl

Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition), and data enrichment (annotation) pipelines, plus an ingestor to a Solr or Elasticsearch index and a linked data graph database
https://opensemanticsearch.org/etl
GNU General Public License v3.0

Extract amounts of money #156

Closed: opensemanticsearch closed this issue 2 years ago

opensemanticsearch commented 2 years ago

In addition to the machine learning NER approach, add ontology- and regex-based extraction of amounts of money (https://github.com/opensemanticsearch/open-semantic-search/issues/399).
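For illustration, a regex-based extractor could look roughly like the sketch below. It is not the plugin's code: the pattern, the short currency list and the function name `extract_money` are assumptions made for this example, and it only covers amounts written with digits next to a currency symbol or code.

```python
import re

# Hypothetical currency list for this sketch; a real setup would likely
# read currency names and symbols from an ontology or config instead.
CURRENCY = r"(?:[$€£]|\b(?:USD|EUR|GBP)\b)"

# A number with optional thousands separators (",") and decimal part (".").
NUMBER = r"(?:\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)"

# Match either "<currency> <number>" or "<number> <currency>".
MONEY_RE = re.compile(
    rf"(?P<pre>{CURRENCY})\s?(?P<num1>{NUMBER})"
    rf"|(?P<num2>{NUMBER})\s?(?P<post>{CURRENCY})"
)


def extract_money(text):
    """Return a list of (currency, amount) tuples found in text."""
    results = []
    for match in MONEY_RE.finditer(text):
        currency = match.group("pre") or match.group("post")
        number = match.group("num1") or match.group("num2")
        # Drop thousands separators before converting to a float.
        results.append((currency, float(number.replace(",", ""))))
    return results


if __name__ == "__main__":
    print(extract_money("The invoice totals $1,250.50 and shipping was 30 EUR."))
```

The sketch assumes English-style number formatting; formats such as "1.250,50 €" would need an additional pattern.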

Mandalka commented 2 years ago

Plugin: https://github.com/opensemanticsearch/open-semantic-etl/blob/master/src/opensemanticetl/enhance_extract_money.py

Docs: https://github.com/opensemanticsearch/open-semantic-search/blob/master/docs/doc/analytics/money/README.md

Mandalka commented 2 years ago

Todo: Recognize number parts written as words, such as "thousand", "million" and "billion".
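A minimal sketch of what this could mean in practice: rewrite a digit followed by a scale word into a plain number before the money extraction runs. The function name `expand_scale_words` and the scale table are hypothetical and are not taken from the plugin or from any library. Fully spelled-out numbers such as "two million" are out of scope here.

```python
import re
from decimal import Decimal

# Hypothetical scale table for this sketch (short scale, English only).
SCALES = {
    "thousand": 1_000,
    "million": 1_000_000,
    "billion": 1_000_000_000,
}

# A number written in digits, followed by one of the scale words.
SCALED_RE = re.compile(
    r"(?P<num>\d+(?:\.\d+)?)\s+(?P<scale>thousand|million|billion)\b",
    re.IGNORECASE,
)


def expand_scale_words(text):
    """Replace e.g. '2.5 million' with '2500000' in text."""

    def replace(match):
        # Decimal avoids float rounding artifacts such as 1.1 * 1000.
        value = Decimal(match.group("num")) * SCALES[match.group("scale").lower()]
        if value == value.to_integral_value():
            return str(int(value))
        return str(value)

    return SCALED_RE.sub(replace, text)


if __name__ == "__main__":
    print(expand_scale_words("a budget of 2.5 million USD"))
```

Running this normalization first would let a digit-based money pattern pick up the expanded amount without knowing about scale words itself.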

Mandalka commented 2 years ago

Integrated https://github.com/jaidevd/numerizer