opensemanticsearch / open-semantic-etl

Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition) and data enrichment (annotation) pipelines, plus an ingestor for a Solr or Elasticsearch index and a linked data graph database
https://opensemanticsearch.org/etl
GNU General Public License v3.0

Move technical metadata fieldnames to separate config #117

Closed: opensemanticsearch closed this issue 4 years ago

opensemanticsearch commented 4 years ago

Move the list of technical metadata fieldnames (pdf:docinfo:creator_tool and so on) out of the plugin enhance_multilingual into a separate blacklist config, so that other plugins can reuse it, for example the plugin core method gettext().

Mandalka commented 4 years ago

Moved the fieldname config to /etc/opensemanticsearch/blacklist/textanalysis/

Mandalka commented 4 years ago

These blacklists are now used by the text analysis plugins based on plugin core, too.
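
For readers who want to see the idea in code, here is a minimal sketch of how a plugin might read fieldname blacklists from a directory and skip those fields before text analysis. It is not the project's actual implementation. The file format (one fieldname per line, `#` for comments), the function names `read_blacklist_dir` and `fields_for_text_analysis`, the file name `technical_metadata`, and the field `content_txt` are all assumptions made for illustration; only the fieldname pdf:docinfo:creator_tool and the directory path come from this issue.

```python
import tempfile
from pathlib import Path


def read_blacklist_dir(directory):
    """Collect fieldnames from every regular file in a directory.

    Assumed file format (hypothetical): one fieldname per line;
    blank lines and lines starting with '#' are skipped.
    """
    fieldnames = set()
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                fieldnames.add(line)
    return fieldnames


def fields_for_text_analysis(data, blacklist):
    """Return only the fields whose names are not blacklisted."""
    return {name: value for name, value in data.items() if name not in blacklist}


def demo():
    # A temporary directory stands in for
    # /etc/opensemanticsearch/blacklist/textanalysis/ so the sketch
    # runs without an installed system.
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical blacklist file name and contents.
        Path(tmp, "technical_metadata").write_text(
            "# technical metadata\npdf:docinfo:creator_tool\n", encoding="utf-8"
        )
        blacklist = read_blacklist_dir(tmp)
    # Hypothetical document: one text field, one technical metadata field.
    data = {"content_txt": "Some text", "pdf:docinfo:creator_tool": "LaTeX"}
    return fields_for_text_analysis(data, blacklist)
```

Keeping the list in plain files under one directory means several plugins can call the same loader instead of each hard-coding its own copy of the fieldnames, which is the reuse this issue asks for.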