opensemanticsearch / open-semantic-etl

Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
https://opensemanticsearch.org/etl
GNU General Public License v3.0

Plugin core class #112

Closed opensemanticsearch closed 4 years ago

opensemanticsearch commented 4 years ago

For more DRY and reusable code (across a growing number of plugins) and easier development of new plugins, migrate frequently used code and functionality, such as filtering for (or running only on) certain content types or getting text data from all fields for text analysis, to a plugin core class that plugins can inherit from.
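A minimal sketch of what such a core class could look like, to make the idea concrete. This is not the code in etl_plugin_core.py: the names `PluginCore`, `ExtractHashtags`, `filter_contenttype_prefixes`, `ignore_fields`, `is_applicable`, `get_text` and `enrich`, as well as the field names `content_type_ss` and `hashtag_ss`, are assumptions made for illustration only.

```python
import re


class PluginCore:
    """Hypothetical base class; illustrative only, not the project's actual API."""

    # Content type prefixes the plugin runs on; an empty tuple means "all types".
    filter_contenttype_prefixes = ()

    # Fields that are skipped when collecting text (assumed field names).
    ignore_fields = ("id", "content_type_ss")

    def is_applicable(self, data):
        """Return True if the document's content type passes the filter."""
        if not self.filter_contenttype_prefixes:
            return True
        content_types = data.get("content_type_ss", [])
        if isinstance(content_types, str):
            content_types = [content_types]
        return any(
            ct.startswith(self.filter_contenttype_prefixes) for ct in content_types
        )

    def get_text(self, data):
        """Join the string values of all remaining fields into one text."""
        parts = []
        for field, value in data.items():
            if field in self.ignore_fields:
                continue
            values = value if isinstance(value, (list, tuple)) else [value]
            parts.extend(v for v in values if isinstance(v, str))
        return "\n".join(parts)

    def process(self, parameters=None, data=None):
        """Shared entry point: apply the filter, then delegate to the subclass."""
        parameters = {} if parameters is None else parameters
        data = {} if data is None else data
        if self.is_applicable(data):
            self.enrich(parameters, data)
        return parameters, data

    def enrich(self, parameters, data):
        """Plugin-specific work; subclasses override this."""
        raise NotImplementedError


class ExtractHashtags(PluginCore):
    """Example plugin that only implements its own extraction logic."""

    filter_contenttype_prefixes = ("text/",)

    def enrich(self, parameters, data):
        tags = sorted(set(re.findall(r"#\w+", self.get_text(data))))
        if tags:
            data["hashtag_ss"] = tags  # assumed target field name
```

With this layout, a plugin declares the content types it cares about and overrides a single method; the filtering and text collection live in one place. For example, `ExtractHashtags().process({}, {"content_type_ss": "text/plain", "content_txt": "see #solr"})` would add a `hashtag_ss` field to the document, while a document with content type `application/pdf` would pass through unchanged.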

Mandalka commented 4 years ago

Implemented Plugin core class in etl_plugin_core.py

Mandalka commented 4 years ago

Migrated enhance_pdf_ocr to inherit from plugin core.

Mandalka commented 4 years ago

Migrated enhance_pdf_page to inherit from plugin core.

Mandalka commented 4 years ago

Migrated enhance_pdf_page_preview to inherit from plugin core.

Mandalka commented 4 years ago

Migrated enhance_warc to inherit from plugin core.

Mandalka commented 4 years ago

Migrated enhance_pst to inherit from plugin core.

Mandalka commented 4 years ago

Migrated enhance_entity_linking to inherit from plugin core.

Mandalka commented 4 years ago

Migrated enhance_extract_email to plugin core lib.

Mandalka commented 4 years ago

Migrated enhance_regex to plugin core lib.

Mandalka commented 4 years ago

Migrated enhance_extract_phone to plugin core lib.

Mandalka commented 4 years ago

Migrated enhance_extract_hashtags to plugin core lib.