opensemanticsearch / open-semantic-etl

Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
https://opensemanticsearch.org/etl
GNU General Public License v3.0

Enhanced error handling for plugins #1

Open opensemanticsearch opened 8 years ago

opensemanticsearch commented 8 years ago

Implement enhanced error handling (fallback plugins and retry) for data enrichment or data analysis plugins:

Each extraction & analysis plugin in the process chain should have parameters for retry and for fallback to alternate plugins that use alternate tools or methods.

For example, even when Apache Tika cannot parse a file, the Linux command "file" can still determine its content type.
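A minimal sketch of how such a retry and fallback chain could look, not the project's actual implementation. The names `run_with_fallback`, `content_type_by_tika`, and `content_type_by_file_command` are hypothetical; the Tika function is only a stand-in that always fails, to show the fallback path.

```python
import subprocess


def content_type_by_tika(path):
    # Hypothetical stand-in for a Tika-based plugin that cannot parse the file.
    raise RuntimeError("Tika could not parse " + path)


def content_type_by_file_command(path):
    # Fallback: ask the Linux "file" command for the MIME type.
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def run_with_fallback(plugins, path, retries=1):
    """Try each plugin in order; retry each one up to `retries` extra times.

    Returns (result, errors). result is None if every plugin failed.
    """
    errors = []
    for plugin in plugins:
        for _attempt in range(retries + 1):
            try:
                return plugin(path), errors
            except Exception as e:
                # Keep the message so it can be logged or indexed later.
                errors.append("{}: {}".format(plugin.__name__, e))
    return None, errors
```

Calling `run_with_fallback([content_type_by_tika, content_type_by_file_command], path)` would try the first plugin twice (one retry) and then fall back to the "file" command, returning the collected error messages alongside the result.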

opensemanticsearch commented 8 years ago

Partly done: If something goes wrong while posting data to the Solr index, the ETL tools now print not only the HTTP error code but also the full error message from Solr, which makes schema problems and other errors easier to debug.
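As an illustration of the idea (not the code used in the ETL tools), the response body of a failed request can be read from the exception object with Python's standard `urllib`. The function name `describe_http_error`, the URL, and the error body below are made up for this sketch.

```python
import io
import urllib.error


def describe_http_error(e):
    """Return the HTTP status plus the response body, if the server sent one."""
    try:
        body = e.read().decode("utf-8", errors="replace").strip()
    except Exception:
        body = ""
    message = "HTTP {} {}".format(e.code, e.reason)
    if body:
        message += ": " + body
    return message


# Simulated failed update request; URL and body are invented for the example.
example_error = urllib.error.HTTPError(
    "http://localhost:8983/solr/example/update",
    400,
    "Bad Request",
    None,
    io.BytesIO(b"unknown field 'foo_s'"),
)
example_message = describe_http_error(example_error)
```

In real use the exception would come from an `except urllib.error.HTTPError as e:` clause around the request that posts the document.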

Mandalka commented 5 years ago

ETL plugins using microservices / REST-APIs will retry failed connections: https://github.com/opensemanticsearch/open-semantic-etl/issues/84

Mandalka commented 5 years ago

Error status / message management is now handled by its own function in etl.py.

Mandalka commented 5 years ago

Entity extraction by the Solr text tagger(s) now has separate error handling for each tagger, using this new error_message function, so status & error messages are indexed.
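A rough sketch of the per-tagger pattern, under the assumption that errors are stored as fields of the document. `record_error`, `run_taggers`, and the field names `etl_error_plugins_ss` / `etl_error_messages_ss` are hypothetical and do not reproduce the signature of the real error_message function in etl.py.

```python
def record_error(data, plugin, e):
    """Store a plugin failure on the document so it is indexed with it.

    Field names are assumptions made for this sketch.
    """
    data.setdefault("etl_error_plugins_ss", []).append(plugin)
    data.setdefault("etl_error_messages_ss", []).append(
        "{}: {}".format(plugin, e)
    )


def run_taggers(taggers, text, data):
    """Run every tagger; a failure in one does not stop the others."""
    for name, tagger in taggers.items():
        try:
            data[name + "_ss"] = tagger(text)
        except Exception as e:
            record_error(data, name, e)
    return data
```

With this structure a document still receives the entities from the taggers that worked, and the failed tagger shows up in the error fields of the same document.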

Mandalka commented 5 years ago

All ETL plugins that use microservices / HTTP REST-APIs for analysis now wait for services that are down or not loaded yet. This is done by enhanced HTTP exception handling, which additionally provides more detailed error messages.
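One possible shape for such waiting, sketched with the standard library only. The function name `fetch_with_retry` and its parameters are assumptions for illustration and not the plugins' actual code; `opener` and `sleep` are injectable only to make the function easy to test.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, retries=5, wait=1.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Request `url`, waiting and retrying while the service is unavailable."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except (urllib.error.URLError, ConnectionError) as e:
            # URLError also covers HTTPError, e.g. a 503 while a service loads.
            last_error = e
            if attempt < retries:
                sleep(wait)
    raise RuntimeError(
        "Service at {} not available after {} attempts: {}".format(
            url, retries + 1, last_error
        )
    )
```

A fixed wait keeps the sketch short; an increasing delay between attempts would be a reasonable variation for services that take long to load.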