opensemanticsearch / open-semantic-etl

Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
https://opensemanticsearch.org/etl
GNU General Public License v3.0

Adding apache manifoldcf to etl #115

Open kichenin opened 4 years ago

kichenin commented 4 years ago

Hi,

I need to extract documents and their metadata from an Alfresco repository. I have tried Apache ManifoldCF and connected its Alfresco CMIS connector directly to Solr (the Open Semantic Search index). What I want instead is to route the output of the Alfresco CMIS connector through the Open Semantic Search ETL as part of the pipeline, so that the Alfresco content is enriched and the metadata of the Alfresco documents is indexed in Open Semantic Search's Solr as well.

Does anyone have suggestions on how to configure the ETL pipeline for this requirement? I have gone through the documentation but still could not figure out the best way to achieve it.
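
To make the intended flow concrete, here is a minimal sketch of one possible approach: handing document URLs from Alfresco to the ETL over HTTP instead of writing to Solr directly. It rests on several assumptions that would need checking against the actual installation: that the ETL exposes a REST endpoint at `/search-apps/api/index-web` taking a `uri` parameter, that the Alfresco documents are reachable by URL without extra authentication, and that `alfresco.example` stands in for the real host. The function names are hypothetical, and the sketch does not carry over the metadata collected by ManifoldCF.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: REST endpoint of the Open Semantic Search ETL that queues a URI
# for processing. Verify the path against your own installation.
ETL_API = "http://localhost/search-apps/api/index-web"


def build_index_request(document_url, api=ETL_API):
    """Return the request URL that asks the ETL to process one document URL."""
    return f"{api}?{urlencode({'uri': document_url})}"


def queue_for_etl(document_urls, send=None):
    """Submit each document URL to the ETL and collect the responses.

    `send` is injectable so the flow can be dry-run without a running server;
    by default it performs a plain HTTP GET.
    """
    if send is None:
        send = lambda url: urlopen(url).read()
    return [send(build_index_request(url)) for url in document_urls]


# Dry run: return the request URLs instead of sending them.
# The Alfresco URL below is a hypothetical placeholder.
requests = queue_for_etl(
    ["http://alfresco.example/doc?id=1"],
    send=lambda url: url,
)
```

With this approach the ETL fetches each document itself, so the enrichment plugins run on the content, but the Alfresco metadata would still need a separate path into the index.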

Thanks & Regards,
Radakichenin