Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition), and data enrichment (annotation) pipelines, plus an ingestor to a Solr or Elasticsearch index and a linked data graph database
Hi,
I need to extract documents and their metadata from an Alfresco repository. I have tried Apache ManifoldCF and connected Alfresco CMIS to Solr (opensemanticsearch). However, I want to feed the output from Alfresco CMIS into the opensemanticsearch ETL as part of the pipeline, so that the Alfresco content is enriched and, at the same time, the metadata of the Alfresco documents is also indexed in opensemanticsearch's Solr.
Does anyone have suggestions on how to configure the ETL pipeline for this requirement? I have gone through the documentation but still could not figure out the best way to achieve it.
Thanks & Regards,
Radakichenin