philipmeadows / alfresco-webscript-manifold-connector

Alfresco Solr API Repository Connector for Apache ManifoldCF

Text extracting #21

Closed: alexist closed this issue 9 years ago

alexist commented 9 years ago

Hi,

ManifoldCF uses Solr's extracting update handler to handle binary content. The binary content is sent to Solr, and Apache Tika tries to extract the text content and some metadata (such as the MIME type).
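A rough sketch of what "sending binary content to Solr" looks like at the HTTP level. It only builds the request and does not send it. It assumes the extracting handler is mounted at Solr's conventional `/update/extract` path and uses Solr's `literal.id` parameter to set the document id. The core URL and document id are placeholders, and this is not ManifoldCF's own code.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_core_url: str, doc_id: str, data: bytes,
                          content_type: str) -> Request:
    """Build (but do not send) a POST of raw bytes to Solr's extracting handler.

    Solr passes the posted bytes to Apache Tika, which extracts the text and
    metadata. The literal.id parameter sets the id of the indexed document.
    """
    params = urlencode({"literal.id": doc_id, "commit": "true"})
    url = f"{solr_core_url.rstrip('/')}/update/extract?{params}"
    return Request(url, data=data, method="POST",
                   headers={"Content-Type": content_type})
```

To actually send it, pass the request to `urllib.request.urlopen`. With this approach all text extraction happens on the Solr side, which is the behaviour the issue proposes to change.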

For the Alfresco connector, Alfresco itself should convert binary content to text, as the official Alfresco Solr integration does (by calling the NodeContentGet web script), because Alfresco already knows how to convert documents to text.

However, the NodeContentGet web script is protected by certificate authentication, so you have to clone this web script.
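The following is a hypothetical sketch of how a connector might call such a cloned web script with ordinary user credentials instead of a client certificate. The clone's path (`/alfresco/service/manifold/textContent`) and the `nodeId` / `propertyQName` parameter names are assumptions made for illustration. Check them against the actual NodeContentGet descriptor before relying on them. The request is only built here, not sent.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# HYPOTHETICAL path of a cloned NodeContentGet web script that is protected by
# normal user authentication rather than certificate authentication.
CLONED_TEXT_CONTENT_PATH = "/alfresco/service/manifold/textContent"

# Fully qualified name of the standard cm:content property.
CM_CONTENT = "{http://www.alfresco.org/model/content/1.0}content"


def build_text_content_request(alfresco_url: str, node_id: int,
                               user: str, password: str) -> Request:
    """Build (but do not send) a GET asking Alfresco for a node's extracted text.

    The nodeId/propertyQName parameter names are assumptions for this sketch.
    """
    params = urlencode({"nodeId": node_id, "propertyQName": CM_CONTENT})
    url = f"{alfresco_url.rstrip('/')}{CLONED_TEXT_CONTENT_PATH}?{params}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # HTTP Basic authentication replaces the client certificate.
    return Request(url, headers={"Authorization": f"Basic {token}"})
```

The connector would then index the returned plain text directly, instead of posting the original binary to Solr's extracting handler.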

maoo commented 9 years ago

Created issue https://github.com/maoo/alfresco-indexer/issues/1