Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition), and data enrichment (annotation) pipelines, plus an ingestor to a Solr or Elasticsearch index and a linked data graph database
The unit test fails because it cannot delete the indexed document after the test:
======================================================================
ERROR: test_warc (test_enhance_warc.Test_enhance_warc)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/opensemanticetl/test_enhance_warc.py", line 31, in test_warc
    etl_delete.delete(contained_doc_id)
  File "/usr/lib/python3/dist-packages/opensemanticetl/etl_delete.py", line 60, in delete
    self.connector.delete(parameters=self.config, docid=uri)
  File "/usr/lib/python3/dist-packages/opensemanticetl/export_solr.py", line 351, in delete
    result = solr.delete(id=docid)
  File "/usr/lib/python3/dist-packages/pysolr.py", line 960, in delete
    return self._update(m, commit=commit, softCommit=softCommit, waitFlush=waitFlush, waitSearcher=waitSearcher, handler=handler)
  File "/usr/lib/python3/dist-packages/pysolr.py", line 500, in _update
    return self._send_request('post', path, message, {'Content-type': 'text/xml; charset=utf-8'})
  File "/usr/lib/python3/dist-packages/pysolr.py", line 412, in _send_request
    raise SolrError(error_message % (resp.status_code, solr_message))
pysolr.SolrError: Solr responded with an error (HTTP 400): [Reason: Unexpected character ':' (code 58) excepted space, or '>' or "/>"
at [row,col {unknown-source}]: [1,41]]
Reason: https://github.com/django-haystack/pysolr/issues/368
It seems we have to wait for a new release in the Python repo: https://github.com/django-haystack/pysolr/issues/373
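
For reference, the HTTP 400 above is an XML parse error on the delete message that Solr received. The sketch below builds a well-formed Solr XML delete-by-id message with the standard library, for an id that contains colons and an ampersand. It is only an illustration of what such a message looks like, not pysolr's implementation and not a confirmed fix for this issue; the helper name `build_delete_message` and the example id are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_delete_message(doc_ids):
    """Return a Solr XML delete-by-id message as a string.

    Hypothetical helper for illustration only; not part of pysolr
    or opensemanticetl.
    """
    root = ET.Element("delete")
    for doc_id in doc_ids:
        # The id is set as element text, so ElementTree escapes
        # characters such as '&' and '<' when serializing.
        ET.SubElement(root, "id").text = doc_id
    return ET.tostring(root, encoding="unicode")


# Made-up document id shaped like a file URI with special characters.
message = build_delete_message(["file:///tmp/example.warc/a&b.html"])
print(message)
```

Because the id is carried as escaped element text, the colons in a URI do not affect how the message parses.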