opensemanticsearch / open-semantic-etl

Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
https://opensemanticsearch.org/etl
GNU General Public License v3.0

no data export to Neo4J #76

Status: Closed (ronaldoviber closed 5 years ago)

ronaldoviber commented 5 years ago

On a fresh Debian Stretch install, no entities are exported to Neo4j.

Version: open-semantic-search_18.12.23

py2neo and export_neo4j are already installed, and the plugin is activated in the config.

There are no logs. Is something missing?

ptmaroct commented 5 years ago

I had the same issue. I think it is caused by broken code in the export_neo4j.py plugin, which is responsible for sending data to Neo4j. I found a minor fix: manually editing the Python file so that it commits the Neo4j transaction. With that change I was partially successful in populating the Neo4j database. See issue #70.
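For readers unfamiliar with the commit pattern mentioned above, here is a minimal sketch of it. It is not the code of export_neo4j.py. The function name `export_entities`, the labels `Document` and `Entity`, and the relationship type `MENTIONS` are assumptions made up for illustration. The `graph` argument is assumed to behave like a py2neo `Graph`, i.e. `begin()` returns a transaction offering `run()` and `commit()`.

```python
def export_entities(graph, doc_id, entities):
    """Write a document and its entity labels in one explicit transaction.

    `graph` is assumed to be a py2neo Graph-like object. The node labels
    and the relationship type below are hypothetical, not the plugin's.
    """
    tx = graph.begin()
    # Create the document node if it does not exist yet.
    tx.run("MERGE (d:Document {id: $doc_id})", doc_id=doc_id)
    for name in entities:
        # Link each entity to the document, creating nodes as needed.
        tx.run(
            "MATCH (d:Document {id: $doc_id}) "
            "MERGE (e:Entity {name: $name}) "
            "MERGE (d)-[:MENTIONS]->(e)",
            doc_id=doc_id,
            name=name,
        )
    # Without an explicit commit the writes above are never persisted.
    tx.commit()
    return len(entities)
```

If the final `tx.commit()` is left out, the statements run but nothing is stored, which would match the symptom of an empty database without any error in the logs.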

Mandalka commented 5 years ago

Maybe py2neo, which evolved over the last year, changed in ways that break the plugin when you use a newer release than the one I implemented it against.

In January I'll check, merge, test and upgrade to the newest Neo4j and py2neo releases while migrating the plugin to the new named entity API structures. In Neo4j, too, these will then use the entity URIs/IDs from entity linking (such as SKOS IDs) instead of only labels, which makes disambiguation possible.

ptmaroct commented 5 years ago

Amazing! Looking forward to the new release, and thank you for the software! Are you free for about 10 minutes? I would like to talk to you, @Mandalka.

Mandalka commented 5 years ago

@ptmaroct I'll have some free time next week, since I still have some things to document and release this week.

ptmaroct commented 5 years ago

Meanwhile, can you tell me where the RDF files of indexed websites are stored? The documentation isn't clear about the steps to export as RDF (link to docs). This is an urgent task, so any leads will be highly appreciated. I am also willing to update the docs to help new users.

opensemanticsearch commented 5 years ago

Merged https://github.com/opensemanticsearch/open-semantic-etl/pull/77 until the upgrade to the newest Neo4j and py2neo.

Mandalka commented 5 years ago

I upgraded to the newest Neo4j, and the new py2neo 4 seems to work using the py2neo library itself instead of the underlying Neo4j Driver for Python, so I changed the pip3 install to use py2neo 4 again.

If py2neo is still pinned to version 3.x on your system, you may need to run pip3 uninstall py2neo before upgrading.
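To find out whether an old 3.x release is still installed before upgrading, a small check like the following may help. This is only a sketch using the Python standard library (Python 3.8 or newer). The helper names `installed_major_version` and `needs_uninstall_first` are hypothetical and not part of Open Semantic ETL or py2neo.

```python
import re
from importlib import metadata


def installed_major_version(package):
    """Return the major version of an installed package, or None if absent."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    # Take the leading digits of the version string, e.g. "3" from "3.1.2".
    match = re.match(r"\d+", version)
    return int(match.group()) if match else None


def needs_uninstall_first(package="py2neo", pinned_major=3):
    """True if the installed release is still the old pinned major version."""
    return installed_major_version(package) == pinned_major


if __name__ == "__main__":
    if needs_uninstall_first():
        print("py2neo 3.x found: run 'pip3 uninstall py2neo' before upgrading")
    else:
        print("no py2neo 3.x installation found")
```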

The database format may also have changed, in which case the Neo4j database has to be deleted as described in #70.

Please reopen this issue if problems remain after upgrading to Neo4j 3.5.2, py2neo 4.x and the newest Open Semantic ETL. I will reopen it myself if further tests on a fresh Debian turn up problems.

Mandalka commented 5 years ago

Since this doesn't work on a fresh Debian, I pinned py2neo to 3.x again until the upgrade that comes with the migration to ID/URI-based entities (instead of string-based ones) in Neo4j.