neo4j-contrib / neo4j-apoc-procedures

Awesome Procedures On Cypher for Neo4j - codenamed "apoc"
https://neo4j.com/labs/apoc
Apache License 2.0

APOC NLP functions for Azure cognitive services - Failed - returned HTTP response code: 400 #1808

Closed albert-kevin closed 3 years ago

albert-kevin commented 3 years ago

Expected Behavior (Mandatory)

HTTP response: 200

Actual Behavior (Mandatory)

HTTP response: 400

How to Reproduce the Problem

Simple Dataset (where it's possible)

Documentation: https://neo4j.com/labs/apoc/4.2/nlp/azure/#nlp-azure-examples-entities
Currently using: Neo4j 4.1.3-enterprise, APOC 4.1.0.6 and NLP 4.1.0.2.
Also tried with: Neo4j 4.2.3-enterprise, APOC 4.2.0.1 and NLP 4.2.0.1, with exactly the same failure. I saw it work properly 4 months ago, using the code below.

Here is some code to add test data (text for NLP entity extraction):

MERGE (:Article {
  uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/",
  body: "These days I'm rarely more than a few feet away from my Nintendo Switch and I play board games, card games and role playing games with friends at least once or twice a week. I've even organised lunch-time Mario Kart 8 tournaments between the Neo4j European offices!"
})
MERGE (:Article {
  uri: "https://en.wikipedia.org/wiki/Nintendo_Switch",
  body: "The Nintendo Switch is a video game console developed by Nintendo, released worldwide in most regions on March 3, 2017. It is a hybrid console that can be used as a home console and portable device. The Nintendo Switch was unveiled on October 20, 2016. Nintendo offers a Joy-Con Wheel, a small steering wheel-like unit that a Joy-Con can slot into, allowing it to be used for racing games such as Mario Kart 8."
})

Here is the code to extract the entities (this is the call that fails):

MATCH (a:Article)
WITH collect(a) AS articles
CALL apoc.nlp.azure.entities.graph(articles, {
  key: "58abe70ff5f74497bf596cb5055f9683",
  url: "https://kevintest0702202001.cognitiveservices.azure.com",
  nodeProperty: "body",
  writeRelationshipType: "ENTITY",
  write: true
})
YIELD graph AS g
RETURN g

(I replaced it with the proper key and url; feel free to try it yourself if you know how.)

I also posted a link to this thread on Slack, in case you want to test more or discuss: https://neo4j-users.slack.com/archives/C5C4JRFK7/p1613053375003700

test machine

(Only valid for a week or so.) Here is the server to test on with these versions: http://138.91.61.102:7474/browser/ username = neo4j, password = digityser

On the forum I noticed this community thread: https://community.neo4j.com/t/apoc-nlp-functions-for-azure-cognitive-services-are-not-working/31619

I have seen it work before and don't know why it suddenly stopped. Also note that the NLP code probably uses an old Azure API version, v2.1, while Azure is currently on v3.0; this could be updated. Personally, I have a workaround: I wrote Python code myself and store the results in the database manually. The APOC procedures are handy, though (when they work).
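For readers who want to reproduce the Python workaround mentioned above, here is a minimal sketch of how such a request could be built with the standard library only. It is not the author's actual script. The `documents` payload shape and the `Ocp-Apim-Subscription-Key` header are taken from the request dumps further down in this issue; the function name `build_request`, its parameters, and the example host are hypothetical.

```python
import json
import urllib.request


def build_request(endpoint_url, api_key, texts, language="en"):
    """Build (but do not send) a Text Analytics POST request.

    endpoint_url is the full URL of the operation, for example
    <resource>/text/analytics/v2.1/entities (assumption: same path
    layout as in the request dumps in this issue).
    """
    documents = [
        {"language": language, "id": str(index), "text": text}
        for index, text in enumerate(texts, start=1)
    ]
    body = json.dumps({"documents": documents}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": api_key,
        },
    )


# Hypothetical host and key, for illustration only.
request = build_request(
    "https://example.cognitiveservices.azure.com/text/analytics/v2.1/entities",
    "YOUR-KEY",
    ["Hello world. This is some input text that I love."],
)
# Sending is left to the caller, e.g.:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping the build step separate from the network call makes it easy to inspect the exact URL and payload before anything is sent, which is useful when chasing an HTTP 400.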

Specifications (Mandatory)

Currently used versions

Versions

Cores : 2 (2GHz)
Memory: 7.78 GB (24.6%)
Swap : 8 GB
Disk : 145 GB (45.4% ext4)
System: 18.04.1-Ubuntu

conda : 4.9.2
pip : 21.0.1
python: 3.8.6
py2neo: 4.2.0

dbms.default_listen_address=0.0.0.0

neo4j.bloom.license_file=/plugins/bloom-plugin.license
neo4j.bloom.authorization_role=admin,architect
dbms.unmanaged_extension_classes=com.neo4j.bloom.server=/browser/bloom
dbms.tx_log.rotation.retention_policy=100M size
dbms.security.procedures.whitelist=apoc.,gds.
dbms.security.procedures.unrestricted=apoc.,gds.,bloom.*
dbms.memory.pagecache.size=2G
dbms.memory.heap.max_size=2G
dbms.directories.plugins=/plugins
dbms.directories.logs=/logs
dbms.directories.import=/import
causal_clustering.transaction_advertised_address=119fdd59aacc:6000
causal_clustering.raft_advertised_address=119fdd59aacc:7000
causal_clustering.discovery_advertised_address=119fdd59aacc:5000
apoc.import.file.enabled=true

total 58312
-rw-r--r-- 1 7474 7474 10848418 Aug 12 2020 apoc-nlp-dependencies-4.1.0.2.jar
-rw-r--r-- 1 root root 18540089 Jan 26 14:35 apoc.jar
-rw-r--r-- 1 7474 7474 11132061 Sep 30 17:49 bloom-plugin-4.x-1.4.0.jar
-rw-r--r-- 1 7474 7474 84 Feb 11 14:26 bloom-plugin.license
-rw-r--r-- 1 root root 9985742 Nov 5 16:55 graph-data-science.jar
-rw-r--r-- 1 7474 7474 9192401 Sep 30 17:49 neo4j-bloom-1.4.0-assets.zip

Entities = does not work
Key Phrases = does not work
Sentiment = works!

MATCH (a:Article {uri: "https://neo4j.com/blog/pokegraph-gotta-graph-em-all/"})
CALL apoc.nlp.azure.sentiment.stream(a, {
  key: "58abe70ff5f74497bf596cb5055f9683",
  url: "https://kevintest0702202001.cognitiveservices.azure.com",
  nodeProperty: "body"
})
YIELD value
RETURN value;

{ "score": 0.5, "id": "0" }

some extra information about the API on Azure for sentiment

HTTP request URL: https://westeurope.api.cognitive.microsoft.com/text/analytics/v2.1/sentiment

POST https://westeurope.api.cognitive.microsoft.com/text/analytics/v2.1/sentiment HTTP/1.1
Host: westeurope.api.cognitive.microsoft.com
Content-Type: application/json
Ocp-Apim-Subscription-Key: 58abe70ff5f74497bf596cb5055f9683

{
  "documents": [
    {
      "language": "en",
      "id": "1",
      "text": "Hello world. This is some input text that I love."
    },
    {
      "language": "fr",
      "id": "2",
      "text": "Bonjour tout le monde"
    },
    {
      "language": "es",
      "id": "3",
      "text": "La carretera estaba atascada. Había mucho tráfico el día de ayer."
    }
  ]
}
Transfer-Encoding: chunked
csp-billing-usage: CognitiveServices.TextAnalytics.BatchScoring=3
x-envoy-upstream-service-time: 13
apim-request-id: ab705030-6bf8-4c94-b8c9-adbc01ce41ae
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
Date: Thu, 11 Feb 2021 15:30:41 GMT
Content-Type: application/json; charset=utf-8

{
  "documents": [{
    "id": "1",
    "score": 0.99634093046188354
  }, {
    "id": "2",
    "score": 0.84012651443481445
  }, {
    "id": "3",
    "score": 0.334433376789093
  }],
  "errors": []
}

some extra information about the API on Azure for entities:

HTTP request URL: https://westeurope.api.cognitive.microsoft.com/text/analytics/v2.1/entities

POST https://westeurope.api.cognitive.microsoft.com/text/analytics/v2.1/entities HTTP/1.1
Host: westeurope.api.cognitive.microsoft.com
Content-Type: application/json
Ocp-Apim-Subscription-Key: 58abe70ff5f74497bf596cb5055f9683

{
  "documents": [
    {
      "language": "en",
      "id": "1",
      "text": "Hello world. This is some input text that I love."
    },
    {
      "language": "fr",
      "id": "2",
      "text": "Bonjour tout le monde"
    },
    {
      "language": "es",
      "id": "3",
      "text": "La carretera estaba atascada. Había mucho tráfico el día de ayer."
    }
  ]
}
Transfer-Encoding: chunked
csp-billing-usage: CognitiveServices.TextAnalytics.BatchScoring=3
x-envoy-upstream-service-time: 17
apim-request-id: fad4c1d6-81e7-4313-8cc3-c3afa94fd711
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
x-content-type-options: nosniff
Date: Thu, 11 Feb 2021 15:30:27 GMT
Content-Type: application/json; charset=utf-8

{
  "documents": [{
    "id": "1",
    "entities": [{
      "name": "\"Hello, World!\" program",
      "matches": [{
        "wikipediaScore": 0.2148458670347686,
        "text": "Hello world",
        "offset": 0,
        "length": 11
      }],
      "wikipediaLanguage": "en",
      "wikipediaId": "\"Hello, World!\" program",
      "wikipediaUrl": "https://en.wikipedia.org/wiki/\"Hello,_World!\"_program",
      "bingId": "7b4d3717-77ab-39ff-d8ec-1c7fd8723bd2",
      "type": "Other"
    }]
  }, {
    "id": "2",
    "entities": []
  }, {
    "id": "3",
    "entities": [{
      "name": "el día",
      "matches": [{
        "entityTypeScore": 0.8,
        "text": "el día",
        "offset": 50,
        "length": 6
      }],
      "type": "DateTime",
      "subType": "Date"
    }, {
      "name": "ayer",
      "matches": [{
        "entityTypeScore": 0.8,
        "text": "ayer",
        "offset": 60,
        "length": 4
      }],
      "type": "DateTime",
      "subType": "Date"
    }]
  }],
  "errors": []
}
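
The v2.1 entities response shown above nests entities under each document. As an illustration only, here is a small sketch of how such a response could be flattened into rows before storing them manually in the database. The field names (`documents`, `entities`, `name`, `type`, `subType`) come from the response above; the function name `flatten_entities` and the trimmed `sample` are hypothetical.

```python
def flatten_entities(response):
    """Return one (document id, entity name, type, subType) tuple per entity."""
    rows = []
    for document in response.get("documents", []):
        for entity in document.get("entities", []):
            rows.append(
                (
                    document["id"],
                    entity["name"],
                    entity.get("type"),
                    entity.get("subType"),  # absent for some entity types
                )
            )
    return rows


# Trimmed copy of the response above (matches and Wikipedia fields omitted).
sample = {
    "documents": [
        {"id": "1", "entities": [{"name": '"Hello, World!" program', "type": "Other"}]},
        {"id": "2", "entities": []},
        {
            "id": "3",
            "entities": [
                {"name": "el día", "type": "DateTime", "subType": "Date"},
                {"name": "ayer", "type": "DateTime", "subType": "Date"},
            ],
        },
    ],
    "errors": [],
}

rows = flatten_entities(sample)
```

Documents without entities (id "2" here) simply contribute no rows.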
albert-kevin commented 3 years ago

Ping me anytime; I can start up the VM whenever you want to test and see for yourself.

conker84 commented 3 years ago

@albert-kevin can you please try with the following jar? apoc-4.1.0.6-all.jar.zip

albert-kevin commented 3 years ago

I started the VM (138.91.61.102)

Checked the neo4j/plugins folder:
-rw-r--r-- 1 7474 7474 18540089 Jan 26 14:35 apoc.jar
sudo mv apoc.jar apoc_olderBkcp.jar

sudo wget https://github.com/neo4j-contrib/neo4j-apoc-procedures/files/6042673/apoc-4.1.0.6-all.jar.zip
sudo unzip apoc-4.1.0.6-all.jar.zip
sudo chown 7474:7474 apoc-4.1.0.6-all.jar
sudo rm apoc-4.1.0.6-all.jar.zip

File detail:
-rw-r--r-- 1 7474 7474 18542753 Feb 25 12:38 apoc-4.1.0.6-all.jar

cd ../..
sudo docker-compose down
sudo docker-compose up &

Failed to start!

sudo mv apoc-4.1.0.6-all.jar apoc.jar
sudo rm -Rf neo4j/data/databases/neo4j
sudo rm -Rf neo4j/data/transactions/neo4j
sudo docker-compose up --build &

It is unable to start and keeps restarting in a loop...

This is not easy to change, because I use a docker-compose container, which automatically downloads the APOC version that matches the Neo4j version.

I removed neo4j/conf/neo4j.conf.

Then it started up successfully!

cd neo4j/plugins
sudo wget https://github.com/neo4j-contrib/neo4j-apoc-procedures/files/6042673/apoc-4.1.0.6-all.jar.zip
sudo unzip apoc-4.1.0.6-all.jar.zip
sudo rm apoc-4.1.0.6-all.jar.zip
sudo mv apoc-4.1.0.6-all.jar apoc.jar

TEST! Failed; you may try it yourself.

albert-kevin commented 3 years ago

Be aware that I am running this on Ubuntu with Docker, so the Neo4j image automatically downloads the matching APOC version by itself:

version: '3'
services:
  neo4j_db:
    restart: always # startup on reboot VM
    image: neo4j:4.1.3-enterprise # Enterprise Edition
    volumes:

albert-kevin commented 3 years ago

ubuntu@myVM02:~/notebooks/azuremachinelearning$ sudo docker-compose up --build &
[1] 10386
ubuntu@myVM02:~/notebooks/azuremachinelearning$ Building with native build. Learn about native build in Compose here: https://docs.docker.com/go/compose-native-build/
Creating network "azuremachinelearning_default" with the default driver
Pulling neo4j_db (neo4j:4.1.3-enterprise)...
4.1.3-enterprise: Pulling from library/neo4j
45b42c59be33: Pull complete
a91c0c19c848: Pull complete
dbe61a45ef18: Pull complete
532e5467f958: Pull complete
e1a0ee7ccff9: Pull complete
cc1b213a4271: Pull complete
c6862d90bd9f: Pull complete
f1997d1f157d: Pull complete
Digest: sha256:2387721fb6e9933866dcc318647161a6bd1cf7f92734f996daa82bd0b0baf8f8
Status: Downloaded newer image for neo4j:4.1.3-enterprise
Creating azuremachinelearning_neo4j_db_1 ... done
Attaching to azuremachinelearning_neo4j_db_1
neo4j_db_1 | Warning: Some files inside "/data" are not writable from inside container. Changing folder owner to neo4j.
neo4j_db_1 | grep: /var/lib/neo4j/conf/neo4j.conf: No such file or directory
neo4j_db_1 | Changed password for user 'neo4j'.
neo4j_db_1 | Fetching versions.json for Plugin 'apoc' from https://neo4j-contrib.github.io/neo4j-apoc-procedures/versions.json
neo4j_db_1 | Installing Plugin 'apoc' from https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/4.1.0.6/apoc-4.1.0.6-all.jar to /plugins/apoc.jar
neo4j_db_1 | Applying default values for plugin apoc to neo4j.conf
neo4j_db_1 | Skipping dbms.security.procedures.unrestricted for plugin apoc because it is already set
neo4j_db_1 | Fetching versions.json for Plugin 'graph-data-science' from https://s3-eu-west-1.amazonaws.com/com.neo4j.graphalgorithms.dist/graph-data-science/versions.json
neo4j_db_1 | Installing Plugin 'graph-data-science' from https://s3-eu-west-1.amazonaws.com/com.neo4j.graphalgorithms.dist/graph-data-science/neo4j-graph-data-science-1.4.0-standalone.jar to /plugins/graph-data-science.jar
neo4j_db_1 | Applying default values for plugin graph-data-science to neo4j.conf
neo4j_db_1 | Skipping dbms.security.procedures.unrestricted for plugin graph-data-science because it is already set
neo4j_db_1 | Directories in use:
neo4j_db_1 | home: /var/lib/neo4j
neo4j_db_1 | config: /var/lib/neo4j/conf
neo4j_db_1 | logs: /logs
neo4j_db_1 | plugins: /plugins
neo4j_db_1 | import: /import
neo4j_db_1 | data: /var/lib/neo4j/data
neo4j_db_1 | certificates: /var/lib/neo4j/certificates
neo4j_db_1 | run: /var/lib/neo4j/run
neo4j_db_1 | Starting Neo4j.
neo4j_db_1 | 2021-02-25 13:54:32.629+0000 INFO Starting...
neo4j_db_1 | 2021-02-25 13:54:36.746+0000 INFO ======== Neo4j 4.1.3 ========
neo4j_db_1 | 2021-02-25 13:54:42.591+0000 INFO Called db.clearQueryCaches(): Query cache already empty.
neo4j_db_1 | 2021-02-25 13:54:58.536+0000 INFO Sending metrics to CSV file at /var/lib/neo4j/metrics
neo4j_db_1 | 2021-02-25 13:54:58.580+0000 INFO Bolt enabled on 0.0.0.0:7687.
neo4j_db_1 | 2021-02-25 13:54:58.650+0000 INFO Mounted unmanaged extension [com.neo4j.bloom.server] at [/browser/bloom]
neo4j_db_1 | 2021-02-25 13:55:00.266+0000 WARN The following warnings have been detected: WARNING: The (sub)resource method file in com.neo4j.bloom.server.BloomResource contains empty path annotation.
neo4j_db_1 |
neo4j_db_1 | 2021-02-25 13:55:00.510+0000 INFO Remote interface available at http://localhost:7474/
neo4j_db_1 | 2021-02-25 13:55:00.512+0000 INFO Started.

conker84 commented 3 years ago

Try with this:

version: '3'
services:
  neo4j_db:
    restart: always # startup on reboot VM
    image: neo4j:4.1.3-enterprise # Enterprise Edition
    volumes:
      - /home/ubuntu/notebooks/azuremachinelearning/neo4j/plugins:/plugins
      - /home/ubuntu/notebooks/azuremachinelearning/neo4j/logs:/logs
      - /home/ubuntu/notebooks/azuremachinelearning/neo4j/data:/var/lib/neo4j/data
      - /home/ubuntu/notebooks/azuremachinelearning/neo4j/conf:/var/lib/neo4j/conf
      - /home/ubuntu/notebooks/azuremachinelearning/neo4j/import:/import
    ports:
      - 7474:7474 # http
      - 7473:7473 # https
      - 7687:7687 # Bolt
    environment:
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes
      - NEO4J_AUTH=neo4j/digityser
      - NEO4J_dbms_memory_pagecache_size=2G # default 512MB
      - NEO4J_dbms_memory_heap_max__size=2G # default 512MB
      - NEO4J_dbms_security_procedures_unrestricted=apoc.,gds.,bloom.*
      - NEO4J_dbms_security_procedures_whitelist=apoc.,gds.
      - NEO4J_dbms_unmanaged__extension__classes=com.neo4j.bloom.server=/browser/bloom
      - NEO4J_neo4j_bloom_license__file=/plugins/bloom-plugin.license
      - NEO4J_neo4j_bloom_authorization__role=admin,architect
      - NEO4J_apoc_import_file_enabled=true

And be sure that in /home/ubuntu/notebooks/azuremachinelearning/neo4j/plugins you have the jar file that I shared with you (unzipped)

then from the compose file directory: docker-compose up

You are downloading the official release of APOC, which does not contain the fix.

Just to let you know I tested the fix before sharing it with you :)
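
A side note on the environment entries in the compose file above: they follow the naming convention of the Neo4j 4.x Docker image, where the setting name is prefixed with `NEO4J_`, each `.` becomes `_`, and each `_` becomes `__`. The following sketch only illustrates that mapping so the entries can be checked against the neo4j.conf listing earlier in the issue; the helper name `to_docker_env` is hypothetical and not part of any Neo4j tooling.

```python
def to_docker_env(setting):
    """Convert a neo4j.conf setting name to the Docker environment variable name.

    Underscores are doubled first, so that the single underscores that
    replace the dots afterwards stay unambiguous.
    """
    return "NEO4J_" + setting.replace("_", "__").replace(".", "_")


for name in (
    "dbms.memory.heap.max_size",
    "dbms.unmanaged_extension_classes",
    "apoc.import.file.enabled",
):
    print(name, "->", to_docker_env(name))
```

A name that misses a doubled underscore would map back to a different setting, so this is worth checking when a configuration value appears to be ignored.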

albert-kevin commented 3 years ago

I will try to replace the NLP version:
-rw-r--r-- 1 7474 7474 10848418 Aug 12 2020 apoc-nlp-dependencies-4.1.0.2.jar

cd neo4j/plugins
sudo wget https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/4.1.0.6/apoc-nlp-dependencies-4.1.0.6.jar
sudo chown 7474:7474 apoc-nlp-dependencies-4.1.0.6.jar
sudo rm apoc-nlp-dependencies-4.1.0.2.jar

-rw-r--r-- 1 7474 7474 10848418 Jan 26 14:35 apoc-nlp-dependencies-4.1.0.6.jar

cd ../..
sudo docker-compose down
sudo docker-compose up &

TEST: FAIL

I will now try with NLP 4.2.0.1 and the latest Neo4j 4.2.3-enterprise.

It also fails when using the latest versions: https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/4.2.0.1/apoc-nlp-dependencies-4.2.0.1.jar

albert-kevin commented 3 years ago

testing your last message now...

conker84 commented 3 years ago

Please be sure to also have apoc-nlp-dependencies-4.1.0.2.jar inside the /home/ubuntu/notebooks/azuremachinelearning/neo4j/plugins

albert-kevin commented 3 years ago

-rwxrwxrwx 1 7474 7474 18542753 Feb 25 15:52 apoc-4.1.0.6-all.jar
-rwxrwxrwx 1 7474 7474 10848418 Feb 25 16:00 apoc-nlp-dependencies-4.1.0.2.jar
-rwxrwxrwx 1 7474 7474 11132061 Sep 30 17:49 bloom-plugin-4.x-1.4.0.jar
-rwxrwxrwx 1 7474 7474 84 Feb 11 14:26 bloom-plugin.license
-rwxrwxrwx 1 7474 7474 14522303 Feb 9 23:34 graph-data-science.jar
-rwxrwxrwx 1 7474 7474 9192401 Sep 30 17:49 neo4j-bloom-1.4.0-assets.zip

Changed docker-compose:

version: '3'
services:
  neo4j_db:
    restart: always # startup on reboot VM
    image: neo4j:4.1.3-enterprise # Enterprise Edition
    volumes:

It failed to start up; this has to do with log files that need to be cleared. Trying again:

sudo rm neo4j/logs/*.log
sudo rm -Rf neo4j/data/databases/neo4j
sudo rm -Rf neo4j/data/transactions/neo4j

sudo docker-compose down --rmi all
sudo docker-compose up &

Neo4j fails to start:

ubuntu@myVM02:~/notebooks/azuremachinelearning/neo4j$ sudo docker-compose up &
[1] 7637
ubuntu@myVM02:~/notebooks/azuremachinelearning/neo4j$ Building with native build. Learn about native build in Compose here: https://docs.docker.com/go/compose-native-build/
Creating network "azuremachinelearning_default" with the default driver
Pulling neo4j_db (neo4j:4.1.3-enterprise)...
4.1.3-enterprise: Pulling from library/neo4j
45b42c59be33: Pull complete
a91c0c19c848: Pull complete
dbe61a45ef18: Pull complete
532e5467f958: Pull complete
e1a0ee7ccff9: Pull complete
cc1b213a4271: Pull complete
c6862d90bd9f: Pull complete
f1997d1f157d: Pull complete
Digest: sha256:2387721fb6e9933866dcc318647161a6bd1cf7f92734f996daa82bd0b0baf8f8
Status: Downloaded newer image for neo4j:4.1.3-enterprise
Creating azuremachinelearning_neo4j_db_1 ... done
Attaching to azuremachinelearning_neo4j_db_1
neo4j_db_1 | Warning: Some files inside "/data" are not writable from inside container. Changing folder owner to neo4j.
neo4j_db_1 | Changed password for user 'neo4j'.
neo4j_db_1 | Directories in use:
neo4j_db_1 | home: /var/lib/neo4j
neo4j_db_1 | config: /var/lib/neo4j/conf
neo4j_db_1 | logs: /logs
neo4j_db_1 | plugins: /plugins
neo4j_db_1 | import: /import
neo4j_db_1 | data: /var/lib/neo4j/data
neo4j_db_1 | certificates: /var/lib/neo4j/certificates
neo4j_db_1 | run: /var/lib/neo4j/run
neo4j_db_1 | Starting Neo4j.
neo4j_db_1 | 2021-02-25 15:11:17.042+0000 INFO Starting...
neo4j_db_1 | 2021-02-25 15:11:24.829+0000 INFO ======== Neo4j 4.1.3 ========
neo4j_db_1 | 2021-02-25 15:11:32.628+0000 ERROR Failed to start Neo4j on dbms.connector.http.listen_address, a socket address. If missing port or hostname it is acquired from dbms.default_listen_address. Error starting Neo4j database server at /data/databases
neo4j_db_1 | java.lang.RuntimeException: Error starting Neo4j database server at /data/databases
neo4j_db_1 |     at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.startDatabaseServer(DatabaseManagementServiceFactory.java:198)
neo4j_db_1 |     at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.build(DatabaseManagementServiceFactory.java:158)
neo4j_db_1 |     at com.neo4j.server.enterprise.EnterpriseManagementServiceFactory.createManagementService(EnterpriseManagementServiceFactory.java:38)
neo4j_db_1 |     at com.neo4j.server.enterprise.EnterpriseBootstrapper.createNeo(EnterpriseBootstrapper.java:20)
neo4j_db_1 |     at org.neo4j.server.NeoBootstrapper.start(NeoBootstrapper.java:117)
neo4j_db_1 |     at org.neo4j.server.NeoBootstrapper.start(NeoBootstrapper.java:87)
neo4j_db_1 |     at com.neo4j.server.enterprise.EnterpriseEntryPoint.main(EnterpriseEntryPoint.java:25)
neo4j_db_1 | Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'com.neo4j.dbms.StandaloneDbmsReconcilerModule@497ed877' was successfully initialized, but failed to start. Please see the attached cause exception "Transaction logs contains entries with prefix 2, and the highest supported prefix is 1. This indicates that the log files originates from a newer version of neo4j.".
neo4j_db_1 |     at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:463)
neo4j_db_1 |     at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:110)
neo4j_db_1 |     at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.startDatabaseServer(DatabaseManagementServiceFactory.java:189)
neo4j_db_1 |     ... 6 more
neo4j_db_1 | Caused by: org.neo4j.dbms.api.DatabaseManagementException: A triggered DbmsReconciler job failed with the following cause
neo4j_db_1 |     at com.neo4j.dbms.ReconcilerResult.join(ReconcilerResult.java:57)
neo4j_db_1 |     at com.neo4j.dbms.StandaloneDbmsReconcilerModule.startInitialDatabases(StandaloneDbmsReconcilerModule.java:95)
neo4j_db_1 |     at com.neo4j.dbms.StandaloneDbmsReconcilerModule.start(StandaloneDbmsReconcilerModule.java:85)
neo4j_db_1 |     at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:442)
neo4j_db_1 |     ... 8 more
neo4j_db_1 | Caused by: org.neo4j.dbms.api.DatabaseManagementException: An error occurred! Unable to start database with name system.
neo4j_db_1 |     at org.neo4j.dbms.database.AbstractDatabaseManager.startDatabase(AbstractDatabaseManager.java:191)
neo4j_db_1 |     at com.neo4j.dbms.database.MultiDatabaseManager.forSingleDatabase(MultiDatabaseManager.java:134)
neo4j_db_1 |     at com.neo4j.dbms.database.MultiDatabaseManager.startDatabase(MultiDatabaseManager.java:119)
neo4j_db_1 |     at com.neo4j.dbms.Transition$Prepared.doTransitionAction(Transition.java:101)
neo4j_db_1 |     at com.neo4j.dbms.Transition$Prepared.doTransition(Transition.java:88)
neo4j_db_1 |     at com.neo4j.dbms.DbmsReconciler.doTransitionStep(DbmsReconciler.java:347)
neo4j_db_1 |     at com.neo4j.dbms.DbmsReconciler.doTransitionStep(DbmsReconciler.java:348)
neo4j_db_1 |     at com.neo4j.dbms.DbmsReconciler.doTransitionStep(DbmsReconciler.java:348)
neo4j_db_1 |     at com.neo4j.dbms.DbmsReconciler.lambda$doTransitions$11(DbmsReconciler.java:316)
neo4j_db_1 |     at com.neo4j.dbms.DbmsReconciler.namedJob(DbmsReconciler.java:327)
neo4j_db_1 |     at com.neo4j.dbms.DbmsReconciler.doTransitions(DbmsReconciler.java:317)
neo4j_db_1 |     at com.neo4j.dbms.DbmsReconciler.lambda$doTransitions$9(DbmsReconciler.java:308)
neo4j_db_1 |     at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
neo4j_db_1 |     at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
neo4j_db_1 |     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
neo4j_db_1 |     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
neo4j_db_1 |     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
neo4j_db_1 |     at java.base/java.lang.Thread.run(Thread.java:834)
neo4j_db_1 | Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Error reading transaction logs, recovery not possible. To force the database to start anyway, you can specify 'unsupported.dbms.tx_log.fail_on_corrupted_log_files=false'. This will try to recover as much as possible and then truncate the corrupt part of the transaction log. Doing this means your database integrity might be compromised, please consider restoring from a consistent backup instead.
neo4j_db_1 |     at org.neo4j.kernel.database.Database.start(Database.java:496)
neo4j_db_1 |     at org.neo4j.dbms.database.AbstractDatabaseManager.startDatabase(AbstractDatabaseManager.java:187)
neo4j_db_1 |     ... 17 more
neo4j_db_1 | Caused by: java.lang.RuntimeException: Error reading transaction logs, recovery not possible. To force the database to start anyway, you can specify 'unsupported.dbms.tx_log.fail_on_corrupted_log_files=false'. This will try to recover as much as possible and then truncate the corrupt part of the transaction log. Doing this means your database integrity might be compromised, please consider restoring from a consistent backup instead.
neo4j_db_1 |     at org.neo4j.kernel.recovery.Recovery.throwUnableToCleanRecover(Recovery.java:480)
neo4j_db_1 |     at org.neo4j.kernel.recovery.LogTailScanner.findLogTail(LogTailScanner.java:168)
neo4j_db_1 |     at org.neo4j.kernel.recovery.LogTailScanner.getTailInformation(LogTailScanner.java:342)
neo4j_db_1 |     at org.neo4j.kernel.recovery.Recovery.validateStoreId(Recovery.java:401)
neo4j_db_1 |     at org.neo4j.kernel.database.Database.checkStoreId(Database.java:534)
neo4j_db_1 |     at org.neo4j.kernel.database.Database.validateStoreAndTxLogs(Database.java:513)
neo4j_db_1 |     at org.neo4j.kernel.database.Database.start(Database.java:386)
neo4j_db_1 |     ... 18 more
neo4j_db_1 | Caused by: org.neo4j.kernel.impl.transaction.log.entry.UnsupportedLogVersionException: Transaction logs contains entries with prefix 2, and the highest supported prefix is 1. This indicates that the log files originates from a newer version of neo4j.
neo4j_db_1 |     at org.neo4j.kernel.impl.transaction.log.entry.LogEntryVersion.select(LogEntryVersion.java:93)
neo4j_db_1 |     at org.neo4j.kernel.impl.transaction.log.entry.VersionAwareLogEntryReader.readLogEntry(VersionAwareLogEntryReader.java:89)
neo4j_db_1 |     at org.neo4j.kernel.impl.transaction.log.LogEntryCursor.next(LogEntryCursor.java:53)
neo4j_db_1 |     at org.neo4j.kernel.recovery.LogTailScanner.findLogTail(LogTailScanner.java:123)
neo4j_db_1 |     ... 23 more
neo4j_db_1 | 2021-02-25 15:11:32.654+0000 INFO Neo4j Server shutdown initiated by request
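
In a chained Java stack trace like the one above, the last "Caused by:" entry is usually the most specific one; here it is the UnsupportedLogVersionException about transaction logs written by a newer Neo4j version. As a hedged illustration, a few lines of Python can pull that entry out of a saved log. The function name `root_cause` is hypothetical, and the sketch assumes the log has one entry per line, as `docker-compose` prints it.

```python
def root_cause(log_text):
    """Return the text of the last 'Caused by:' entry, or None if there is none."""
    marker = "Caused by: "
    cause = None
    for line in log_text.splitlines():
        _, found, rest = line.partition(marker)
        if found:
            cause = rest.strip()  # later entries overwrite earlier ones
    return cause


# Shortened excerpt of the log above, for illustration.
excerpt = "\n".join(
    [
        "neo4j_db_1 | java.lang.RuntimeException: Error starting Neo4j database server at /data/databases",
        "neo4j_db_1 | Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component failed to start.",
        "neo4j_db_1 |     ... 6 more",
        "neo4j_db_1 | Caused by: org.neo4j.kernel.impl.transaction.log.entry.UnsupportedLogVersionException: "
        "Transaction logs contains entries with prefix 2, and the highest supported prefix is 1.",
    ]
)
cause = root_cause(excerpt)
```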

albert-kevin commented 3 years ago

OK, I cleared the whole data folder; now Neo4j started up. Testing now.

albert-kevin commented 3 years ago

Nothing changed; we still have these files with exactly the same sizes/versions:
-rwxrwxrwx 1 7474 7474 18542753 Feb 25 15:52 apoc-4.1.0.6-all.jar
-rwxrwxrwx 1 7474 7474 10848418 Feb 25 16:00 apoc-nlp-dependencies-4.1.0.2.jar
-rwxrwxrwx 1 7474 7474 11132061 Sep 30 17:49 bloom-plugin-4.x-1.4.0.jar
-rwxrwxrwx 1 7474 7474 84 Feb 11 14:26 bloom-plugin.license
-rwxrwxrwx 1 7474 7474 14522303 Feb 9 23:34 graph-data-science.jar
-rwxrwxrwx 1 7474 7474 9192401 Sep 30 17:49 neo4j-bloom-1.4.0-assets.zip

ubuntu@myVM02:~/notebooks/azuremachinelearning$ sudo docker-compose up &
[1] 18226
ubuntu@myVM02:~/notebooks/azuremachinelearning$ Building with native build. Learn about native build in Compose here: https://docs.docker.com/go/compose-native-build/
Creating network "azuremachinelearning_default" with the default driver
Creating azuremachinelearning_neo4j_db_1 ... done
Attaching to azuremachinelearning_neo4j_db_1
neo4j_db_1 | Warning: Folder mounted to "/data" is not writable from inside container. Changing folder owner to neo4j.
neo4j_db_1 | Changed password for user 'neo4j'.
neo4j_db_1 | Directories in use:
neo4j_db_1 | home: /var/lib/neo4j
neo4j_db_1 | config: /var/lib/neo4j/conf
neo4j_db_1 | logs: /logs
neo4j_db_1 | plugins: /plugins
neo4j_db_1 | import: /import
neo4j_db_1 | data: /var/lib/neo4j/data
neo4j_db_1 | certificates: /var/lib/neo4j/certificates
neo4j_db_1 | run: /var/lib/neo4j/run
neo4j_db_1 | Starting Neo4j.
neo4j_db_1 | 2021-02-25 15:18:18.437+0000 INFO Starting...
neo4j_db_1 | 2021-02-25 15:18:23.004+0000 INFO ======== Neo4j 4.1.3 ========
neo4j_db_1 | 2021-02-25 15:18:26.043+0000 INFO [system] disabled
neo4j_db_1 | 2021-02-25 15:18:43.931+0000 INFO [neo4j] disabled
neo4j_db_1 | 2021-02-25 15:18:47.569+0000 INFO Called db.clearQueryCaches(): Query cache already empty.
neo4j_db_1 | 2021-02-25 15:18:48.214+0000 INFO Sending metrics to CSV file at /var/lib/neo4j/metrics
neo4j_db_1 | 2021-02-25 15:18:48.263+0000 INFO Bolt enabled on 0.0.0.0:7687.
neo4j_db_1 | 2021-02-25 15:18:48.344+0000 INFO Mounted unmanaged extension [com.neo4j.bloom.server] at [/browser/bloom]
neo4j_db_1 | 2021-02-25 15:18:50.151+0000 WARN The following warnings have been detected: WARNING: The (sub)resource method file in com.neo4j.bloom.server.BloomResource contains empty path annotation.
neo4j_db_1 |
neo4j_db_1 | 2021-02-25 15:18:50.440+0000 INFO Remote interface available at http://localhost:7474/
neo4j_db_1 | 2021-02-25 15:18:50.442+0000 INFO Started.

Testing the Cypher code gives: There is no procedure with the name apoc.nlp.azure.entities.graph registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.

conker84 commented 3 years ago

I think it would be great if you removed all containers and tried with a new one

conker84 commented 3 years ago

and try with fresh folders that are visible from docker!

albert-kevin commented 3 years ago

OK, IT WORKS now. I had to change the docker-compose file a bit more: the '*' asterisks needed to be set:

      - NEO4J_dbms_security_procedures_unrestricted=apoc.*,gds.*,bloom.*
      - NEO4J_dbms_security_procedures_whitelist=apoc.*,gds.*

Now, with these specific versions, it does indeed work properly. The question now is whether we can use this understanding to do more tests...
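
To illustrate why the asterisks matter: the procedure settings take comma-separated name patterns, and a pattern without a wildcard only matches that exact name. The sketch below uses Python's `fnmatch` as a stand-in for Neo4j's own matcher, which is an assumption made purely for illustration; the helper name `is_allowed` is hypothetical.

```python
from fnmatch import fnmatchcase


def is_allowed(procedure, setting):
    """Check a procedure name against a comma-separated list of glob patterns."""
    patterns = [part.strip() for part in setting.split(",") if part.strip()]
    return any(fnmatchcase(procedure, pattern) for pattern in patterns)


procedure = "apoc.nlp.azure.entities.graph"
with_wildcards = is_allowed(procedure, "apoc.*,gds.*")  # setting after the fix
without_wildcards = is_allowed(procedure, "apoc.,gds.")  # setting before the fix
```

Under that assumption, `apoc.` matches nothing that is actually registered, which is consistent with the "There is no procedure with the name ..." error seen earlier in the thread.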

conker84 commented 3 years ago

what do you mean?

albert-kevin commented 3 years ago

OK, it also works with the latest NLP 4.1.0.6 on this Neo4j 4.1.3 with the same APOC 4.1.0.6. When I restart after removing the NLP jar, it fails because the procedures are missing. When I restart after installing an NLP version that should not be compatible (v4.2.0.1), it works fine! I don't yet understand what caused the earlier failures.

albert-kevin commented 3 years ago

OK, when I use APOC 4.2.0.1 with the incompatible Neo4j 4.1.3 and the latest NLP 4.2.0.1, it fails because of APOC.

conker84 commented 3 years ago

Yes it's incompatible
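
As a rule of thumb that is consistent with the combinations tried in this thread, the first two components of the APOC version are expected to match those of the Neo4j version (APOC 4.1.0.6 goes with Neo4j 4.1.x, APOC 4.2.0.1 with Neo4j 4.2.x). A minimal sketch of that check follows; the function name `is_compatible` is hypothetical, and the official compatibility matrix remains the authority.

```python
def is_compatible(neo4j_version, apoc_version):
    """Compare the major.minor prefix of a Neo4j version and an APOC version."""
    return neo4j_version.split(".")[:2] == apoc_version.split(".")[:2]


combinations = [
    ("4.1.3-enterprise", "4.1.0.6"),
    ("4.1.3-enterprise", "4.2.0.1"),
    ("4.2.3-enterprise", "4.2.0.1"),
]
results = [is_compatible(neo4j, apoc) for neo4j, apoc in combinations]
```

Note that a matching prefix only says the jar loads on that server; it says nothing about whether a given procedure works, as the 4.2.3 test in this thread shows.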

albert-kevin commented 3 years ago

OK, I now tried again with Neo4j 4.2.3, NLP 4.2.0.1 and APOC 4.2.0.1: it FAILS as well.

albert-kevin commented 3 years ago

Do you agree that, although newer versions have been released, the Azure NLP procedures in APOC have clearly not been functioning since then? How do we go further, and is there a way I can help? I could test, or even look at the code and try new things. Note that the APOC procedures should accept a parameter to choose the API version: v2.1 currently uses a less advanced AI model than the latest v3.0 (maybe even v3.1).

conker84 commented 3 years ago

Sorry, I don't understand what you mean. By the way, API v3 is currently in preview, and we plan to support it once it becomes stable.

albert-kevin commented 3 years ago

OK, so when I use Neo4j v4.2.3 (instead of 4.1.3), apoc.nlp.azure.entities.graph raises an error that did not occur in the older version. It is unclear to me why newer Neo4j, NLP and APOC versions were released with broken features. The error message in Cypher:
java.io.IOException: Server returned HTTP response code: 400 for URL: https://kevintest0702202001.cognitiveservices.azure.com/text/analytics/v2.1/entities
Notice the v2.1/entities at the end of the URL.

Note: apoc.nlp.azure.sentiment.stream does work in these latest versions, so something is broken
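
The Java exception above only reports the status code. When reproducing the call outside Neo4j, the response body of an HTTP 400 usually explains what the service rejected. Here is a minimal sketch of reading that body in Python; the helper name `describe_http_error` is hypothetical, and the error body in the example is simulated, not an actual Azure response.

```python
import io
import json
import urllib.error


def describe_http_error(error):
    """Return the status code and decoded body of a urllib HTTPError."""
    raw = error.read().decode("utf-8", errors="replace")
    try:
        detail = json.loads(raw)
    except ValueError:
        detail = raw  # keep the plain text when the body is not JSON
    return {"status": error.code, "detail": detail}


# Simulated error for illustration; a real one would be caught with
# `except urllib.error.HTTPError as error:` around urllib.request.urlopen(...).
simulated = urllib.error.HTTPError(
    "https://example.invalid/text/analytics/v2.1/entities",
    400,
    "Bad Request",
    {},
    io.BytesIO(b'{"message": "simulated body, not a real Azure response"}'),
)
info = describe_http_error(simulated)
```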

Available API versions:
Text Analytics API (v2.0)
Text Analytics API (v2.1)
Text Analytics API (v3.0)
Microsoft is working on a newer version, v3.1, which is still in preview (ignoring this for now...)

I see you are currently using v2.1 while v3.0 is available, from which I expect better results. I recommend letting the APOC procedure take the API version as a configurable parameter. I would also suggest an option to limit the number of calls, although that can already be done with Cypher LIMIT.

Now, if you look at the API, you will see four main HTTP POST functions, and even a fifth in the v3 API:

Your documentation only covers these three: https://neo4j.com/labs/apoc/4.2/nlp/azure/

Just for reference, here is an overview of the v3.0 POST functions:
https://{endpoint}/text/analytics/v3.0/languages
https://{endpoint}/text/analytics/v3.0/keyPhrases
https://{endpoint}/text/analytics/v3.0/entities/linking
https://{endpoint}/text/analytics/v3.0/entities/recognition/general
https://{endpoint}/text/analytics/v3.0/sentiment
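
To make the suggestion of a configurable API version concrete, here is a sketch of how the URL could be derived from a version and an operation name. This is not how APOC builds its URLs; the operation keys and the function name `endpoint_url` are hypothetical. The v3.0 paths are the ones listed above, and the two v2.1 paths are the ones that appear in the request dumps earlier in this issue.

```python
# Path suffix per API version and operation (keys are illustrative names).
ENDPOINT_PATHS = {
    "v2.1": {
        "sentiment": "sentiment",
        "entities": "entities",
    },
    "v3.0": {
        "languages": "languages",
        "keyPhrases": "keyPhrases",
        "entityLinking": "entities/linking",
        "entities": "entities/recognition/general",
        "sentiment": "sentiment",
    },
}


def endpoint_url(base_url, operation, api_version="v2.1"):
    """Build the full Text Analytics URL for an operation and API version."""
    try:
        path = ENDPOINT_PATHS[api_version][operation]
    except KeyError:
        raise ValueError(f"unsupported combination: {api_version}/{operation}") from None
    return f"{base_url.rstrip('/')}/text/analytics/{api_version}/{path}"


# Hypothetical resource host, for illustration only.
base = "https://example.cognitiveservices.azure.com"
old_url = endpoint_url(base, "entities")
new_url = endpoint_url(base, "entities", api_version="v3.0")
```

A lookup table like this also makes it visible that the same logical operation (entity recognition) lives at a different path in v3.0, so switching versions is more than swapping one URL segment.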

conker84 commented 3 years ago

OK, we're creating a bit of confusion in this thread, so let's keep it simple:

[Screenshot 2021-02-26 at 14 32 38]

The image is in Italian, but "anteprima" means "preview". Thank you so much for spotting the error. Once the PR that fixes the problem is merged, we can prepare a new release for the 4.1 and 4.2 versions, so you can use whichever version you want. Until then, you can use the jar that I shared, with Neo4j 4.1.

albert-kevin commented 3 years ago

"Again, you cannot use the package that I shared with you in Neo4j 4.2 because it's compiled for Neo4j 4.1. When you use Docker, the Neo4j image will download the proper apoc version when you use this env variable: NEO4JLABS_PLUGINS=["apoc","graph-data-science"]"
Yes, I agree, that is correct. Depending on the Neo4j version, it automatically selects the compatible apoc and graph-data-science jar files from the json file and downloads them.

"I tried the jar that I sent to you with all 4 endpoints and it works properly (in 4.1), do we agree on this?"
Yes, I agree that this 4.1 version works well.

"In the Azure console I can see only two APIs, V2 and V3 as preview. Can you share a link that says V3.0 is not a preview?"
Yes, here is the link. If you scroll down, you see all the 'real' available API versions; for example, search for "Text Analytics API". V3.0 is no longer in preview; what your screenshot shows under "quick start" is a mistake in the Azure dashboard.

"I'm curious why you're using two Neo4j versions?"
No no, I actually want to work with the best version of Neo4j when I study, explore and experiment for data projects. I simply assumed that installing the latest version would give a more stable, bug-free and feature-rich environment. Every time I start a little project for a few days, I simply update the Neo4j version number in my build script to the latest one I find on Docker Hub. Now I know that the latest version is not necessarily bug free, and that certain APOC features may no longer work, as we have seen.

I appreciate your time and your feedback, and I don't want to underestimate all the work that has been done and its complexity! Feel free to ask more questions, and let me know if you need me to run more tests. I hope my answers address your questions.