syedhassaanahmed / neo4j-datasets

Deploy publicly available Neo4j datasets on Azure Container Instances and optionally migrate them to Azure Cosmos DB
MIT License

Can not open Neo4J browser or connect to Neo4J remotely via BOLT protocol after recent Neo4J v4 update #1

sesmyrnov closed 3 years ago

sesmyrnov commented 3 years ago

First, I want to thank you for the work you did here summarizing different Neo4j datasets and bootstrapping Cosmos migrations. This is great work and really helpful. I did find an issue, though, after your recent update: neither the Neo4j browser nor BOLT connectivity works anymore after deploying those sample sets to a container. I noticed it when I tried to use the neo_to_cosmos library. HTTP/HTTPS browser connections time out, and the BOLT connection shows a transport error:

Unhandled Exception: Neo4j.Driver.V1.ServiceUnavailableException: Connection with the server breaks due to SecurityException: Failed to establish encrypted connection with server bolt://52.167.184.189:7687/.
 ---> Neo4j.Driver.V1.SecurityException: Failed to establish encrypted connection with server bolt://52.167.184.189:7687/.
 ---> System.IO.IOException: Authentication failed because the remote party has closed the transport stream.

syedhassaanahmed commented 3 years ago

Many thanks @sesmyrnov for opening the issue, and apologies for the late response. At first glance it seems that neo-to-cosmos is unable to reach the dataset container on its public IP (52.167.184.189 in your example). I'm trying to repro the issue and need some help clarifying a few items:

sesmyrnov commented 3 years ago

1 - If I deploy just a dataset to Container Service (without starting neo-to-cosmos), I can see that it deployed successfully, but after that I fail to connect to it. From the container log:

Changed password for user 'neo4j'.
Starting Neo4j.
2021-01-11 16:49:00.182+0000 WARN Unrecognized setting. No declared setting with name: dbms.memory.heap.maxSize
2021-01-11 16:49:00.188+0000 INFO Starting...
2021-01-11 16:49:02.855+0000 INFO ======== Neo4j 4.1.3 ========
2021-01-11 16:49:07.581+0000 INFO Initializing system graph model for component 'security-users' with version -1 and status UNINITIALIZED
2021-01-11 16:49:07.584+0000 INFO Setting up initial user from auth.ini file: neo4j
2021-01-11 16:49:07.584+0000 INFO Creating new user 'neo4j' (passwordChangeRequired=false, suspended=false)
2021-01-11 16:49:07.646+0000 INFO Setting version for 'security-users' to 2
2021-01-11 16:49:07.665+0000 INFO After initialization of system graph model component 'security-users' have version 2 and status CURRENT
2021-01-11 16:49:07.695+0000 INFO Performing postInitialization step for component 'security-users' with version 2 and status CURRENT
2021-01-11 16:49:19.190+0000 INFO Bolt enabled on 0.0.0.0:7687.
2021-01-11 16:49:20.412+0000 INFO Remote interface available at http://localhost:7474/
2021-01-11 16:49:20.413+0000 INFO Started.

-- but https://40.118.242.86:7474 is still timing out.

2 - I have not tried deploying the same dataset to local Docker, but I do have my own latest 4.x local Neo4j Docker instance, for which the browser works at http://localhost:7474/browser/.

3 - For testing the BOLT connection I basically define PowerShell environment variables ($env:COSMOSDB_ENDPOINT, $env:COSMOSDB_AUTHKEY and the rest) and then run dotnet NeoToCosmos.dll.

syedhassaanahmed commented 3 years ago

Thanks for the details @sesmyrnov. A small clarification: in Neo4j the default HTTPS port is 7473 and the default HTTP port is 7474.
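To tell a network timeout apart from a wrong port or scheme, a plain TCP probe of the default ports can help. The following is a minimal sketch using only the Python standard library; the helper names `is_port_open` and `probe` are hypothetical and not part of this repo or of neo-to-cosmos.

```python
import socket

# Default Neo4j ports named in this thread.
NEO4J_PORTS = {"http": 7474, "https": 7473, "bolt": 7687}


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe(host: str, timeout: float = 3.0) -> dict:
    """Check each default Neo4j port on host and report reachability by name."""
    return {name: is_port_open(host, port, timeout)
            for name, port in NEO4J_PORTS.items()}
```

Calling `probe` with the container's public IP returns one boolean per port. An open port only shows that something accepts TCP connections there; it says nothing about TLS or authentication.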

Two changes in Neo4j 4.0 broke connectivity in Azure Container Instances:
a) Some configuration environment variables have been renamed, e.g. NEO4J_dbms_memory_heap_max__size instead of the old name NEO4J_dbms_memory_heap_maxSize.
b) Enabling HTTPS now requires more configuration. Previously it worked by simply enabling the Neo4j HTTPS port (7473).
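The double underscore in the renamed variable follows the Neo4j Docker image's naming convention: the setting name gets a NEO4J_ prefix, each underscore is doubled, and each dot becomes a single underscore. Here is a small sketch of that mapping in both directions; the function names are hypothetical helpers for illustration.

```python
PREFIX = "NEO4J_"


def conf_to_docker_env(setting: str) -> str:
    """Map a neo4j.conf setting name to its Docker environment variable name."""
    # Order matters: double the existing underscores first, then turn dots
    # into single underscores.
    return PREFIX + setting.replace("_", "__").replace(".", "_")


def docker_env_to_conf(env_name: str) -> str:
    """Map a Docker environment variable name back to the neo4j.conf setting."""
    if not env_name.startswith(PREFIX):
        raise ValueError("not a Neo4j Docker variable: " + env_name)
    body = env_name[len(PREFIX):]
    # A double underscore stands for a literal underscore; a single one is a dot.
    return "_".join(part.replace("_", ".") for part in body.split("__"))
```

Applied to the old variable name, `docker_env_to_conf` yields dbms.memory.heap.maxSize, which is the name in the "Unrecognized setting" warning from the container log.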

My latest commit fixes both issues. I've removed HTTPS for now until I figure out how to inject a valid TLS certificate in Azure Container Instances. Using the latest ARM template, you should now be able to browse the data on http://:7474

sesmyrnov commented 3 years ago

Thank you. I confirmed that it is working now. But the BOLT connection is still not working; it probably depends on those additional SSL/cert configs you mentioned (I believe it requires SSL by default).

[18:27:40 INF] {"ShouldRestart": false, "TotalInstances": 1, "InstanceId": 0, "PageSize": 1000, "LogLevel": "Information", "$type": "CommandLineOptions"}
[18:27:40 INF] bolt://52.252.30.101:7687
[18:27:41 INF] MATCH (n) RETURN COUNT(n)

Unhandled Exception: Neo4j.Driver.V1.ServiceUnavailableException: Connection with the server breaks due to SecurityException: Failed to establish encrypted connection with server bolt://52.252.30.101:7687/.
 ---> Neo4j.Driver.V1.SecurityException: Failed to establish encrypted connection with server bolt://52.252.30.101:7687/.
 ---> System.IO.IOException: Authentication failed because the remote party has closed the transport stream.
   at System.Net.Security.SslState.StartReadFrame(Byte[] buffer, Int32 readBytes, AsyncProtocolRequest asyncRequest)
   at System.Net.Security.SslState.PartialFrameCallback(AsyncProtocolRequest asyncRequest)

FYI, on my local Neo4j Docker instance I was able to get around SSL for HTTPS/BOLT connections with the following env vars, at least for the initial BOLT connection (I did not go as far as configuring SSL certs yet, as it looks quite complex):

NEO4J_dbms_ssl_policy_bolt_enabled="true"
NEO4J_dbms_connector_bolt_advertisedaddress="localhost:7687"
NEO4J_dbms_connector_http_advertisedaddress="localhost:7474"
NEO4J_dbms_connector_https_advertised__address="localhost:7473"

See if this will work here as well, without the complexity of setting up certs/folders.

syedhassaanahmed commented 3 years ago

@sesmyrnov I was able to get bolt working without any changes. Here are the E2E steps I followed; let's compare them with yours to see if I missed something.

Finally I just did dotnet NeoToCosmos.dll and the migration to Cosmos DB started.

sesmyrnov commented 3 years ago

Ah, I completely missed the fact that in v4.x bolt:// was replaced by neo4j:// in the URI. Thanks for pointing this out. This now works for me as well for the Game of Thrones dataset. I also tried it with another dataset, "neo4j-retail-recommendations" (the one I'm actually interested in). While I see a successful connection and a count of nodes/relationships, it exits right away with no error after a successful Cosmos connection (see full log below):

[10:33:22 INF] {"ShouldRestart": false, "TotalInstances": 1, "InstanceId": 0, "PageSize": 1000, "LogLevel": "Information", "$type": "CommandLineOptions"}
[10:33:22 INF] neo4j://52.167.9.75:7687
[10:33:23 INF] MATCH (n) RETURN COUNT(n)
[10:33:23 INF] MATCH ()-[r]->() RETURN COUNT(r)
[10:33:23 INF] Nodes = 126, Relationships = 357
[10:33:23 INF] startNodeIndex = 0, endNodeIndex = 126
[10:33:23 INF] startRelationshipIndex = 0, endRelationshipIndex = 357
[10:33:23 INF] https://neo-to-cosmos-ssm2.documents.azure.com:443/
[10:33:25 INF] {"IndexingPolicy": {"Automatic": true, "IndexingMode": "Consistent", "IncludedPaths": [{"Path": "/*", "Indexes": [], "$type": "IncludedPath"}], "ExcludedPaths": [{"Path": "/\"_etag\"/?", "$type": "ExcludedPath"}], "CompositeIndexes": [], "SpatialIndexes": [], "$type": "IndexingPolicy"}, "DocumentsLink": "dbs/TqYHAA==/colls/TqYHANXqsR8=/docs/", "StoredProceduresLink": "dbs/TqYHAA==/colls/TqYHANXqsR8=/sprocs/", "TriggersLink": "dbs/TqYHAA==/colls/TqYHANXqsR8=/triggers/", "UserDefinedFunctionsLink": "dbs/TqYHAA==/colls/TqYHANXqsR8=/udfs/", "ConflictsLink": "dbs/TqYHAA==/colls/TqYHANXqsR8=/conflicts/", "PartitionKey": {"Paths": ["/name"], "Version": null, "$type": "PartitionKeyDefinition"}, "DefaultTimeToLive": null, "TimeToLivePropertyPath": null, "UniqueKeyPolicy": {"UniqueKeys": [], "$type": "UniqueKeyPolicy"}, "ConflictResolutionPolicy": {"Mode": "LastWriterWins", "ConflictResolutionPath": "/_ts", "ConflictResolutionProcedure": "", "$type": "ConflictResolutionPolicy"}, "PartitionKeyRangeStatistics": [], "Id": "neo_migration_test", "ResourceId": "TqYHANXqsR8=", "SelfLink": "dbs/TqYHAA==/colls/TqYHANXqsR8=/", "AltLink": "dbs/graphdb/colls/neo_migration_test", "Timestamp": "2021-01-12T16:10:00.0000000Z", "ETag": "\"00003304-0000-0200-0000-5ffdc9d80000\"", "$type": "DocumentCollection"}

Before that, I added MATCH (n) WHERE EXISTS(n.orderid) SET n += {name: "order"} RETURN n in Neo4j to have name as a common partition key (this worked before).
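On the URI scheme point at the top of this comment: a rough sketch of how the connection URI can be read, assuming the scheme conventions of the 4.x Neo4j drivers (plain bolt:// and neo4j:// are unencrypted, +s adds TLS with full certificate checks, +ssc adds TLS that accepts self-signed certificates). Whether the driver version bundled with neo-to-cosmos follows exactly these conventions has not been checked here, and `describe_uri` is a hypothetical helper.

```python
from urllib.parse import urlsplit

# scheme -> (routing, encrypted, accepts self-signed certificates)
# Assumption: conventions of the 4.x Neo4j drivers.
SCHEMES = {
    "bolt": (False, False, False),
    "bolt+s": (False, True, False),
    "bolt+ssc": (False, True, True),
    "neo4j": (True, False, False),
    "neo4j+s": (True, True, False),
    "neo4j+ssc": (True, True, True),
}


def describe_uri(uri: str) -> dict:
    """Split a Neo4j connection URI and describe what its scheme implies."""
    parts = urlsplit(uri)
    if parts.scheme not in SCHEMES:
        raise ValueError("unknown scheme: " + repr(parts.scheme))
    routing, encrypted, self_signed_ok = SCHEMES[parts.scheme]
    return {
        "host": parts.hostname,
        "port": parts.port,
        "routing": routing,
        "encrypted": encrypted,
        "self_signed_ok": self_signed_ok,
    }
```

With HTTPS and certificates removed from the container, an unencrypted scheme is the one that matches the server side, which fits the "Failed to establish encrypted connection" error seen earlier in the thread.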

syedhassaanahmed commented 3 years ago

@sesmyrnov I think I know why that might be happening :) Most likely you had a previous successful run of a migration. Neo-to-cosmos creates a local RocksDB cache to store the vertex and edge indices that were successfully migrated. This exists to provide resume support in case of a large Neo4j DB migration. To clear the cache, you should either pass the -r flag to neo-to-cosmos or remove the cache directory (default location is where neo-to-cosmos is running from).
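The effect can be illustrated with a toy resume cache. This is not neo-to-cosmos's implementation: a JSON file stands in for the RocksDB cache, and the `ResumeCache` class, its `reset` parameter (playing the role of the -r flag) and the `migrate` function are all hypothetical.

```python
import json
import os


class ResumeCache:
    """File-backed set of already-migrated item ids (stand-in for RocksDB)."""

    def __init__(self, path: str, reset: bool = False):
        self.path = path
        if reset and os.path.exists(path):
            os.remove(path)  # start over, comparable to clearing the cache
        self.done = set()
        if os.path.exists(path):
            with open(path) as f:
                self.done = set(json.load(f))

    def mark(self, item_id: int) -> None:
        """Record item_id as migrated and persist the cache to disk."""
        self.done.add(item_id)
        with open(self.path, "w") as f:
            json.dump(sorted(self.done), f)


def migrate(ids, cache: ResumeCache, write) -> int:
    """Write every id not yet in the cache; return how many were written."""
    written = 0
    for item_id in ids:
        if item_id in cache.done:
            continue  # migrated in an earlier run, so skip it
        write(item_id)
        cache.mark(item_id)
        written += 1
    return written
```

A second run over the same ids with the same cache file writes nothing and returns 0, which from the outside looks like the tool exiting right away without an error. Passing `reset=True` makes the run write everything again.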

sesmyrnov commented 3 years ago

Thanks for the gotcha. Yes, the -r flag worked like a charm.