neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

Query timeout Neo4j DataScience Sandbox #167

Closed golubovic closed 2 years ago

golubovic commented 2 years ago

In the 'usecase=recommendations' sandbox, the query below fails repeatedly, either immediately or after two hours (when I assume the query times out). The query before this one executes and returns a single row of data, so connectivity is working fine. Please let me know how the problem can be fixed.

Thanks! Mladen

### Query
user_query = """
MATCH (u:User)
RETURN u.userId AS userId
"""
user_x, user_mapping = load_node(user_query, index_col='userId')

### Neo4j Code
https://github.com/tomasonjo/blogs/blob/master/pyg2neo/Movie_recommendations.ipynb

### Connection Details:
Username: neo4j
Password: receivers-consequence-compass
IP Address: 3.236.76.247
HTTP Port: 7474
Bolt Port: 7687
Bolt URL: bolt://3.236.76.247:7687
Websocket Bolt URL: bolt+s://be523115d3ab27688998d1171e497c81.neo4jsandbox.com:7687

tomasonjo commented 2 years ago

When I was developing the code, I had not run into the timeout issue. However, you can increase the query timeout by setting the following configuration in Neo4j.

call dbms.setConfigValue('dbms.transaction.timeout','0')

If the query times out after two hours, the problem is somewhere else, as Neo4j should kill the query after a minute or so, if I remember correctly.

golubovic commented 2 years ago

Hi @tomasonjo

Great articles, by the way. I would suggest always printing one row from every query you run (even if it returns embeddings), just to make the pre-processing easier for the reader to follow.

Back to the issue: yesterday and today I recreated the Neo4j environment from scratch three times by terminating it. I tried two ways of connecting (code given below). What I did not mention yesterday is that other queries in the notebook also fail regularly; I restarted them a few times and they eventually worked. The query I mentioned above has never executed successfully. The error one usually gets when running any query is given below for your reference.

I liked the idea of doing a quick POC with Neo4j; it is a pity that was not possible. I may try to deploy containers when/if I have time in the future. If you have ready-made containers already set up with the given datasets, please send me the link. I find your articles very useful.

Thanks, Mladen

### Connection Error (which can be fixed by re-running the notebook cell):
ServiceUnavailable: Couldn't connect to 18.204.222.208:7687 (resolved to ('18.204.222.208:7687',)): Failed to establish connection to ResolvedIPv4Address(('18.204.222.208', 7687)) (reason [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond)
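Re-running the cell by hand is effectively a manual retry, so one possible stopgap is to automate it. The following is only a minimal sketch, not something taken from the notebook: `run_with_retry` and its parameters are hypothetical names, and the backoff values are arbitrary. It does not fix the underlying connectivity problem.

```python
import time


def run_with_retry(fn, attempts=3, base_delay=1.0, retry_on=(OSError,), sleep=time.sleep):
    """Call fn() and retry on the given exception types with exponential backoff.

    Hypothetical helper for illustration; the names and defaults are assumptions.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
```

With the Neo4j driver, this could presumably be used as `run_with_retry(lambda: fetch_data(user_query), retry_on=(neo4j.exceptions.ServiceUnavailable,))`, since `ServiceUnavailable` is the exception shown in the error above.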

### Connectivity methods I have tried:

# Method 1: neo4j:// scheme with basic_auth
from neo4j import GraphDatabase, basic_auth

driver = GraphDatabase.driver(
    "neo4j://18.204.222.208:7687",
    auth=basic_auth("neo4j", "purchaser-song-verbs"))

# Method 2: bolt:// scheme with a (user, password) tuple
from neo4j import GraphDatabase

url = 'bolt://18.204.222.208:7687'
user = 'neo4j'
password = 'purchaser-song-verbs'
driver = GraphDatabase.driver(url, auth=(user, password))

import pandas as pd

def fetch_data(query, params=None):
    # with driver.session() as session:
    with driver.session(database="neo4j") as session:
        # Run the query and collect every record into a DataFrame
        result = session.run(query, params)
        return pd.DataFrame([r.values() for r in result], columns=result.keys())

tomasonjo commented 2 years ago

It seems there is something wrong with the Neo4j connection. That query should not be a big deal, as there are only 700 users in the database. I have tried it in Google Colab and with my Sandbox, and it seems to work. The credentials are:

url = 'bolt://3.239.161.201:7687'
user = 'neo4j'
password = 'basics-rinses-precedence'

You can test it out and see if it works; if not, there are some issues in your local environment. Have you tried updating the Python Neo4j driver?
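One way to narrow down a local-environment issue, independent of the driver, is to check whether a plain TCP connection to the Bolt port can be opened at all. This is a rough sketch using only the Python standard library; `can_reach` is a hypothetical helper name and the 5-second timeout is an arbitrary assumption.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration; it only tests reachability,
    not Bolt authentication or the database itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures alike.
        return False
```

For example, `can_reach("3.239.161.201", 7687)` could be run from the same machine as the notebook. If it returns False while the Sandbox works from Google Colab, a firewall or proxy blocking outbound traffic on port 7687 would be a plausible suspect, though this check alone cannot confirm that.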

I don't have a docker image ready with the data, but there is a database dump available on github: https://github.com/neo4j-graph-examples/recommendations/tree/main/data

You can download the dump and follow the instructions in my previous blog post on how to get it up and running with Neo4j Desktop on your local machine: https://towardsdatascience.com/exploring-the-nft-transaction-with-neo4j-cba80ead7e0b

golubovic commented 2 years ago

Hi @tomasonjo ,

These are the drivers installed (as per !pip3 install neo4j-driver from the notebook):
Requirement already satisfied: neo4j-driver in c:\users\m200240\anaconda3\lib\site-packages (4.4.1)
Requirement already satisfied: pytz in c:\users\m200240\anaconda3\lib\site-packages (from neo4j-driver) (2021.3)

Even with your credentials (thanks for sending them), the first query in the notebook fails. Today I have not managed to run a single query successfully against the database; yesterday I did.

ServiceUnavailable: Couldn't connect to 3.239.161.201:7687 (resolved to ('3.239.161.201:7687',)): Failed to establish connection to ResolvedIPv4Address(('3.239.161.201', 7687)) (reason [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond)