neo4j / graph-data-science

Source code for the Neo4j Graph Data Science library of graph algorithms.
https://neo4j.com/docs/graph-data-science/current/

Jupyter notebook kernel can go idle during long processing, losing cell results and preventing completion of notebook #291

Open cybersam opened 7 months ago

cybersam commented 7 months ago

Describe the bug

When running .predict on a model in a Jupyter notebook cell, the intervals between progress bar updates can become so long that Jupyter treats the cell as finished and, after enough inactivity, puts the kernel in an idle (basically, "dead") state. This is quite likely for long-running predictions (say, running overnight), where the user steps away and does not touch Jupyter for many hours. When the Neo4j server finally finishes the prediction, the results (say, from .predict.stream) of the many hours of processing are lost, since the notebook is dead. I suspect the same problem can occur with other long-running GDS operations.

To Reproduce

I don't know if the eventual slow-down in progress bar output happens for all long-running use cases or server configurations. In my case, it usually happens but sometimes not.

GDS version: 2.5.3
Neo4j version: 5.11.0
Operating system: Amazon Linux

My specific Jupyter environment: JupyterLab 4.0.8, Python 3 (ipykernel) kernel, on AWS EC2 with Amazon Linux

Steps to reproduce the behavior:

Expected behavior

The Jupyter notebook should never go idle while any long-running GDS operation is still in progress.

It is probably enough to ensure that output is produced regularly (say, every x minutes).
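As a rough illustration of the "regular output" idea, here is a minimal user-side sketch in Python. It is not part of the GDS client; `heartbeat` and `slow_call` are hypothetical names made up for this example, and whether periodic output actually prevents the idle state described in this issue is an untested assumption.

```python
import threading
import time
from contextlib import contextmanager


@contextmanager
def heartbeat(interval_seconds=300.0, message="still running"):
    """Print a status line every `interval_seconds` until the with-block exits.

    Hypothetical helper for illustration only; not a GDS or Jupyter API.
    """
    stop = threading.Event()
    started = time.monotonic()

    def beat():
        # Event.wait returns False on timeout and True once `stop` is set,
        # so the loop ends promptly when the with-block finishes.
        while not stop.wait(interval_seconds):
            elapsed = time.monotonic() - started
            print(f"[heartbeat] {message} ({elapsed:.0f}s elapsed)", flush=True)

    worker = threading.Thread(target=beat, daemon=True)
    worker.start()
    try:
        yield
    finally:
        stop.set()
        worker.join()


def slow_call():
    # Stand-in for a long-running operation such as a prediction.
    time.sleep(0.35)
    return "done"


with heartbeat(interval_seconds=0.1, message="waiting for slow_call"):
    result = slow_call()
```

In a notebook, the long-running call would go inside the `with` block in place of `slow_call()`, with a much larger interval. The background thread only prints; it does not touch the blocking call itself.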

adamnsch commented 7 months ago

Hi @cybersam,

Thanks for bringing this to our attention.

The progress of a link prediction pipeline is not linear, so it may well be that there are substantial chunks of time where the algorithm has not reported progress in terms of %, even though it's still running. Is your Jupyter environment by any chance running an idle culler? If so, have you tried to configure the idle culler according to your needs? https://tljh.jupyter.org/en/latest/topic/idle-culler.html

Adam

cybersam commented 7 months ago

As far as I know, I am using the default Jupyter configuration, in which culling is supposed to be disabled. The config files do not set any culling values.
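For anyone who wants to double-check the same thing, here is a small sketch that scans config directories for culling options. It assumes the relevant settings have names containing `cull_` (for example Jupyter Server's `MappingKernelManager.cull_idle_timeout`, where values of 0 or lower disable culling) and that they live in `.py` or `.json` files. The function name `find_cull_settings` is made up for this example.

```python
import re
from pathlib import Path

# Matches option names such as cull_idle_timeout or cull_interval.
CULL_PATTERN = re.compile(r"cull_\w+")


def find_cull_settings(config_dirs):
    """Return (file, line number, text) for each non-comment line mentioning a cull_* option."""
    hits = []
    for directory in config_dirs:
        root = Path(directory)
        if not root.is_dir():
            continue  # Skip directories that do not exist on this machine.
        for path in sorted(root.rglob("*")):
            if not path.is_file() or path.suffix not in {".py", ".json"}:
                continue
            text = path.read_text(encoding="utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                stripped = line.strip()
                if stripped.startswith("#"):
                    continue  # Commented-out defaults do not count as set values.
                if CULL_PATTERN.search(stripped):
                    hits.append((str(path), number, stripped))
    return hits
```

The directories to pass in are the ones listed under "config" by `jupyter --paths`. An empty result only means no such line was found in those files; it does not rule out culling configured elsewhere (for example on the command line or by a hub).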

cybersam commented 7 months ago

Also, it turns out my idle kernel is not culled, even after a long time. It still remembers the state before the cell that "died".

cybersam commented 7 months ago

Some background: the cell in question stores the prediction result in a 'result' variable.

I tried the following experiment, and the results are very interesting:

So, when the kernel goes idle it is apparently still able to get ultimate results. But the cell output is messed up, and subsequent cells do not execute when they are supposed to.