hetio / hetionet

Hetionet: an integrative network of disease
https://neo4j.het.io

Hetionet Browser is down #49

Open Travis-Barton opened 1 year ago

Travis-Barton commented 1 year ago

When trying to connect to: https://neo4j.het.io/browser/

I get the following message: [screenshot: Screen Shot 2022-10-21 at 1 37 45 PM]

This is new; I've been querying it for several days now without problems. Any idea why it suddenly needs a username/password?

dhimmel commented 1 year ago

Ah yes I see:

Database access not available. Please use :server connect to establish connection. There's a graph waiting for you.

It's probably the case that some part of the Neo4j instance is down but the part that returns the website is alive. Hence, you get the website without the database.

Let me tag @falquaddoomi. Perhaps we can add a health check that does a simple cypher query, to ensure the database is actually alive.
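A health check along these lines could be sketched roughly as follows. This is only an illustration, not the project's actual check: it assumes the instance exposes the Neo4j 3.x transactional HTTP endpoint (`/db/data/transaction/commit`) at the same host and allows unauthenticated read queries. The function names `build_request`, `response_is_healthy`, and `check_database` are hypothetical.

```python
import json
import urllib.request

# Assumption: the Neo4j 3.x transactional HTTP endpoint is reachable here.
# Neo4j 4.x moved this path to /db/<database>/tx/commit.
DEFAULT_URL = "https://neo4j.het.io/db/data/transaction/commit"


def build_request(url=DEFAULT_URL, statement="RETURN 1 AS ok"):
    """Build a POST request that runs a single Cypher statement."""
    body = json.dumps({"statements": [{"statement": statement}]}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def response_is_healthy(payload):
    """Return True only if the query reported no errors and returned a row."""
    if not isinstance(payload, dict) or payload.get("errors"):
        return False
    results = payload.get("results") or []
    return bool(results) and bool(results[0].get("data"))


def check_database(url=DEFAULT_URL, timeout=10):
    """Return True if the database itself answers a trivial Cypher query."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return response_is_healthy(json.load(resp))
    except (OSError, ValueError):
        # Connection failures, HTTP errors, and timeouts are OSError subclasses;
        # a non-JSON body (for example an HTML page) raises ValueError.
        return False
```

Unlike a check that only fetches the browser page, `check_database()` returns `False` when the web UI is served but the database cannot execute a query, which is the failure described above.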

falquaddoomi commented 1 year ago

So, I've restarted all the VMs and the managed database associated with the project, which seems to have fixed things for now. That was likely more than what was necessary to fix this issue, so I'll do some poking around to figure out what actually needs to be restarted when this database issue comes up. @dhimmel, that's a good idea about issuing a real query as the health check -- right now it just checks that the neo4j browser UI is accessible, which apparently isn't enough to determine that the service is working. I'll report in this issue once the health check is modified and I've verified that it stops this specific issue.

Longer-term, I'll carve out some time to investigate migrating to neo4j 4.x (right now we're on 3.5.12). I'll also continue to investigate right-sizing the hardware so the container doesn't have to be restarted at all, let alone every few days as it is now.

Travis-Barton commented 1 year ago

The browser is down again.

dhimmel commented 1 year ago

> The browser is down again.

Argh! Thanks for the notification.

> I'll carve out some time to investigate migrating to neo4j 4.x

I'll try to post an issue soon listing the main things to be aware of for the upgrade, which will also help me organize my thoughts from https://github.com/hetio/hetionet/pull/33

> I'll also continue to investigate right-sizing the hardware so the container doesn't have to be restarted at all

I suspect there's some sort of memory leak such that even the largest container might eventually break. It's also possible some users are submitting queries that end up exploding the instance.