hetio / hetionet

Hetionet: an integrative network of disease
https://neo4j.het.io

Neo4J instance down (?) #45

Closed DimitrisAlivas closed 2 years ago

DimitrisAlivas commented 2 years ago

Hello HetioNet team,

First of all, thank you for the hard work behind the hetionet graph.

I've seen multiple open issues about accessing Neo4j, but I wasn't certain whether I should comment under one of them or open a new one, so I chose to open a new one (apologies if it's a duplicate issue).

I am trying to use the Neo4j explorer to take a look at the HetioNet graph; however, I keep getting a connection refused error. Are you aware of it being down? Is #33 related to it?

Thank you in advance for any insights on this.

With kind regards, Dimitrios

dhimmel commented 2 years ago

Just restarted our VM instance (private GCS link) that runs the neo4j server. Instance doesn't quite seem healthy yet, so will investigate. When I ssh to it, I see "System information disabled due to load higher than 2.0".

dhimmel commented 2 years ago

@dongbohu do you have any idea what could be wrong with this instance? Usually restarting fixes the problem, but I'm having trouble SSHing to it: getting "Connection refused".

dongbohu commented 2 years ago

@dhimmel I couldn't ssh into that box either. I restarted the virtual machine from Google Cloud Platform but got the same error. I recommend that you ask Faisal Alquaddoomi (falquaddoomi@gmail.com) for help. He is in charge of all computational resources in Greene lab now. You probably can ask him to give you admin privileges on the two Hetionet virtual instances.

dhimmel commented 2 years ago

Hello @falquaddoomi, nice to meet you! We're having a bit of trouble with one of the Hetionet VMs as noted above. Not sure if you have any insights?

falquaddoomi commented 2 years ago

Hey @dhimmel, nice to meet you, too! Apologies that I didn't catch this; I have an uptime alert set for search-api.het.io (specifically, https://search-api.het.io/v1/random-node-pair/), but I don't have one for the neo4j instance. Do you know if there's a URL I could hit every 5 minutes or so to check if the service is available? EDIT: Would https://neo4j.het.io/browser/ be a good URL to check it?

About the current problem, I can also confirm that the neo4j-het-io VM is having a significant problem: I can only intermittently connect via SSH, and it's very laggy when I can get in. I'm going to continue to debug the issue; I'll let you know if I figure anything out.
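For reference, the kind of periodic URL check discussed here could be sketched as below. This is a minimal illustration in Python using only the standard library, not the monitoring setup actually used for Hetionet; the function name `is_up` and the injectable `opener` parameter are assumptions made for the example.

```python
import urllib.request


def is_up(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if `url` answers with an HTTP 2xx status within `timeout` seconds.

    `opener` is a hypothetical hook that defaults to urllib's urlopen; it is
    injectable so the check can be exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers refused connections, timeouts, DNS failures, and urllib's
        # URLError/HTTPError (raised for 4xx/5xx), all of which subclass OSError.
        return False
```

A scheduler such as cron could call `is_up("https://neo4j.het.io/browser/")` every 5 minutes and send a notification when it returns `False`.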

falquaddoomi commented 2 years ago

Ok, sorry for the delay; it looks like there was an Ubuntu upgrade that caused a kernel panic whenever Docker booted. I managed to get in during the window between SSH coming up and Docker starting, temporarily disabled the docker and containerd services, made a snapshot of the boot disk, and then upgraded to the latest non-LTS release (21.10), which seems to have fixed the problem. I've also added an uptime check for the neo4j web UI so that it doesn't take me so long to fix things in the future.

dhimmel commented 2 years ago

@falquaddoomi thanks tremendously! That sounds like a challenging bug to track down.

> Would https://neo4j.het.io/browser/ be a good URL to check it?

I think so, although sometimes the browser still loads while the underlying database is having issues. Possible example at https://github.com/hetio/hetionet/issues/37. If we have an undetected issue, I can look for another URL.

What happens when the uptime check fails? Does it automatically trigger a reboot or just a notification?
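Because the browser page can load while the database behind it is unhealthy, a check may need to separate the two cases. The sketch below is one hypothetical way to do that in Python. The names `classify_health`, `web_ui_ok`, and `database_ok` are assumptions for illustration, and the database probe is left as a caller-supplied function because the thread does not identify a URL or query for it.

```python
from typing import Callable


def classify_health(web_ui_ok: Callable[[], bool],
                    database_ok: Callable[[], bool]) -> str:
    """Combine two independent probes into a single status string.

    Both probes are hypothetical callables supplied by the caller:
    `web_ui_ok` might fetch the browser page, while `database_ok` would
    need to exercise the database itself.
    """
    if not web_ui_ok():
        return "down"        # nothing is answering at all
    if not database_ok():
        return "degraded"    # page loads, but the database probe fails
    return "healthy"
```

Reporting "degraded" separately from "down" would let a notification distinguish the situation described in issue #37 from a full outage.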