x-atlas-consortia / ubkg-neo4j

A container implementation to serve the Unified Biomedical Knowledge Graph in Neo4j
MIT License

neo4j Docker: Wait for neo4j server to complete building indexes before switching to read-only mode #61

Closed: AlanSimmons closed this issue 3 months ago

AlanSimmons commented 3 months ago

Statement of Problem

The start.sh script called by the Dockerfile for the neo4j server appears to switch the server to read-only mode before the server has finished building indexes.

This seems to be an issue on Linux, but not on Mac.

Likely solution

Add a wait loop that confirms all indexes have finished building before the script proceeds to the read-only switch.
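
A minimal sketch of such a wait loop, assuming cypher-shell is available inside the container and that start.sh runs under bash. The function names and the NEO4J_USER and NEO4J_PASSWORD variables are hypothetical and not taken from the actual start.sh. The loop polls SHOW INDEXES until no index reports a state other than ONLINE, and gives up after a timeout.

```shell
#!/usr/bin/env bash
# Sketch only: function names and NEO4J_USER / NEO4J_PASSWORD are hypothetical.

# Print the number of indexes whose state is not ONLINE.
# With --format plain, cypher-shell prints a header line and then the value,
# so the last line of output is the count.
count_pending_indexes() {
    cypher-shell -u "${NEO4J_USER:-neo4j}" -p "${NEO4J_PASSWORD}" --format plain \
        "SHOW INDEXES YIELD state WHERE state <> 'ONLINE' RETURN count(*) AS pending;" \
        | tail -n 1
}

# wait_for_indexes TIMEOUT_SECONDS INTERVAL_SECONDS
# Returns 0 once the pending count reaches 0, or 1 if the timeout elapses first.
# A failed query (for example, server not yet accepting connections) yields an
# empty count and is simply retried.
wait_for_indexes() {
    local timeout="$1" interval="$2" waited=0 pending
    while [ "$waited" -lt "$timeout" ]; do
        pending="$(count_pending_indexes 2>/dev/null)"
        if [ "$pending" = "0" ]; then
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 1
}
```

In start.sh, a call such as `wait_for_indexes 600 5 || exit 1` could sit just before the step that switches to read-only mode. An index stuck in a FAILED state never reaches ONLINE, so in this sketch it surfaces as a timeout rather than a silent pass.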

yuanzhou commented 3 months ago

@AlanSimmons A more fine-grained and reliable approach would be to monitor the startup logging. Have start.sh output all the logging details from log/debug.log, and wait until you see lines like these:

2024-08-01 16:30:57.823+0000 INFO  [o.n.b.p.c.c.n.SocketNettyConnector] Bolt enabled on 0.0.0.0:7687.
2024-08-01 16:30:57.823+0000 INFO  [o.n.b.BoltServer] Bolt server started
2024-08-01 16:30:57.823+0000 INFO  [o.n.s.A.ServerComponentsLifecycleAdapter] Starting web server
2024-08-01 16:30:59.164+0000 INFO  [o.n.s.CommunityNeoWebServer] Remote interface available at http://localhost:7474/
2024-08-01 16:30:59.164+0000 INFO  [o.n.s.A.ServerComponentsLifecycleAdapter] Web server started.
2024-08-01 16:30:59.169+0000 INFO  [o.n.g.f.DatabaseManagementServiceFactory] id: 8B844E5E83F71F8E06C685C8B86755E3D9E2E14EF1B170FA76C4246768E05127

Then grep the log for Java errors marked with Caused by:. If there are no errors, switch to read-only mode.

Even when the log says the server has started, there is no guarantee that no Java errors interrupted the normal lifecycle, as in this excerpt:

    at org.neo4j.kernel.impl.api.index.IndexingService.dontRebuildIndexesInReadOnlyMode(IndexingService.java:361) ~[neo4j-kernel-5.11.0.jar:5.11.0]
    at org.neo4j.kernel.impl.api.index.IndexingService.lambda$start$3(IndexingService.java:303) ~[neo4j-kernel-5.11.0.jar:5.11.0]
    at org.neo4j.kernel.impl.api.index.IndexMapReference.modify(IndexMapReference.java:48) ~[neo4j-kernel-5.11.0.jar:5.11.0]
    at org.neo4j.kernel.impl.api.index.IndexingService.start(IndexingService.java:281) ~[neo4j-kernel-5.11.0.jar:5.11.0]
    at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:348) ~[neo4j-common-5.11.0.jar:5.11.0]
    at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:92) ~[neo4j-common-5.11.0.jar:5.11.0]
    at org.neo4j.kernel.database.AbstractDatabase.start(AbstractDatabase.java:162) ~[neo4j-kernel-5.11.0.jar:5.11.0]
    ... 11 more
    Suppressed: org.neo4j.kernel.lifecycle.LifecycleException: Exception during graceful attempt to stop partially started component. Please use non suppressed exception to see original component failure.
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:355) ~[neo4j-common-5.11.0.jar:5.11.0]
        at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:92) ~[neo4j-common-5.11.0.jar:5.11.0]
        at org.neo4j.kernel.database.AbstractDatabase.start(AbstractDatabase.java:162) ~[neo4j-kernel-5.11.0.jar:5.11.0]
        at org.neo4j.dbms.database.DatabaseLifecycles.startDatabase(DatabaseLifecycles.java:123) [neo4j-5.11.0.jar:5.11.0]
        at org.neo4j.dbms.database.DatabaseLifecycles.initialiseDefaultDatabase(DatabaseLifecycles.java:89) [neo4j-5.11.0.jar:5.11.0]
        at org.neo4j.dbms.database.DatabaseLifecycles$DefaultDatabaseStarter.start(DatabaseLifecycles.java:185) [neo4j-5.11.0.jar:5.11.0]
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:348) [neo4j-common-5.11.0.jar:5.11.0]
        at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:92) [neo4j-common-5.11.0.jar:5.11.0]
        at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.startDatabaseServer(DatabaseManagementServiceFactory.java:263) [neo4j-5.11.0.jar:5.11.0]
        at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.build(DatabaseManagementServiceFactory.java:208) [neo4j-5.11.0.jar:5.11.0]
        at org.neo4j.server.CommunityBootstrapper.createNeo(CommunityBootstrapper.java:38) [neo4j-5.11.0.jar:5.11.0]
        at org.neo4j.server.NeoBootstrapper.start(NeoBootstrapper.java:187) [neo4j-5.11.0.jar:5.11.0]
        at org.neo4j.server.NeoBootstrapper.start(NeoBootstrapper.java:99) [neo4j-5.11.0.jar:5.11.0]
        at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:30) [neo4j-5.11.0.jar:5.11.0]
    Caused by: java.lang.NullPointerException: Cannot invoke "org.neo4j.scheduler.JobHandle.cancel()" because "this.usageReportJob" is null
        at org.neo4j.kernel.impl.api.index.IndexingService.stop(IndexingService.java:444) ~[neo4j-kernel-5.11.0.jar:5.11.0]
        at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:353) ~[neo4j-common-5.11.0.jar:5.11.0]
        ... 13 more
2024-08-01 16:28:54.416+0000 INFO  [o.n.b.p.c.c.n.SocketNettyConnector] Bolt enabled on 0.0.0.0:7687.
2024-08-01 16:28:54.417+0000 INFO  [o.n.b.BoltServer] Bolt server started
2024-08-01 16:28:54.417+0000 INFO  [o.n.s.A.ServerComponentsLifecycleAdapter] Starting web server
2024-08-01 16:28:55.363+0000 INFO  [o.n.s.CommunityNeoWebServer] Remote interface available at http://localhost:7474/
2024-08-01 16:28:55.363+0000 INFO  [o.n.s.A.ServerComponentsLifecycleAdapter] Web server started.
2024-08-01 16:28:55.367+0000 INFO  [o.n.g.f.DatabaseManagementServiceFactory] id: 8B844E5E83F71F8E06C685C8B86755E3D9E2E14EF1B170FA76C4246768E05127
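
The log-monitoring approach described in this comment could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual start.sh: the function names are hypothetical, and the "ready" marker is assumed to be the final DatabaseManagementServiceFactory line shown in the excerpts above.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

# Assumed marker: the last startup line shown in the log excerpts above.
READY_PATTERN='DatabaseManagementServiceFactory] id:'

# wait_for_startup LOGFILE TIMEOUT_SECONDS
# Polls LOGFILE once per second. Returns 0 once the ready line appears,
# or 1 if TIMEOUT_SECONDS elapse first.
wait_for_startup() {
    local logfile="$1" timeout="$2" waited=0
    until grep -qF "$READY_PATTERN" "$logfile" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# startup_is_clean LOGFILE
# Returns 0 only if LOGFILE is readable and contains no "Caused by:" line.
startup_is_clean() {
    [ -r "$1" ] && ! grep -qF 'Caused by:' "$1"
}
```

A script might then run `wait_for_startup log/debug.log 600 && startup_is_clean log/debug.log` and switch to read-only mode only if that succeeds. If the log file persists across restarts, errors from an earlier run would also match, so the check may need to be limited to lines written since the current start.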