streetwriters / notesnook-sync-server

Sync server for Notesnook (self-hosting in alpha)
https://notesnook.com/
GNU Affero General Public License v3.0
419 stars · 17 forks

Identity server container errors out on restarting the container #14

Closed: IKA3RUS closed this 3 days ago

IKA3RUS commented 4 days ago

My self-hosted setup, following the draft guide exactly, works great until I restart the Docker Compose stack. On the second launch, the identity server container errors out and exits with this error.

$ docker logs notesnook-identity -f
Unhandled exception. System.TimeoutException: A timeout occurred after 30000ms selecting a server using CompositeServerSelector{ Selectors = ReadPreferenceServerSelector{ ReadPreference = { Mode : Primary } }, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector }. Client view of cluster state is { ClusterId : "1", Type : "ReplicaSet", State : "Connected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/notesnook-db:27017" }", EndPoint: "Unspecified/notesnook-db:27017", ReasonChanged: "Heartbeat", State: "Connected", ServerVersion: 7.0.0, TopologyVersion: { "processId" : ObjectId("67169e294efe6388a170063a"), "counter" : NumberLong(1) }, Type: "ReplicaSetGhost", WireVersionRange: "[0, 21]", LastHeartbeatTimestamp: "2024-10-21T18:32:29.9959601Z", LastUpdateTimestamp: "2024-10-21T18:32:29.9959609Z" }] }.
   at MongoDB.Driver.Core.Clusters.Cluster.ThrowTimeoutException(IServerSelector selector, ClusterDescription description)
   at MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedHelper.HandleCompletedTask(Task completedTask)
   at MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChanged(IServerSelector selector, ClusterDescription description, Task descriptionChangedTask, TimeSpan timeout, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Clusters.Cluster.SelectServer(IServerSelector selector, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Clusters.IClusterExtensions.SelectServerAndPinIfNeeded(ICluster cluster, ICoreSessionHandle session, IServerSelector selector, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Bindings.ReadPreferenceBinding.GetReadChannelSource(CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Bindings.ReadBindingHandle.GetReadChannelSource(CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.RetryableReadContext.Initialize(CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.RetryableReadContext.Create(IReadBinding binding, Boolean retryRequested, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.FindOperation`1.Execute(IReadBinding binding, CancellationToken cancellationToken)
   at MongoDB.Driver.OperationExecutor.ExecuteReadOperation[TResult](IReadBinding binding, IReadOperation`1 operation, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.ExecuteReadOperation[TResult](IClientSessionHandle session, IReadOperation`1 operation, ReadPreference readPreference, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.ExecuteReadOperation[TResult](IClientSessionHandle session, IReadOperation`1 operation, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.FindSync[TProjection](IClientSessionHandle session, FilterDefinition`1 filter, FindOptions`2 options, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.<>c__DisplayClass48_0`1.<FindSync>b__0(IClientSessionHandle session)
   at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSession[TResult](Func`2 func, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.FindSync[TProjection](FilterDefinition`1 filter, FindOptions`2 options, CancellationToken cancellationToken)
   at MongoDB.Driver.FindFluent`2.ToCursor(CancellationToken cancellationToken)
   at MongoDB.Driver.IAsyncCursorSourceExtensions.ToList[TDocument](IAsyncCursorSource`1 source, CancellationToken cancellationToken)
   at AspNetCore.Identity.Mongo.Migrations.Migrator.Apply[TUser,TRole,TKey](IMongoCollection`1 migrationCollection, IMongoCollection`1 usersCollection, IMongoCollection`1 rolesCollection)
   at AspNetCore.Identity.Mongo.MongoIdentityExtensions.AddIdentityMongoDbProvider[TUser,TRole,TKey](IServiceCollection services, Action`1 setupIdentityAction, Action`1 setupDatabaseAction, IdentityErrorDescriber identityErrorDescriber)
   at AspNetCore.Identity.Mongo.MongoIdentityExtensions.AddIdentityMongoDbProvider[TUser](IServiceCollection services, Action`1 setupIdentityAction, Action`1 setupDatabaseAction)
   at Streetwriters.Identity.Startup.ConfigureServices(IServiceCollection services) in /src/Streetwriters.Identity/Startup.cs:line 78
   at System.RuntimeMethodHandle.InvokeMethod(Object, Void**, Signature, Boolean)
   at System.Reflection.MethodBaseInvoker.InterpretedInvoke_Method(Object obj, IntPtr* args)
   at System.Reflection.MethodBaseInvoker.InvokeDirectByRefWithFewArgs(Object, Span`1, BindingFlags)
   at System.Reflection.MethodBaseInvoker.InvokeWithOneArg(Object, BindingFlags, Binder, Object[], CultureInfo)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object, BindingFlags, Binder, Object[], CultureInfo)
   at Microsoft.AspNetCore.Hosting.MethodInfoExtensions.InvokeWithoutWrappingExceptions(MethodInfo , Object, Object[] )
   at Microsoft.AspNetCore.Hosting.ConfigureServicesBuilder.InvokeCore(Object, IServiceCollection)
   at Microsoft.AspNetCore.Hosting.ConfigureServicesBuilder.<>c__DisplayClass9_0.<Invoke>g__Startup|0(IServiceCollection serviceCollection)
   at Microsoft.AspNetCore.Hosting.ConfigureServicesBuilder.Invoke(Object, IServiceCollection)
   at Microsoft.AspNetCore.Hosting.ConfigureServicesBuilder.<>c__DisplayClass8_0.<Build>b__0(IServiceCollection services)
   at Microsoft.AspNetCore.Hosting.GenericWebHostBuilder.UseStartup(Type , HostBuilderContext, IServiceCollection, Object )
   at Microsoft.AspNetCore.Hosting.GenericWebHostBuilder.<>c__DisplayClass7_0.<UseStartup>b__0(HostBuilderContext context, IServiceCollection services)
   at Microsoft.Extensions.Hosting.HostBuilder.InitializeServiceProvider()
   at Microsoft.Extensions.Hosting.HostBuilder.Build()
   at Streetwriters.Identity.Program.Main(String[] args) in /src/Streetwriters.Identity/Program.cs:line 41
   at Streetwriters.Identity.Program.<Main>(String[] args)

The other Docker containers keep running.

$ docker ps -a | grep notesnook
2c82bd9b8aea   streetwriters/monograph:latest             "docker-entrypoint.s…"   8 minutes ago   Up 8 minutes (healthy)       3000/tcp                                                                   notesnook-monograph
a91dad82d38d   streetwriters/sse:latest                   "./Streetwriters.Mes…"   8 minutes ago   Up 8 minutes (healthy)       7264/tcp                                                                   notesnook-sse
6b4639f34db9   streetwriters/notesnook-sync:latest        "./Notesnook.API"        8 minutes ago   Up 8 minutes (healthy)       5264/tcp                                                                   notesnook-sync
be8627017dbd   minio/mc:RELEASE.2024-07-26T13-08-44Z      "/bin/bash -c 'until…"   8 minutes ago   Exited (0) 8 minutes ago                                                                                notesnook-setup-s3
652230eff07f   mongo:7.0.12                               "/bin/sh -c 'mongosh…"   8 minutes ago   Exited (0) 8 minutes ago                                                                                notesnook-initiate-rs0
cf295b1e65b8   streetwriters/identity:latest              "./Streetwriters.Ide…"   8 minutes ago   Exited (139) 7 minutes ago                                                                              notesnook-identity
b21413f2f44a   minio/minio:RELEASE.2024-07-29T22-14-52Z   "/usr/bin/docker-ent…"   8 minutes ago   Up 8 minutes (healthy)       9000/tcp                                                                   notesnook-s3
b7068859d373   mongo:7.0.12                               "docker-entrypoint.s…"   8 minutes ago   Up 8 minutes (healthy)       27017/tcp                                                                  notesnook-db
795f56d412f5   willfarrell/autoheal:latest                "/docker-entrypoint …"   8 minutes ago   Up 8 minutes (healthy)                                                                                  notesnook-autoheal
4232ad56d8a0   vandot/alpine-bash                         "/bin/bash -c '# Lis…"   8 minutes ago   Exited (0) 8 minutes ago                                                                                notesnook-validate

The only modifications I've made to the Docker Compose file are giving all the containers names starting with notesnook-, and attaching an additional Docker network to the exposed containers for my Traefik setup, which shouldn't really affect anything.

I am able to reproduce this consistently. The behaviour remains the same no matter how many times I restart the stack, until I remove the Docker volumes and start over - at which point it works again until the next restart.

How do I go about troubleshooting this?

dyw770 commented 3 days ago
652230eff07f   mongo:7.0.12                               "/bin/sh -c 'mongosh…"   8 minutes ago   Exited (0) 8 minutes ago                                                                                notesnook-initiate-rs0

Please check the logs of this container

IKA3RUS commented 3 days ago

Here.

$ docker logs notesnook-initiate-rs0
Current Mongosh Log ID: 671860d53247c5a0935e739b
Connecting to:          mongodb://notesnook-db:27017/?directConnection=true&appName=mongosh+2.3.0
Using MongoDB:          7.0.12
Using Mongosh:          2.3.0

For mongosh info see: https://www.mongodb.com/docs/mongodb-shell/

To help improve our products, anonymous usage data is collected and sent to MongoDB periodically (https://www.mongodb.com/legal/privacy-policy).
You can opt-out by running the disableTelemetry() command.

------
   The server generated these startup warnings when booting
   2024-10-23T02:35:01.226+00:00: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem
   2024-10-23T02:35:01.461+00:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted
   2024-10-23T02:35:01.462+00:00: /sys/kernel/mm/transparent_hugepage/enabled is 'always'. We suggest setting it to 'never' in this binary version
   2024-10-23T02:35:01.462+00:00: vm.max_map_count is too low
------

MongoServerError[AlreadyInitialized]: already initialized
MongoServerError[InvalidReplicaSetConfig]: Our replica set config is invalid or we are not a member of it
IKA3RUS commented 3 days ago

These lines from the db container seem relevant too - especially the later ones with the HostUnreachable errors.

$ docker logs notesnook-db | grep repl 
{"t":{"$date":"2024-10-23T02:51:52.372+00:00"},"s":"I",  "c":"CONTROL",  "id":21951,   "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"net":{"bindIp":"*"},"replication":{"replSet":"rs0"}}}}
{"t":{"$date":"2024-10-23T02:51:52.590+00:00"},"s":"I",  "c":"WTRECOV",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":{"ts_sec":1729651912,"ts_usec":590048,"thread":"1:0x7fbce5a8ac80","session_name":"txn-recover","category":"WT_VERB_RECOVERY_PROGRESS","category_id":30,"verbose_level":"DEBUG_1","verbose_level_id":1,"msg":"recovery log replay has successfully finished and ran for 143 milliseconds"}}}
{"t":{"$date":"2024-10-23T02:51:52.604+00:00"},"s":"I",  "c":"WTRECOV",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":{"ts_sec":1729651912,"ts_usec":604078,"thread":"1:0x7fbce5a8ac80","session_name":"txn-recover","category":"WT_VERB_RECOVERY_PROGRESS","category_id":30,"verbose_level":"DEBUG_1","verbose_level_id":1,"msg":"recovery was completed successfully and took 157ms, including 143ms for the log replay, 0ms for the rollback to stable, and 12ms for the checkpoint."}}}
{"t":{"$date":"2024-10-23T02:51:52.619+00:00"},"s":"I",  "c":"REPL",     "id":6005300, "ctx":"initandlisten","msg":"Starting up replica set aware services"}
{"t":{"$date":"2024-10-23T02:51:52.619+00:00"},"s":"I",  "c":"REPL",     "id":4280500, "ctx":"initandlisten","msg":"Attempting to create internal replication collections"}
{"t":{"$date":"2024-10-23T02:51:52.622+00:00"},"s":"I",  "c":"REPL",     "id":4280507, "ctx":"initandlisten","msg":"Loaded replica set config, scheduled callback to set local config"}
{"t":{"$date":"2024-10-23T02:51:52.622+00:00"},"s":"I",  "c":"REPL",     "id":4280508, "ctx":"ReplCoord-0","msg":"Attempting to set local replica set config; validating config for startup"}
{"t":{"$date":"2024-10-23T02:51:52.623+00:00"},"s":"I",  "c":"CONTROL",  "id":8423403, "ctx":"initandlisten","msg":"mongod startup complete","attr":{"Summary of time elapsed":{"Startup from clean shutdown?":true,"Statistics":{"Transport layer setup":"0 ms","Run initial syncer crash recovery":"0 ms","Create storage engine lock file in the data directory":"0 ms","Get metadata describing storage engine":"0 ms","Validate options in metadata against current startup options":"0 ms","Create storage engine":"234 ms","Write current PID to file":"0 ms","Initialize FCV before rebuilding indexes":"1 ms","Drop abandoned idents and get back indexes that need to be rebuilt or builds that need to be restarted":"0 ms","Rebuild indexes for collections":"0 ms","Build user and roles graph":"0 ms","Set up the background thread pool responsible for waiting for opTimes to be majority committed":"1 ms","Initialize information needed to make a mongod instance shard aware":"0 ms","Start up cluster time keys manager with a local/direct keys client":"0 ms","Start up the replication coordinator":"3 ms","Create an oplog view for tenant migrations":"0 ms","Start transport layer":"1 ms","_initAndListen total elapsed time":"251 ms"}}}}
{"t":{"$date":"2024-10-23T02:51:52.690+00:00"},"s":"W",  "c":"REPL",     "id":21405,   "ctx":"ReplCoord-0","msg":"Locally stored replica set configuration does not have a valid entry for the current node; waiting for reconfig or remote heartbeat","attr":{"error":{"code":74,"codeName":"NodeNotFound","errmsg":"No host described in new configuration with {version: 1, term: 1} for replica set rs0 maps to this node"},"localConfig":{"_id":"rs0","version":1,"term":1,"members":[{"_id":0,"host":"def512eafcf5:27017","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":1,"tags":{},"secondaryDelaySecs":0,"votes":1}],"protocolVersion":1,"writeConcernMajorityJournalDefault":true,"settings":{"chainingAllowed":true,"heartbeatIntervalMillis":2000,"heartbeatTimeoutSecs":10,"electionTimeoutMillis":10000,"catchUpTimeoutMillis":-1,"catchUpTakeoverDelayMillis":30000,"getLastErrorModes":{},"getLastErrorDefaults":{"w":1,"wtimeout":0},"replicaSetId":{"$oid":"671863fe7089815366a3a041"}}}}}
{"t":{"$date":"2024-10-23T02:51:52.690+00:00"},"s":"I",  "c":"REPL",     "id":21392,   "ctx":"ReplCoord-0","msg":"New replica set config in use","attr":{"config":{"_id":"rs0","version":1,"term":1,"members":[{"_id":0,"host":"def512eafcf5:27017","arbiterOnly":false,"buildIndexes":true,"hidden":false,"priority":1,"tags":{},"secondaryDelaySecs":0,"votes":1}],"protocolVersion":1,"writeConcernMajorityJournalDefault":true,"settings":{"chainingAllowed":true,"heartbeatIntervalMillis":2000,"heartbeatTimeoutSecs":10,"electionTimeoutMillis":10000,"catchUpTimeoutMillis":-1,"catchUpTakeoverDelayMillis":30000,"getLastErrorModes":{},"getLastErrorDefaults":{"w":1,"wtimeout":0},"replicaSetId":{"$oid":"671863fe7089815366a3a041"}}}}}
{"t":{"$date":"2024-10-23T02:51:52.691+00:00"},"s":"I",  "c":"REPL",     "id":4280511, "ctx":"ReplCoord-0","msg":"Set local replica set config"}
{"t":{"$date":"2024-10-23T02:51:52.903+00:00"},"s":"I",  "c":"REPL",     "id":21356,   "ctx":"conn3","msg":"replSetInitiate admin command received from client"}

And after a couple of minutes, the logs for this container get flooded with these lines repeating on loop.

{"t":{"$date":"2024-10-23T03:00:31.805+00:00"},"s":"I",  "c":"-",        "id":4333222, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"RSM received error response","attr":{"host":"def512eafcf5:27017","error":"HostUnreachable: Error connecting to def512eafcf5:27017 :: caused by :: Could not find address for def512eafcf5:27017: SocketException: onInvoke :: caused by :: Host not found (authoritative)","replicaSet":"rs0","response":{}}}
{"t":{"$date":"2024-10-23T03:00:31.805+00:00"},"s":"I",  "c":"NETWORK",  "id":4712102, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"Host failed in replica set","attr":{"replicaSet":"rs0","host":"def512eafcf5:27017","error":{"code":6,"codeName":"HostUnreachable","errmsg":"Error connecting to def512eafcf5:27017 :: caused by :: Could not find address for def512eafcf5:27017: SocketException: onInvoke :: caused by :: Host not found (authoritative)"},"action":{"dropConnections":true,"requestImmediateCheck":true}}}
{"t":{"$date":"2024-10-23T03:00:31.805+00:00"},"s":"I",  "c":"-",        "id":4333222, "ctx":"ReplicaSetMonitor-TaskExecutor","msg":"RSM received error response","attr":{"host":"def512eafcf5:27017","error":"HostUnreachable: Error connecting to def512eafcf5:27017 :: caused by :: Could not find address for def512eafcf5:27017: SocketException: onInvoke :: caused by :: Host not found (authoritative)","replicaSet":"rs0","response":{}}}
dyw770 commented 3 days ago

@IKA3RUS Please upload your docker-compose.yml. It seems you haven't set the hostname for the db container.
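For context, a minimal sketch of what the db service should look like (service and network details here are illustrative, not the exact stock compose file):

```yaml
services:
  notesnook-db:
    image: mongo:7.0.12
    container_name: notesnook-db
    # Without a fixed hostname, Docker uses the container ID (e.g. def512eafcf5)
    # as the hostname. mongod stores that name in the replica set config, so after
    # a restart the recreated container no longer matches the stored member address.
    hostname: notesnook-db
    command: ["mongod", "--replSet", "rs0", "--bind_ip_all"]
```

With a stable hostname, the rs0 member address survives container restarts and the identity server can select a primary again.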

IKA3RUS commented 3 days ago

Damn! That was it. I must have somehow accidentally removed the hostname while adding the container names, my bad. There are no errors anymore.

Thanks for pointing it out!