torodb / stampede

The ToroDB solution to provide better analytics on top of MongoDB and make it easier to migrate from MongoDB to SQL
https://www.torodb.com/stampede/
GNU Affero General Public License v3.0
1.77k stars 120 forks source link

Oplog replication stream finished exceptionally: null #216

Closed brandfocus closed 5 years ago

brandfocus commented 6 years ago

We've started to experience the following error after a few hours of use. Any help would be much appreciated...

2018-03-05T09:06:26.952 INFO LIFECYCLE - Starting up ToroDB Stampede 2018-03-05T09:06:27.156 INFO BACKEND - Configured PostgreSQL backend at localhost:5432 2018-03-05T09:06:28.113 INFO BACKEND - Created pool session with size 28 and level TRANSACTION_REPEATABLE_READ 2018-03-05T09:06:28.182 INFO BACKEND - Created pool system with size 1 and level TRANSACTION_REPEATABLE_READ 2018-03-05T09:06:28.200 INFO BACKEND - Created pool cursors with size 1 and level TRANSACTION_REPEATABLE_READ 2018-03-05T09:06:29.453 INFO BACKEND - Schema 'torodb' found. Checking it... 2018-03-05T09:06:29.644 INFO BACKEND - Schema 'torodb' checked 2018-03-05T09:06:29.651 INFO BACKEND - Database metadata has been validated 2018-03-05T09:06:29.867 INFO LIFECYCLE - All replication shards are consistent 2018-03-05T09:07:26.127 INFO LIFECYCLE - Starting replication from replica set named rs0 2018-03-05T09:07:27.429 INFO REPL - Starting replication service 2018-03-05T09:07:29.118 INFO REPL - Waiting for 2 pings from other members before syncing 2018-03-05T09:07:29.143 INFO REPL - Member mongodb-warehouse-server-1:27017 is now in state RS_PRIMARY 2018-03-05T09:07:30.119 INFO REPL - Waiting for 1 pings from other members before syncing 2018-03-05T09:07:31.120 INFO REPL - Waiting for 1 pings from other members before syncing 2018-03-05T09:07:32.128 INFO REPL - syncing from: mongodb-warehouse-server-1:27017 2018-03-05T09:07:32.133 INFO REPL - Topology service started 2018-03-05T09:07:32.250 INFO REPL - Database is consistent. 2018-03-05T09:07:32.252 INFO REPL - Replication service started 2018-03-05T09:07:32.252 INFO LIFECYCLE - ToroDB Stampede is now running 2018-03-05T09:07:32.254 INFO REPL - Starting SECONDARY mode 2018-03-05T09:07:32.486 INFO REPL - Reading from mongodb-warehouse-server-1:27017 2018-03-05T09:07:37.320 WARN REPL - Oplog replication stream finished exceptionally: null 2018-03-05T09:07:37.387 ERROR REPL - Catched an error on the replication layer. Escalating it 2018-03-05T09:07:37.387 ERROR LIFECYCLE - Error reported by replication supervisor. Stopping ToroDB Stampede 2018-03-05T09:07:37.414 INFO LIFECYCLE - Shutting down ToroDB Stampede 2018-03-05T09:07:37.442 INFO REPL - Shutting down replication service 2018-03-05T09:07:37.562 INFO REPL - Topology service shutted down 2018-03-05T09:07:37.583 INFO REPL - Replication service shutted down 2018-03-05T09:07:38.605 INFO LIFECYCLE - ExecutorService java.util.concurrent.ScheduledThreadPoolExecutor@7387c99f[Shutting down, pool size = 1, active threads = 0, queued tasks = 1, completed tasks = 18] did not finished in PT1S 2018-03-05T09:07:39.447 INFO LIFECYCLE - ToroDB Stampede has been shutted down

brandfocus commented 6 years ago

@teoincontatto

brandfocus commented 6 years ago

@gortiz

gortiz commented 6 years ago

Hi @brandfocus. I'm sorry, but I can not help you since I'm out of the project since almost a year ago :/

brandfocus commented 6 years ago

@gortiz Who actively maintains this project now?

ahachete commented 6 years ago

Hi @brandfocus I'm afraid project as of today is not actively maintained. The exception is probably due to some commands that are present in the replication oplog that may not be supported by Stampede. Because of this, the way to fix this is to identify the command (via debugging) and then either implementing the corresponding action in ToroDB or doing that as a no-op. Patches are more than welcome! ;)

brandfocus commented 6 years ago

@ahachete I've dumped the oplog from the timestamp of the last successful read by Toro after running it in full debug code and cannot find any unsupported commands.

The only errors I can find in the Mongo logs are the following at the time of failure:

` [conn247] Assertion: 16089:Cannot kill pinned cursor: 16487354684 src/mongo/db/catalog/cursor_manager.cpp 236

I COMMAND [conn247] killcursors local.oplog.rs exception: Cannot kill pinned cursor: 16487354684 code:16089 numYields:0 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, oplog: { acquireCount: { r: 1 } } } 96ms `

teoincontatto commented 5 years ago

Hi @brandfocus,

We released Stampede 1.0.0 with some fixes and improvements. You can give it a try, also please post here the log of Stampede after enabling DEBUG info: stampede --log-level DEBUG

asaf050 commented 4 years ago

Hi @brandfocus Have you managed to solve it out? Thanks