rwynn / monstache

a go daemon that syncs MongoDB to Elasticsearch in realtime. you know, for search.
https://rwynn.github.io/monstache-site/
MIT License

dropDatabase() disrupts monstache #584

Open tamis-laan opened 2 years ago

tamis-laan commented 2 years ago

In my application I'm using mongoose.connection.dropDatabase() to clear the database and then create dummy data for testing.

Dropping the database causes monstache to stop watching my collection through change streams, even though the database and collections are subsequently recreated.

rwynn commented 2 years ago

Is it ok if you watch for changes at the deployment level?

change-stream-namespaces = [ "" ]
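
For reference, here is a minimal sketch (not monstache's internals) of what the deployment-wide scope means with the MongoDB Go driver; an empty entry in change-stream-namespaces corresponds to watching the whole deployment, and that stream is not invalidated when a single database is dropped. The connection string and the namespace names in the comments are placeholders.

package main

import (
	"context"
	"fmt"
	"log"

	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// Deployment-wide stream: a dropDatabase() on any single database does not
	// invalidate it. Compare with the narrower scopes, which do get invalidated
	// when their database or collection is dropped:
	//   client.Database("somedb").Watch(ctx, mongo.Pipeline{})
	//   client.Database("somedb").Collection("mycoll").Watch(ctx, mongo.Pipeline{})
	stream, err := client.Watch(ctx, mongo.Pipeline{})
	if err != nil {
		log.Fatal(err)
	}
	defer stream.Close(ctx)

	for stream.Next(ctx) {
		fmt.Println(stream.Current)
	}
}
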
tamis-laan commented 2 years ago

@rwynn

This would index all my collections, right? In that case, not really. If I use remove() on each collection, monstache works properly, but this is not ideal.

rwynn commented 2 years ago

I will look into it. It should be retrying forever to re-establish the change stream. So you have something like this?

change-stream-namespaces = [ "somedb" ]

According to https://docs.mongodb.com/manual/reference/change-events/#invalidate-event, an invalidate event should be fired when somedb is dropped, and that should then be handled at https://github.com/rwynn/gtm/blob/b948dffca32346792140325485cb307625c1e35e/gtm.go#L1359.
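
As a rough sketch of the shape of that handling with the Go driver (this is not gtm's actual code; the connection string and database name are placeholders): a database-scoped stream delivers a single invalidate event when the database is dropped and then stops, so the watcher has to detect that event and open a new stream.

package main

import (
	"context"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// watchDatabase keeps a database-scoped change stream open, re-opening it
// whenever a dropDatabase() invalidates it.
func watchDatabase(ctx context.Context, client *mongo.Client, dbName string) error {
	for {
		stream, err := client.Database(dbName).Watch(ctx, mongo.Pipeline{})
		if err != nil {
			return err
		}
		invalidated := false
		for stream.Next(ctx) {
			var ev struct {
				OperationType string `bson:"operationType"`
			}
			if err := bson.Unmarshal(stream.Current, &ev); err != nil {
				stream.Close(ctx)
				return err
			}
			if ev.OperationType == "invalidate" {
				// No more events can be read from this stream; break out and
				// open a new one so the recreated database is watched again.
				invalidated = true
				break
			}
			// ... forward insert/update/delete/drop events to the indexer ...
		}
		err = stream.Err()
		stream.Close(ctx)
		if !invalidated {
			return err
		}
		// Brief pause before re-establishing the stream after the drop.
		time.Sleep(5 * time.Second)
	}
}

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)
	log.Fatal(watchDatabase(ctx, client, "somedb"))
}
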

tamis-laan commented 2 years ago

Using Docker Compose and environment variables:

MONSTACHE_CHANGE_STREAM_NS: testindex

BTW, it looks as if not all configuration options are exposed as environment variables. Is that correct?

rwynn commented 2 years ago

Just the major ones. There are so many.

rwynn commented 2 years ago

I did a quick manual test on this and I wasn't able to see the issue.

$ go run monstache.go -change-stream-namespace test -verbose
INFO 2021/12/03 22:55:33 Started monstache version 6.7.6
INFO 2021/12/03 22:55:33 Go version go1.15.8
INFO 2021/12/03 22:55:33 MongoDB go driver v1.7.2
INFO 2021/12/03 22:55:33 Elasticsearch go driver 7.0.28
INFO 2021/12/03 22:55:33 Successfully connected to MongoDB version 4.2.17
INFO 2021/12/03 22:55:33 Successfully connected to Elasticsearch version 7.15.2
INFO 2021/12/03 22:55:33 Listening for events
INFO 2021/12/03 22:55:33 Sending systemd READY=1
WARN 2021/12/03 22:55:33 Systemd notification not supported (i.e. NOTIFY_SOCKET is unset)
INFO 2021/12/03 22:55:33 Watching changes on database test

I inserted a document, dropped the db, and then inserted another:

use test;
db.test.insert({foo:1});
db.dropDatabase();
db.test.insert({foo: 2});

If you are doing this in a script, you might try sleeping for 5 seconds after you drop the database before doing more inserts. The listener waits a few seconds before trying to re-establish the change stream in the case of a drop: https://github.com/rwynn/gtm/blob/master/gtm.go#L1363
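
For example, a seeding script along these lines would work around it (a sketch with the Go driver; the original harness uses mongoose, and the URI, database, and collection names are placeholders):

package main

import (
	"context"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	db := client.Database("test")
	if err := db.Drop(ctx); err != nil {
		log.Fatal(err)
	}

	// Give the listener time to notice the invalidate event and re-open the
	// change stream before new documents start arriving.
	time.Sleep(5 * time.Second)

	if _, err := db.Collection("test").InsertOne(ctx, bson.M{"foo": 2}); err != nil {
		log.Fatal(err)
	}
}
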

tamis-laan commented 2 years ago

Indeed, sleeping 5 seconds works, but it does significantly slow down the development cycle.

The part I don't understand is why monstache, when reconnecting, doesn't check the change stream and synchronize the inserts that happened in the meantime.

How would this work in production if monstache fails for some reason and the container is restarted? Would any inserts or changes made in between be lost?

rwynn commented 2 years ago

You can try the latest commit in both rel5 and rel6 without the sleep. This requires that your MongoDB version supports startAfter.
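
For context, startAfter lets a new change stream pick up exactly where an invalidated one stopped: the invalidate event carries a resume token that can be handed to the next Watch call, so nothing written between the drop and the re-open is skipped. A hedged sketch of that re-open step with the Go driver (not the actual gtm or monstache code; names are placeholders):

package main

import (
	"context"
	"log"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

// resumeAfterInvalidate shows the startAfter variant of the re-open step: the
// invalidate event's resume token is handed to the next Watch call.
func resumeAfterInvalidate(ctx context.Context, client *mongo.Client, dbName string) error {
	db := client.Database(dbName)
	stream, err := db.Watch(ctx, mongo.Pipeline{})
	if err != nil {
		return err
	}
	var token bson.Raw
	for stream.Next(ctx) {
		var ev struct {
			OperationType string `bson:"operationType"`
		}
		if err := bson.Unmarshal(stream.Current, &ev); err != nil {
			break
		}
		if ev.OperationType == "invalidate" {
			// ResumeToken is the _id of the last event read, i.e. the
			// invalidate event itself.
			token = stream.ResumeToken()
			break
		}
		// ... dispatch ordinary events ...
	}
	streamErr := stream.Err()
	stream.Close(ctx)
	if token == nil {
		return streamErr
	}
	// Re-open with startAfter; resumeAfter would be rejected here because the
	// stream ended with an invalidate event (MongoDB 4.2+ is required).
	stream, err = db.Watch(ctx, mongo.Pipeline{}, options.ChangeStream().SetStartAfter(token))
	if err != nil {
		return err
	}
	defer stream.Close(ctx)
	for stream.Next(ctx) {
		// ... continue dispatching events to the indexer ...
	}
	return stream.Err()
}

func main() {
	ctx := context.Background()
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://localhost:27017"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)
	log.Fatal(resumeAfterInvalidate(ctx, client, "somedb"))
}
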

tamis-laan commented 2 years ago

Works perfectly!!!

UPDATE: OK, maybe not. I'm getting the following:

backend-server-monstache-1  | INFO 2022/01/22 13:15:12 Started monstache version 6.7.7
backend-server-monstache-1  | INFO 2022/01/22 13:15:12 Go version go1.17.4
backend-server-monstache-1  | INFO 2022/01/22 13:15:12 MongoDB go driver v1.8.0
backend-server-monstache-1  | INFO 2022/01/22 13:15:12 Elasticsearch go driver 7.0.28
backend-server-monstache-1  | INFO 2022/01/22 13:15:12 Successfully connected to MongoDB version 5.0.5
backend-server-monstache-1  | INFO 2022/01/22 13:15:18 Successfully connected to Elasticsearch version 7.14.1
backend-server-monstache-1  | INFO 2022/01/22 13:15:18 Listening for events
backend-server-monstache-1  | INFO 2022/01/22 13:15:18 Watching changes on collection mydb.mycollection
backend-server-monstache-1  | INFO 2022/01/22 13:15:18 Direct reads completed
backend-server-monstache-1  | ERROR 2022/01/22 13:15:36 elastic: Error 404 (Not Found): no such index [mydb.mycollection] [type=index_not_found_exception]: [details={"type":"index_not_found_exception","reason":"no such index [mydb.mycollection]","resource.type":"index_or_alias","resource.id":"mydb.mycollection","index":"mydb.mycollection","root_cause":[{"type":"index_not_found_exception","reason":"no such index [mydb.mycollection]","resource.type":"index_or_alias","resource.id":"mydb.mycollection","index":"mydb.mycollection"}]}]

rwynn commented 2 years ago

dropped-collections = false
dropped-databases = false

would tell monstache not to drop your indexes when you drop your MongoDB database or collections.

tamis-laan commented 2 years ago

I would also like to drop the index when I drop my MongoDB collection.

The problem is that after I drop my collection in Mongo, I immediately start generating 100 new documents. These documents don't show up in Elasticsearch.

If I generate 1000 documents, documents do show up in Elasticsearch.

It looks as if the index drop and the new document inserts are not processed in order, or are processed in parallel by Elasticsearch.

Does Monstache await the drop index operation before sending the rest of the change stream?
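
The thread doesn't answer this, but as an illustration of the ordering the question is about: if the delete-index request and the subsequent document writes race, documents indexed before the delete lands are removed with it, which would match the symptom above. A hedged sketch of awaiting the delete first, using the olivere/elastic client (the "7.0.28" in the earlier logs suggests that driver); the URL and index name are placeholders:

package main

import (
	"context"
	"log"

	"github.com/olivere/elastic/v7"
)

func main() {
	ctx := context.Background()

	// SetSniff(false) keeps the sketch working against a single-node container.
	client, err := elastic.NewClient(
		elastic.SetURL("http://localhost:9200"),
		elastic.SetSniff(false),
	)
	if err != nil {
		log.Fatal(err)
	}

	// Await the index deletion before indexing anything new. If the delete and
	// the new documents were sent concurrently, the delete could land after
	// some of the inserts and silently discard them.
	if _, err := client.DeleteIndex("mydb.mycollection").Do(ctx); err != nil && !elastic.IsNotFound(err) {
		log.Fatal(err)
	}

	// Only now start indexing the regenerated documents.
	for i := 0; i < 100; i++ {
		_, err := client.Index().
			Index("mydb.mycollection").
			BodyJson(map[string]interface{}{"seq": i}).
			Do(ctx)
		if err != nil {
			log.Fatal(err)
		}
	}
}
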