rwynn / gtm

gtm (go tail mongo) is a MongoDB event listener
MIT License
146 stars 33 forks source link

Limit object reads to specific databases/collections #26

Closed edanaher closed 3 years ago

edanaher commented 3 years ago

We're using moresql (which uses this library) to send certain collections from a MongoDB server to Postgres. It seems like it would require fewer permissions (and possibly be more efficient) if there was a way to tell gtm to ignore changes to other collecitions.

In particular, it seems that gtm's Flush function is attempting to find all results from any oplog change to pass up to its user, and so when there's a change to system.sessions, we attempt to load that data and fail with a permissions error due to a lack of permissions on our hosted Mongo server.

As a quick fix, I just hardcoded an exclusion for the "system" db, but it would be nice if there were some more principled option for this.

Additional Details

After running for a bit (between a few seconds and a couple minutes), moresql is dying with

FATA[0052] Exiting: Mongo tailer returned error not authorized on config to execute command { find: "system.sessions", filter: { _id: { $in: [ { id: UUID("09f2d949-ddbd-410f-9f3c-ac6cd5ed609b"), uid: BinData(0, A692C2C7CEBFE5DFCBBE61D42E17B4B0617EDE8A36FFCFD7B4FCE10C9D322C4B) }, [snip] ) } ] } }, skip: 0, $db: "config" }

We're using a hosted mongo solution that doesn't seem to allow us to grant read access on the system collections, so just giving the moresql user the relevant permissions isn't an option.

By adding a line to gtm's Flush function that skips any update on the config DB, I was able to suppress this error. It would be nice to have a more robust solution, though.

(This ticket largely copied from zph/moresql#15)

rwynn commented 3 years ago

hi @edanaher

Haven't looked at moresql lately. Not sure if it has upgraded gtm. Previously gtm worked with the mgo driver but recent versions use the official MongoDB go driver.

gtm has a NamespaceFilter option on its configuration allowing one to say yes/no about whether an event should be processed further. Does this work?

Gtm now supports opening change streams instead of tailing the MongoDB oplog. This allows one to open streams against a set of specific collections so you potentially wouldn't even need the filter mentioned above. Change streams, being an official API, also have the benefit of working more seamlessly across MongoDB shards.

See usage for an example of how to configure the latest gtm to read only for a specific set of collections.

edanaher commented 3 years ago

Thanks @rwynn! Honestly, I ran into this months ago, and with my quick hack fix, didn't investigate too deeply.

It looks like moresql is using gtm version v0.0.0-20170315180800-22eec6961032; it looks like NamespaceFilter was added in 2018, so there's one issue. (And I'm amused that my guess about performance was apparently correct, since that commit is "performance opt allow filter before reading data attrs").

That NamespaceFilter does look like it'll do exactly what I want, though, and change streams would also do the trick. So I'll close this and move the discussion back over to moresql to take advantage of those newer features here.