rwynn / gtm

gtm (go tail mongo) is a MongoDB event listener
MIT License
147 stars 33 forks source link

Avoid race condition by sending copies of mongo data as op.Data #8

Closed zph closed 7 years ago

zph commented 7 years ago

Commit avoids race conditions in tools that depend on GTM by copying the result map[string]interface{} into a new data structure.

By doing so, each op's Data is decoupled from each other and will not run into concurrent read/write failures in the downstream library.

This could be left to downstream libraries to implement but seems preferable to handle here.

Thanks for your continued work on GTM!

rwynn commented 7 years ago

Should we do a deep clone here since the data may represent maps of maps? And only if the length of the "mapped" slice is greater than 1?

Maybe use encoding/gob since that would give us a deep clone?

rwynn commented 7 years ago

Actually the mgo/bson library already imported can give us a deep clone of the map.

rwynn commented 7 years ago

@zph, I created a branch race_1 with a fix that does deep cloning to ensure that the op data is completely untangled in memory. This seems to work ok for me. Please give it a shot when you get a chance.

zph commented 7 years ago

@rwynn I took a look at race_1.

I don't think it's necessary to bson.Marshal and unmarshal the data in order to get a copy. By using for k, v := range... we're getting a copy of each key and value in the new map. That copies the top level values aka a shallow copy. Here's a source: http://stackoverflow.com/documentation/go/732/maps/9834/copy-a-map#t=201703151602048191714.

I'd lean towards trying the shallow copy for now and moving to a deep copy solution if it demonstrates a problem. When I tested the shallow copy it no longer caused concurrent map read/write errors.

Your instincts on gob seem good, here's an example of using gob to deep copy data structures map[string]interface{} along with tests: http://stackoverflow.com/questions/23033143/is-there-a-built-in-function-in-go-for-making-copies-of-arbitrary-maps#comment66812593_28579297 and https://gist.github.com/soroushjp/0ec92102641ddfc3ad5515ca76405f4d.

I do like the optimization though of checking len(ops) and only executing the copy if multiple ops need the Data.

Want me to include the length check in this PR? Or if you like, you can checkout this branch and commit to it yourself before merging the pull request.

zph commented 7 years ago

Also, thanks again for your work on this library :)

rwynn commented 7 years ago

@zph, I think for most use cases, the shallow copy would be fine. I was just trying to solve the general case so that if some other consumer of gtm has docs like { foo: {bar: 1} }, then access of the bar map concurrently would not cause a race. If you think this is overkill, just add the len check to your PR and I'll merge it.

zph commented 7 years ago

I think shallow copy will be sufficient since we don't have more complex or recursive data structures.

Trying this out now...

zph commented 7 years ago

👍 it looks good in local testing against a staging environment.

Ready for merge when you get a chance. Thanks for talking through solutions on this :).