torodb / stampede

The ToroDB solution to provide better analytics on top of MongoDB and make it easier to migrate from MongoDB to SQL
https://www.torodb.com/stampede/
GNU Affero General Public License v3.0
1.76k stars, 118 forks

Thinking of using stampede for ETL #230

Closed: rmainwork closed this issue 5 years ago

rmainwork commented 5 years ago

Sorry if this isn't the right place for this, but I'm thinking of using Stampede for an ETL process. Specifically, I want to download some information from our production MongoDB instance, munge it on the way through to remove personally identifiable information (people's names, etc.), and then dump it into our dev PostgreSQL instance.

Just wondered if that was supported in Stampede?

teoincontatto commented 5 years ago

Hi @rmainseas, Stampede does not currently support transforming the data, only including or excluding databases, collections, and indexes. For this reason, Stampede cannot achieve your goal on its own.

It could probably be combined with triggers that you add by hand to the tables Stampede creates, in order to obfuscate or remove the sensitive data. That is: import a complete dump (including the sensitive data), wait for live replication to start, then stop Stampede. Clean up the sensitive data already in PostgreSQL and add a trigger that scrubs it on insert and update for every table where sensitive data would be stored. Then start Stampede again; it should continue where it left off (make sure the MongoDB oplog, its replication buffer, is big enough to cover the time Stampede is stopped).
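As a rough illustration of the trigger approach described above, here is a minimal Python sketch that only generates the SQL for one table: a one-time cleanup `UPDATE` plus a `BEFORE INSERT OR UPDATE` trigger. It is not a Stampede feature. The schema, table, and column names (`demo`, `users`, `name_s`, `email_s`) are hypothetical, and replacing the values with `NULL` is an assumption; check the names Stampede actually generated in your database and pick whatever obfuscation suits you.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_scrub_sql(schema: str, table: str, columns: Sequence[str]) -> str:
    """Return SQL that scrubs existing rows and installs a scrubbing trigger.

    All names passed in are placeholders chosen by the caller; nothing here
    inspects the tables Stampede really created.
    """
    if not columns:
        raise ValueError("at least one column is required")

    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    func = f"{quote_ident(schema)}.{quote_ident('scrub_' + table)}"
    trigger = quote_ident("scrub_" + table + "_trg")

    # Assumption: sensitive values are simply replaced with NULL.
    set_list = ", ".join(f"{quote_ident(c)} = NULL" for c in columns)
    assignments = "\n".join(f"  NEW.{quote_ident(c)} := NULL;" for c in columns)

    return f"""BEGIN;

-- One-time cleanup of rows imported before the trigger existed.
UPDATE {target} SET {set_list};

-- Trigger function: blank the sensitive columns on every new or changed row.
CREATE OR REPLACE FUNCTION {func}() RETURNS trigger AS $$
BEGIN
{assignments}
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER {trigger}
BEFORE INSERT OR UPDATE ON {target}
FOR EACH ROW EXECUTE PROCEDURE {func}();

COMMIT;
"""


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_scrub_sql("demo", "users", ["name_s", "email_s"]))
```

The printed script would be run (for example with `psql`) while Stampede is stopped, once per table that holds sensitive data, before starting Stampede again.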

rmainwork commented 5 years ago

Thanks, that makes sense.