rap2hpoutre / pg-anonymizer

Dump anonymized PostgreSQL database with a NodeJS CLI
https://raph.site
MIT License

Context aware anonymization #39

Closed salacr closed 10 months ago

salacr commented 1 year ago

It would be great to be able to pass the whole row to a masking function. We have a column called entity_number: for a person it holds their personal_id, and for a company it holds their company_id. Another column (entity_type) tells us which one it is. It would be really useful to have this "context" available so I can generate a random personal_id or company_id as appropriate. I understand this isn't possible if the information is stored in another table, but if it's in the same row I hope it shouldn't be too difficult. I can try to implement it myself if you are accepting merge requests. To be honest, I'm not very familiar with JavaScript/TypeScript :/ but I can give it a shot.

jackall3n commented 1 year ago

I was thinking about submitting a PR for this, as I think it'll be really useful. However, it looks like there are stale open PRs that aren't being merged, so I'm not sure it would be merged. I might create it anyway and publish a new (temporary) package to my own npmjs account.

salacr commented 1 year ago

Well, we will see. I hope there will be some reactions :)

rap2hpoutre commented 1 year ago

👀

jackall3n commented 1 year ago

I've submitted a PR for this. I've made it so that the extension (now called transformer) supports a "tables" property.

transformer.js

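const faker = require("faker");

module.exports = {
  tables: {
    // Key: schema-qualified table name. The function is called with each row.
    "public.user": value => {
      // Debug output: prints the whole row.
      console.log({ value });

      // Only the columns returned here are replaced.
      return {
        name: faker.name.firstName(),
        password: faker.random.alphaNumeric(10),
      };
    },
  },
};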

npx pg-anonymizer [database-url] --transformer ./transformer.js

salacr commented 1 year ago

Do I understand correctly that value contains the "whole row", and that only the returned columns will be anonymized?

In that case, it should be exactly what I need!

jackall3n commented 1 year ago

@salacr yes, exactly. value is of type Record<string, string>, e.g. { id: "1234", name: "Salacr" }. The values for each of the columns you return will be used, and the rest are ignored (unless they're also specified in --columns, formerly --list).

rap2hpoutre commented 10 months ago

Fixed via https://github.com/rap2hpoutre/pg-anonymizer/pull/42 thanks to @jackall3n