sdv-dev / RDT

A library of Reversible Data Transforms

Refactoring code for Enterprise issue #529 #815

Closed amontanez24 closed 4 months ago

amontanez24 commented 4 months ago

This PR alters the logic of _reverse_transform so that it can work naturally with the random regex transformer.

The previous logic was:

  1. If the number of values the generator has left is at least the number of samples asked for, sample them all.
  2. If the number of values the generator has left is less than the number of samples asked for, sample the rest of the generator, then make up the difference:
    • If enforce_uniqueness is enabled, add suffixes to old values to make more.
    • If enforce_uniqueness is disabled, copy previous values.

This doesn't work for the random regex generator for two reasons:

  1. It can't always sample all the requested values, because random draws can collide with values it has already produced.
  2. You can't easily get all "remaining" values, because technically the supply is unlimited.
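
To make the collision problem concrete, here is a minimal sketch of a random generator that gives up after too many collisions. It is not the RDT implementation: `random_unique_values`, `TooManyCollisionsError`, `pool`, and `max_attempts` are hypothetical names chosen for illustration.

```python
import random


class TooManyCollisionsError(Exception):
    """Hypothetical error: the random generator kept drawing duplicates."""


def random_unique_values(pool, max_attempts=10, seed=None):
    """Yield unique values drawn at random from `pool`.

    Illustrative stand-in for a random regex generator, not RDT's code.
    Raises TooManyCollisionsError once `max_attempts` draws in a row
    have all collided with values that were already yielded.
    """
    rng = random.Random(seed)
    seen = set()
    while True:
        for _ in range(max_attempts):
            candidate = rng.choice(pool)
            if candidate not in seen:
                seen.add(candidate)
                yield candidate
                break
        else:
            # Every attempt collided, so we cannot tell how many values
            # are "left"; all we can do is signal the caller.
            raise TooManyCollisionsError(
                f'No new value found after {max_attempts} attempts.'
            )
```

Nothing here can report how many values remain, which is why logic based on "values left" does not carry over to the random case.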

The new logic, which should work for both cases, is:

  1. Sample values from the generator until you either have enough or hit an exception.
    • The exception means the generator either ran out (in the non-random case) or had too many collisions (in the random case).
  2. If more values are required, add them by:
    • If enforce_uniqueness is enabled, add suffixes to old values to make more.
    • If enforce_uniqueness is disabled, copy previous values.
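
The two steps above can be sketched roughly as follows. This is only an illustration under assumptions, not the code in this PR: `sample_with_fallback` and `TooManyCollisionsError` are hypothetical names, the `_<n>` suffix format is made up, and the generator is assumed to signal exhaustion with `StopIteration`.

```python
from itertools import cycle, islice


class TooManyCollisionsError(Exception):
    """Hypothetical error raised by a random generator on repeated collisions."""


def sample_with_fallback(generator, num_samples, enforce_uniqueness):
    """Illustrative sketch of the new logic (not RDT's actual code)."""
    values = []
    # Step 1: pull values until we have enough or the generator gives up.
    try:
        while len(values) < num_samples:
            values.append(next(generator))
    except (StopIteration, TooManyCollisionsError):
        pass  # Exhausted (non-random case) or too many collisions (random case).

    if len(values) >= num_samples:
        return values
    if not values:
        raise ValueError('The generator produced no values to build on.')

    # Step 2: make up the difference from the values we already have.
    base = list(values)
    if not enforce_uniqueness:
        # Copy previous values, cycling through them in order.
        values.extend(islice(cycle(base), num_samples - len(values)))
        return values

    # Add suffixes to old values, skipping any candidate that already exists.
    seen = set(values)
    suffix = 0
    while len(values) < num_samples:
        suffix += 1
        for value in base:
            candidate = f'{value}_{suffix}'
            if candidate not in seen:
                seen.add(candidate)
                values.append(candidate)
                if len(values) == num_samples:
                    break
    return values
```

Because step 1 only reacts to an exception, the same function handles a finite generator and an unbounded random one without asking either how many values it has left.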

CU-86b04th7c

sdv-team commented 4 months ago

Task linked: CU-86b04th7c SDV-Enterprise - In RegexGenerator, provide an option to generate keys in a random manner #529

amontanez24 commented 4 months ago

I'm actually going to have to change this, because I found another bug in the enterprise version that would require more refactoring here.