Closed StijnKas closed 2 weeks ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 64.57%. Comparing base (bbf667a) to head (70afe32). Report is 44 commits behind head on master.
@yusufuyanik1 would you mind giving this a review? Would like to merge it sometime soon
We've had an anonymization script in the tools for a little while, but it was not performant enough on realistic loads, so it was time for an update. The new version has far fewer configuration options, but it's much more efficient.
It uses a two-pass approach: we first write all input files to batched parquet files, and then loop over those parquet files to generate a single output parquet file.
Many thanks to @danielm-dk for helping improve this part.