pvcy / presidio

MIT License

Write tests for recognizers migrated from Privacy API scanner library #7

Closed willsthompson closed 3 years ago

willsthompson commented 4 years ago

Write tests for the new recognizers created in issue #6

Open question: Should the new recognizers live in the presidio repo or in the privacy-api repo? I'm leaning toward privacy-api, so the presidio repo doesn't diverge any further from the public master than it has to. It would also make it easier to contribute our updates back to the public repo without including every (or any) custom recognizer.

If recognizers are moved, also clean up presidio's history.

willsthompson commented 3 years ago

I finished a very cursory pass at these tests. Most recognizers need more tests, especially for expected non-matching cases and for variations with titles. Several recognizers also need improvement to make detection more robust.
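As a rough illustration of the kind of coverage meant here, the sketch below uses a table of cases that mixes matches, title variations, and expected non-matches. `find_titled_names` and its regex are hypothetical stand-ins written for this example, not one of the migrated recognizers; the real tests would call the actual recognizer instead.

```python
import re

# Hypothetical stand-in recognizer (NOT from the repo): a titled-name regex.
TITLE = r"(?:Mr|Mrs|Ms|Dr)\.?"
NAME_RE = re.compile(rf"\b{TITLE}\s+[A-Z][a-z]+(?:\s+[A-Z][a-z]+)?")


def find_titled_names(text):
    """Return (start, end) spans of titled names found in text."""
    return [m.span() for m in NAME_RE.finditer(text)]


# Each case pairs an input with the exact substrings expected to match.
CASES = [
    # Matching cases, including title variations
    ("Dr. Jane Smith called.", ["Dr. Jane Smith"]),
    ("Ask Mr Jones about it.", ["Mr Jones"]),
    ("Ms. Lee and Mrs. Park", ["Ms. Lee", "Mrs. Park"]),
    # Expected non-matching cases
    ("The drive was long.", []),
    ("mr. lowercase name", []),
    ("Dr. 12345", []),
]


def run_cases():
    """Check every case and return how many were checked."""
    for text, expected in CASES:
        found = [text[s:e] for s, e in find_titled_names(text)]
        assert found == expected, (text, found, expected)
    return len(CASES)
```

Keeping the cases in one table makes it cheap to add a non-matching example every time a false positive turns up.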

In the next phase of testing, these tests should probably be pushed down a level and run against the output of engine.analyze() instead of pii_report. The pii_report tests should then be isolated to define its independent behavior, which may not amount to much, assuming there are tests covering filter_intersecting_results, is_categorical, and ordered type detection/handling. The pii_report tests may be better suited to schema/packaging validation.
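To make the idea of covering the helpers separately more concrete, here is a minimal sketch of an isolated overlap-filtering test. Everything in it is an assumption for illustration: `Result` is a hypothetical result type, and `filter_intersecting_sketch` assumes the rule "keep the highest-scoring result among overlapping spans", which may not be what filter_intersecting_results actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """Hypothetical analyzer result; field names are assumptions."""
    entity_type: str
    start: int
    end: int
    score: float


def overlaps(a, b):
    """True when two half-open spans [start, end) share a character."""
    return a.start < b.end and b.start < a.end


def filter_intersecting_sketch(results):
    """Assumed rule: among overlapping results, keep the highest score."""
    kept = []
    # Visit the best-scoring results first so they win any overlap.
    for r in sorted(results, key=lambda r: (-r.score, r.start)):
        if not any(overlaps(r, k) for k in kept):
            kept.append(r)
    return sorted(kept, key=lambda r: r.start)


def test_overlap_keeps_higher_score():
    person = Result("PERSON", 0, 10, 0.6)
    email = Result("EMAIL", 5, 20, 0.9)
    phone = Result("PHONE", 25, 35, 0.5)
    assert filter_intersecting_sketch([person, email, phone]) == [email, phone]


def test_adjacent_spans_both_kept():
    a = Result("PERSON", 0, 5, 0.5)
    b = Result("PERSON", 5, 10, 0.5)
    assert filter_intersecting_sketch([a, b]) == [a, b]
```

With behavior like this pinned down at the helper level, the pii_report tests would only need to check the shape of the report rather than re-test detection.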