jamesaoverton / cmi-pb-terminology

CMI-PB Controlled Terminology

Consider generating data tables from schema information #62

Open jamesaoverton opened 2 years ago

jamesaoverton commented 2 years ago

Somewhat the inverse of #61, it would also be useful to be able to generate tables of "random" data from given table/column/data/rule information.

The brute force approach is to generate random strings until they match the datatype regular expressions.
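A minimal sketch of the brute force approach, assuming each datatype is described by a single regular expression that must match the whole value. The function name, alphabet, and limits are hypothetical and are not part of this repository.

```python
import random
import re
import string


def brute_force_value(pattern, rng, alphabet=string.digits, max_len=4,
                      max_tries=100_000):
    """Draw random strings until one fully matches `pattern`.

    Hypothetical helper: `alphabet`, `max_len`, and `max_tries` are
    illustrative knobs, not settings taken from the project.
    """
    regex = re.compile(pattern)
    for _ in range(max_tries):
        length = rng.randint(0, max_len)
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        if regex.fullmatch(candidate):
            return candidate
    raise ValueError(f"no match for {pattern!r} after {max_tries} tries")


# Example: a positive integer with at most four digits.
rng = random.Random(0)
value = brute_force_value(r"[1-9]\d{0,3}", rng)
```

This only works well when the alphabet is small and the pattern is permissive. For a datatype such as an ISO date over a full printable alphabet, the chance of a random hit drops quickly, which is the motivation for the approaches below.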

More elegant would be tools that parse the regex (at least partially) and use that information to generate valid strings. There should be existing tools for this.
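To illustrate the partial-parsing idea, here is a rough sketch of a hand-written generator for a small regex subset: literal characters, `\d`, simple character classes such as `[a-z0-9]`, and the quantifiers `?`, `*`, `+`, `{m}`, `{m,n}`, `{m,}`. It is an assumption-laden toy, not an existing tool: groups, alternation, anchors, negated classes, and `.` as a wildcard are not handled, and the function name is hypothetical.

```python
import random
import re
import string


def generate_from_regex(pattern, rng, max_repeat=5):
    """Generate one string matching a small, unanchored regex subset."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        # Step 1: read one atom as a list of candidate characters.
        if ch == "[":
            end = pattern.index("]", i)
            body = pattern[i + 1:end]
            choices = []
            j = 0
            while j < len(body):
                if j + 2 < len(body) and body[j + 1] == "-":
                    lo_c, hi_c = ord(body[j]), ord(body[j + 2])
                    choices.extend(chr(c) for c in range(lo_c, hi_c + 1))
                    j += 3
                else:
                    choices.append(body[j])
                    j += 1
            i = end + 1
        elif ch == "\\":
            nxt = pattern[i + 1]
            # Only \d is expanded; any other escape is taken literally.
            choices = list(string.digits) if nxt == "d" else [nxt]
            i += 2
        else:
            choices = [ch]
            i += 1
        # Step 2: read an optional quantifier following the atom.
        lo, hi = 1, 1
        if i < len(pattern) and pattern[i] in "?*+":
            lo = 1 if pattern[i] == "+" else 0
            hi = 1 if pattern[i] == "?" else max_repeat
            i += 1
        elif i < len(pattern) and pattern[i] == "{":
            end = pattern.index("}", i)
            parts = pattern[i + 1:end].split(",")
            lo = int(parts[0])
            # "{m,}" is unbounded, so cap it at lo + max_repeat.
            hi = int(parts[-1]) if parts[-1] else lo + max_repeat
            i = end + 1
        count = rng.randint(lo, hi)
        out.append("".join(rng.choice(choices) for _ in range(count)))
    return "".join(out)


# Example: a made-up identifier datatype.
rng = random.Random(0)
pattern = r"[A-Z]{2}-\d{3,5}[a-c]?"
value = generate_from_regex(pattern, rng)
assert re.fullmatch(pattern, value)
```

Unlike the brute force approach, every draw succeeds, at the cost of supporting only the constructs the parser understands.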

Better still would be to define a set of "generator" functions and annotate the datatypes with them. For example: integer, float, date, person name, mailing address, human height, etc.
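The generator-function idea could look something like the sketch below. The registry, the datatype names, the value ranges, and the `generate_table` helper are all hypothetical; how datatypes would actually be annotated with generators in this project is left open.

```python
import random
from datetime import date, timedelta

# Hypothetical registry: datatype name -> function(rng) -> string value.
# The ranges and distributions are illustrative guesses, not project rules.
GENERATORS = {
    "integer": lambda rng: str(rng.randint(0, 999)),
    "float": lambda rng: f"{rng.uniform(0, 100):.2f}",
    "date": lambda rng: (
        date(2000, 1, 1) + timedelta(days=rng.randrange(3650))
    ).isoformat(),
    "human_height_cm": lambda rng: f"{rng.gauss(170, 10):.1f}",
}


def generate_table(columns, n_rows, rng):
    """Build rows for a table given a mapping of column name -> datatype."""
    return [
        {col: GENERATORS[datatype](rng) for col, datatype in columns.items()}
        for _ in range(n_rows)
    ]


# Example: a made-up table with three columns.
rng = random.Random(0)
rows = generate_table(
    {"subject_id": "integer", "visit_date": "date", "height": "human_height_cm"},
    n_rows=3,
    rng=rng,
)
```

Passing an explicit `random.Random` instance keeps the output reproducible for a given seed, which is convenient when the generated tables are used as test fixtures.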

Such a tool would be valuable for several kinds of testing, but the key use case I have in mind is that the randomly generated data (and the special tables) could be public even when the real data is private. This matters when developing open systems for clinical data, where the real patient data must be kept strictly private.