I propose adding a feature that makes it possible to generate the same output several times.
To make generation deterministic, I suggest adding a seed from which all values are generated. For the same seed passed as a parameter to fakelake, the output will be the same.
For example:
fakelake generate --seed xxxxxx path/to/schema.yaml
If no seed is passed as a parameter, a random seed is generated.
After file generation, the seed that was used is printed.
This feature makes it possible to generate the same dataset in different formats (CSV and Parquet, for example). Also, it is easier to share a seed than a full dataset if you want to reproduce something in another environment.
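
To illustrate the behaviour I have in mind, here is a minimal sketch in Python. It is not fakelake code: the function names (generate_rows, run), the column names and the 64-bit seed size are all assumptions made up for the example. The idea is only that every value comes from one generator initialised with the seed, that a random seed is drawn when none is given, and that the seed is printed at the end.

```python
import random
import secrets


def generate_rows(seed, n_rows=3):
    """Hypothetical generator: every value is drawn from one RNG built from the seed."""
    rng = random.Random(seed)  # dedicated RNG, independent of global random state
    return [
        {"id": rng.randint(1, 1_000_000), "score": round(rng.random(), 6)}
        for _ in range(n_rows)
    ]


def run(seed=None):
    """Hypothetical entry point mirroring the proposed --seed option."""
    # No seed passed: draw a random one (64 bits is an arbitrary choice here).
    if seed is None:
        seed = secrets.randbits(64)
    rows = generate_rows(seed)
    # Print the seed after generation so the same run can be reproduced later.
    print(f"seed used: {seed}")
    return seed, rows


if __name__ == "__main__":
    first_seed, first_rows = run()       # random seed, printed at the end
    _, second_rows = run(first_seed)     # same seed gives the same rows
    print(first_rows == second_rows)
```

Because the rows depend only on the seed, the same rows could then be handed to a CSV writer and to a Parquet writer, which is the use case described above.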