soma-smart / Fakelake

Generate massive fake datasets for your datalake, fast. By SOMA
https://soma-smart.github.io/Fakelake/
MIT License

[Feature] Add option to split output into multiple files #44

Open hugues31 opened 5 months ago

hugues31 commented 5 months ago

Add one or more options that let the user specify a strategy for splitting the dataset across multiple files.

For example, it would be great to have:

info:
    output_name: test
    output_format: parquet
    rows: 2_000_000
    files: 5

Each file would then contain approximately 2M / 5 = 400k rows.
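To make the arithmetic concrete, here is a minimal sketch in Python of how the row count could be divided when `rows` is not an exact multiple of `files`. This is not Fakelake's implementation. The function names `split_rows` and `output_paths` and the `<output_name>_<index>.<output_format>` naming scheme are assumptions made up for illustration.

```python
def split_rows(total_rows: int, files: int) -> list[int]:
    """Return the number of rows each file would receive.

    Hypothetical helper: the first `total_rows % files` files get one
    extra row, so the counts always sum to `total_rows`.
    """
    if files < 1:
        raise ValueError("files must be at least 1")
    if total_rows < 0:
        raise ValueError("total_rows must not be negative")
    base, remainder = divmod(total_rows, files)
    return [base + 1 if i < remainder else base for i in range(files)]


def output_paths(output_name: str, output_format: str, files: int) -> list[str]:
    """Build one path per file (assumed naming scheme, not Fakelake's)."""
    return [f"{output_name}_{i}.{output_format}" for i in range(files)]


if __name__ == "__main__":
    # Values taken from the example configuration above.
    counts = split_rows(2_000_000, 5)
    paths = output_paths("test", "parquet", 5)
    for path, count in zip(paths, counts):
        print(f"{path}: {count} rows")
```

Handing the remainder to the first few files keeps every file within one row of the others, which is why the issue can only promise "approximately" `rows / files` rows per file.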

We could have parameters like: