soma-smart / Fakelake

Generate massive fake datasets for your datalake, fast. By SOMA
https://soma-smart.github.io/Fakelake/
MIT License

Add corrupted option #40

Closed vianneybacoup closed 5 months ago

vianneybacoup commented 5 months ago

Specifying corrupted as a parameter on any provider corrupts a percentage of that provider's values by giving them a different behavior. The corrupted values still keep the same type for now (because of Parquet; this needs to change in the future).
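To make the idea concrete, here is a minimal Python sketch of type-preserving corruption. It is not Fakelake's implementation: the function name `generate_int_column`, the 0.0 to 1.0 scale for `corrupted`, and the "push the value above the range" rule are all assumptions made for illustration.

```python
import random


def generate_int_column(n, low, high, corrupted=0.0, seed=0):
    """Generate n ints in [low, high]; each row is corrupted with probability `corrupted`.

    Hypothetical helper, not Fakelake's API. A corrupted row is still an int,
    so a typed format such as Parquet accepts it, but it falls outside
    [low, high], which makes it semantically wrong.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        if rng.random() < corrupted:
            # Assumed corruption rule: push the value above the allowed range.
            values.append(high + rng.randint(1, 1000))
        else:
            values.append(rng.randint(low, high))
    return values


# Roughly 10% of the rows should end up out of range, all of them still ints.
column = generate_int_column(1000, 0, 100, corrupted=0.1)
out_of_range = [v for v in column if not 0 <= v <= 100]
print(f"{len(out_of_range)} of {len(column)} rows are corrupted")
```

Because every corrupted value keeps the column's type, the schema is unaffected; only a range or semantic check downstream would catch the bad rows.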

The behavior depends on the provider's type:

vianneybacoup commented 5 months ago

@bhagenbourger can this work for your need ? Feel free to criticize

bhagenbourger commented 5 months ago

> @bhagenbourger can this work for your need ? Feel free to criticize

Yes, it's good. Since a Parquet file has a schema, we can only have semantic errors. For CSV or JSON we could go further with "corrupted data", using wrong types or bad patterns for dates.

vianneybacoup commented 5 months ago

> > @bhagenbourger can this work for your need ? Feel free to criticize
>
> Yes, it's good. Since a Parquet file has a schema, we can only have semantic errors. For CSV or JSON we could go further with "corrupted data", using wrong types or bad patterns for dates.

That's actually what I had in mind. But it needs a rework of the config so that I can return a different Value type for formats other than Parquet.

That will be a second step, let's merge that for now @hugues31