soma-smart / Fakelake

Generate massive fake datasets for your datalake, fast. By SOMA
https://soma-smart.github.io/Fakelake/
MIT License

Add length option for random string provider #31

Closed: bhagenbourger closed this 6 months ago

bhagenbourger commented 6 months ago

18

bhagenbourger commented 6 months ago

I added a "length" option that accepts either a constant or a range. A range must be written like this: "3..20". A constant must be written like this: "8".
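For readers following along, here is a minimal sketch of how a length string in either form could be parsed. It is not the code from this PR: the `Length` enum and the `parse_length` function are hypothetical names chosen for illustration, and rejecting a range whose minimum exceeds its maximum is an assumption.

```rust
/// Hypothetical representation of the parsed `length` option
/// (illustrative only, not the type used in Fakelake).
#[derive(Debug, PartialEq)]
enum Length {
    Constant(u32),
    Range { min: u32, max: u32 },
}

/// Parses "8" into a constant and "3..20" into a range.
/// Returns `None` for anything that is not a valid number or range,
/// including a range whose minimum is greater than its maximum (an assumption).
fn parse_length(spec: &str) -> Option<Length> {
    match spec.split_once("..") {
        // Range form: both sides of ".." must be valid unsigned integers.
        Some((lo, hi)) => {
            let min: u32 = lo.trim().parse().ok()?;
            let max: u32 = hi.trim().parse().ok()?;
            if min <= max {
                Some(Length::Range { min, max })
            } else {
                None
            }
        }
        // Constant form: the whole string must be a valid unsigned integer.
        None => spec.trim().parse::<u32>().ok().map(Length::Constant),
    }
}
```

With this sketch, `parse_length("3..20")` yields a `Length::Range` and `parse_length("8")` yields a `Length::Constant`, while malformed input such as `"abc"` yields `None`.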

I'm not completely satisfied with this solution because the constant must be wrapped in double quotes. If an integer is specified instead, column["length"].as_str() returns None.

@vianneybacoup do you know a better solution to be able to specify an integer or a string for the constant length?

Thank you for your help.

vianneybacoup commented 6 months ago

Thanks for the range idea @bhagenbourger, it would be a great fit for the random integer min/max as well!

An easy fix would be to cast the integer to a str before the .map(), with a match to handle both types. I am not a fan of this solution, but it's still better than having to put quotes in the config file. Having dedicated logic for parameters (IntegerParam, RangeParam, DateParam, etc.) to avoid duplicating this handling would be a great enhancement and may be a clean fix for this "issue". But that's an extra feature/refactoring, so I might do that in the coming weeks.

Also, the quotes around "5..15" in the YAML file are not necessary :)
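The match-based fix suggested above could look roughly like the sketch below. To keep it self-contained, the `Value` enum is a hypothetical stand-in for the YAML node type returned by `column["length"]`, and `length_as_string` is an illustrative helper name; neither is taken from the Fakelake code base.

```rust
/// Hypothetical stand-in for a YAML node. The real project would use
/// the node type of its YAML library instead.
enum Value {
    Integer(i64),
    String(String),
    Other,
}

/// Normalizes the `length` value to a string so that an unquoted integer (8)
/// and a string ("8" or "3..20") can go through the same parsing path afterwards.
fn length_as_string(value: &Value) -> Option<String> {
    match value {
        // An unquoted integer in the config file: convert it to its string form.
        Value::Integer(n) => Some(n.to_string()),
        // Already a string, such as a range or a quoted constant.
        Value::String(s) => Some(s.clone()),
        // Any other node type is not a valid length.
        Value::Other => None,
    }
}
```

The trade-off is the one mentioned in the comment: this keeps the config file free of quotes, but every provider that accepts such a parameter would need the same match, which is what dedicated parameter types would avoid.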

bhagenbourger commented 6 months ago

Thank you @vianneybacoup for your help. With pattern matching I was able to solve my issue: the length option can now be specified in the YAML file without double quotes. I agree with you about "Having a dedicated logic for parameters"; I'll leave that point for a separate issue. I also modified tests/tests.rs to handle Unix systems. I was only able to test on macOS, so I hope I didn't break anything on Windows.