sfu-db / dataprep

Open-source low-code data preparation library in Python. Collect, clean, and visualize your data in Python with a few lines of code.
http://dataprep.ai
MIT License
2.08k stars, 206 forks

Separate out clean so it doesn't rely on pandas/dask #969

Open meyerovb opened 1 year ago

meyerovb commented 1 year ago

I've been banging my head for hours because AWS Lambda has a 250 MB code size limit and all I wanted to do was clean email addresses. I'd have to build a damn Docker image to run 10 lines of code. So instead I'm ripping _format_email out of clean_emails.py to run it directly against my emails. It would be great if the actual CLEANING code were separated out into its own files that don't need to import pandas, dask, and all the other nonsense that has nothing to do with CLEANING DATA.
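
To illustrate the request, here is a minimal sketch of what a standalone, standard-library-only email cleaner could look like. It is not dataprep's `_format_email`: the function name `clean_email`, the simplified regex, and the choice to lowercase the whole address are all assumptions made for illustration.

```python
import re
from typing import Optional

# Simplified pattern (an assumption for illustration, not dataprep's regex):
# a local part, an "@", and a domain containing at least one dot.
_EMAIL_RE = re.compile(
    r"^[A-Za-z0-9.!#$%&'*+/=?^_`{|}~-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$"
)


def clean_email(value: object) -> Optional[str]:
    """Return a normalized address, or None if value is not a plausible email."""
    if not isinstance(value, str):
        return None
    # Hypothetical normalization: trim whitespace and lowercase everything.
    candidate = value.strip().lower()
    if not _EMAIL_RE.match(candidate):
        return None
    return candidate


if __name__ == "__main__":
    for raw in ["  John.Doe@Example.COM ", "not-an-email", None]:
        print(repr(raw), "->", repr(clean_email(raw)))
```

Because a function like this imports only `re` and `typing`, it could be dropped into a Lambda handler without pulling pandas or dask into the deployment package. A DataFrame-oriented wrapper could then live in a separate module that imports the pure function.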