kungfuai / kaishi

Tool kit to accelerate exploratory data analysis and data cleaning
https://kaishi.readthedocs.io/en/latest/
MIT License

support tabular data with csv files #4

Closed: zzsi closed this issue 4 years ago

zzsi commented 4 years ago

Given a folder of csv or csv.gz files with a shared schema, output:

No ML model training is involved.
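A minimal sketch of the input side, assuming "shared schema" means identical column names in every file and assuming the output includes per-column summary statistics. The helper names `load_csv_folder` and `summarize` are hypothetical and are not part of kaishi; only standard pandas calls are used.

```python
import glob
import os

import pandas as pd


def load_csv_folder(folder: str) -> pd.DataFrame:
    """Hypothetical helper: read every .csv / .csv.gz file in `folder` into one DataFrame.

    Raises ValueError if the folder is empty or the files do not share the same columns.
    """
    paths = sorted(
        glob.glob(os.path.join(folder, "*.csv"))
        + glob.glob(os.path.join(folder, "*.csv.gz"))
    )
    if not paths:
        raise ValueError(f"no csv or csv.gz files found in {folder!r}")

    frames = []
    expected_columns = None
    for path in paths:
        # pandas infers gzip compression from the .gz extension by default.
        df = pd.read_csv(path)
        if expected_columns is None:
            expected_columns = list(df.columns)
        elif list(df.columns) != expected_columns:
            raise ValueError(f"{path} does not match the shared schema")
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column summary statistics; include="all" also covers non-numeric columns."""
    return df.describe(include="all")
```

Checking the schema file by file fails fast on a mismatched file instead of letting `pd.concat` silently fill the missing columns with NaN.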

@mwharton3, does this look like what you want for tabular data?

zzsi commented 4 years ago

I can work on it, though I don't have permission to assign issues to myself.

mwharton3 commented 4 years ago

I just gave everyone write access, you should be able to work on it now.

And yes, totally. I wonder if we could just extend the functionality of a pandas DataFrame? I’ve found the df.describe() method particularly useful (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html).
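One way to extend a pandas DataFrame without subclassing it is pandas' accessor mechanism, `pd.api.extensions.register_dataframe_accessor`. The sketch below is an illustration, not kaishi's design: the accessor name `eda` and the methods `summary` and `duplicate_rows` are hypothetical. It builds on `df.describe()` and adds rows that `describe()` does not report by itself.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("eda")
class EDAAccessor:
    """Hypothetical accessor: makes df.eda.<method>() available on every DataFrame."""

    def __init__(self, df: pd.DataFrame):
        self._df = df

    def summary(self) -> pd.DataFrame:
        # Start from describe(); include="all" keeps non-numeric columns.
        stats = self._df.describe(include="all")
        # Extra rows: missing-value counts and distinct-value counts per column.
        extra = pd.DataFrame(
            [self._df.isna().sum(), self._df.nunique()],
            index=["missing", "n_unique"],
        )
        return pd.concat([stats, extra])

    def duplicate_rows(self) -> pd.DataFrame:
        # keep=False flags every member of a duplicated group, not only the repeats.
        return self._df[self._df.duplicated(keep=False)]
```

Usage is `df.eda.summary()` on any DataFrame. An accessor keeps the extra methods in their own namespace, so they cannot clash with existing DataFrame attributes.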

@jerryschirmer was talking about something like this yesterday. I wonder if he has any other ideas for overcoming pandas' limitations.

mwharton3 commented 4 years ago

Include a save feature as well.
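A minimal sketch of a save step, assuming "save" means writing the combined data and its summary to an output folder. The function name `save_results` and the file names `cleaned.csv.gz` and `summary.csv` are hypothetical choices for illustration.

```python
import os

import pandas as pd


def save_results(df: pd.DataFrame, summary: pd.DataFrame, out_dir: str) -> dict:
    """Hypothetical save step: write the data and its summary to out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    data_path = os.path.join(out_dir, "cleaned.csv.gz")
    summary_path = os.path.join(out_dir, "summary.csv")
    # The .gz suffix makes pandas gzip the output (compression="infer" by default).
    df.to_csv(data_path, index=False)
    # Keep the index here: it holds the statistic names (count, mean, ...).
    summary.to_csv(summary_path)
    return {"data": data_path, "summary": summary_path}
```

Writing the data gzipped mirrors the csv.gz inputs. The summary's index is kept so it can be reloaded with `pd.read_csv(path, index_col=0)`.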