tkrabel / bamboolib

bamboolib - a GUI for pandas DataFrames
https://bamboolib.com

[Enhancement] Provide analysis/reports like how Dataiku provides #16

Status: Closed (neomatrix369 closed this 3 years ago)

neomatrix369 commented 4 years ago

Description of Issue

It would be great to see stats, graphs, and charts for the dataset (or for individual columns) and also get recommendations on what could be done to the dataset or its columns, depending on the nature of the values in them. Tools like Dataiku, for example, provide this, although there we can only take advantage of it manually.

Have options, as in Dataiku, to generate summary stats for the whole dataset and also per column (either for specific column names or for all of them).

Then also provide suggestions per column on how to transform them, and suggest the same for the dataset as a whole.

This could be an iterative process, re-analysing after each applied suggestion, or the suggestions could be applied as a batch.
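To make the request more concrete, here is a minimal sketch in plain pandas of what per-column stats plus a suggestion builder could look like. The function names `summarize` and `suggest`, the rules inside `suggest`, and the toy values are all hypothetical and only for illustration; nothing here is bamboolib's or Dataiku's actual API. The column names `id` and `funder` are borrowed from the screenshots in this issue.

```python
import pandas as pd


def summarize(df, columns=None):
    """Per-column summary stats for the chosen columns (default: all columns)."""
    cols = list(df.columns) if columns is None else list(columns)
    rows = []
    for name in cols:
        col = df[name]
        rows.append({
            "column": name,
            "dtype": str(col.dtype),
            "missing": int(col.isna().sum()),
            "unique": int(col.nunique(dropna=True)),
        })
    return pd.DataFrame(rows).set_index("column")


def suggest(df, columns=None):
    """Hypothetical rule-based suggestions; the rules are illustrative only."""
    cols = list(df.columns) if columns is None else list(columns)
    suggestions = {}
    for name in cols:
        col = df[name]
        n_rows = len(col)
        n_unique = col.nunique(dropna=True)
        hints = []
        if col.isna().any():
            hints.append("fill or drop missing values")
        if n_rows > 0 and n_unique == n_rows:
            hints.append("looks like an identifier; consider excluding it")
        if n_rows > 0 and n_unique <= 1:
            hints.append("constant column; consider dropping it")
        suggestions[name] = hints
    return suggestions


# Toy data (made up for this sketch).
df = pd.DataFrame({"id": [1, 2, 3, 4], "funder": ["A", None, "A", "B"]})

print(summarize(df))            # whole-dataset view, one row per column
print(summarize(df, ["id"]))    # a single named column
print(suggest(df))              # suggestions keyed by column name
```

For the iterative workflow, a tool could apply one suggestion, call `summarize` and `suggest` again, and repeat; for the batch workflow, it could collect everything `suggest` returns and apply it in one pass.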

What steps have you taken to resolve this already?

Currently, I use other tools like Dataiku and the like to get my analysis report(s).

Alternatively, we could build up the stats piece by piece using the existing functionality of pandas-profiling. The stats and graphing parts could be produced much like Dataiku's, but generating suggestions and applying them to columns or to the whole dataset would need some thought and new functions/features.

Anything else

Screenshot of whole dataset stats: Screen Shot 2019-10-18 at 01 53 47. I find the stats quite comprehensive, and they give good room to create a suggestion/recommendation builder.

Screenshot of per column stats (just one column: id): Screen Shot 2019-10-18 at 01 54 07

Screenshot of per column stats (just one column: funder): Screen Shot 2019-10-18 at 02 02 57

Screenshot of per column stats (just one column: funder): Screen Shot 2019-10-18 at 02 05 04

In Dataiku, the descriptive stats and the accompanying options depend on the data type of the selected column, so they are specific to the context the user is in.

8080labs commented 4 years ago

Hi @neomatrix369, thanks for that input. Just saw that Dataiku 7 is out with some interesting EDA features. We'll have a look at those too. Best, Tobias

neomatrix369 commented 4 years ago

@tkrabel I look forward to the features. These are highly desirable, coupled with a few of the others I mentioned during our chat.

Google Data Studio has some new EDA features too, check them out.

8080labs commented 4 years ago

@neomatrix369 will do!

neomatrix369 commented 4 years ago

8080labs wrote: "@neomatrix369 will do!"

Keep me posted while I get this thing to work on my machine. These will be useful features.

neomatrix369 commented 4 years ago

Take a look at Dataiku version 7; it has a lot of visualisation goodies ;)

8080labs commented 4 years ago

We are currently working on our EDA capabilities. With our last release, we laid some architectural foundations by introducing the tab system, which will make us more flexible in the future. In the coming releases, you will see more viz goodies :)

neomatrix369 commented 4 years ago

Just to let you know, installing bamboolib modifies the installed pandas. You may well need that, but it breaks other pandas functionality. I have hit this twice, and each time I had to re-install pandas.

Where would you like me to raise this as an issue to investigate?
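If it helps with a bug report, one way to capture what changed is to record package versions before and after installing bamboolib. This is only a sketch using the standard library's `importlib.metadata`, and it assumes the problem shows up as a changed pandas version, which I have not confirmed. The helper names `installed_version`, `snapshot`, and `changed` are hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def snapshot(packages=("pandas", "bamboolib")):
    """Map each package name to its currently installed version (or None)."""
    return {name: installed_version(name) for name in packages}


def changed(before, after):
    """Return {package: (old, new)} for every package whose version differs."""
    names = set(before) | set(after)
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }


# Current state of the environment; run once before and once after installing.
print(snapshot())
```

Running `snapshot()` before the install, again after it (in a fresh Python session), and passing both results to `changed` would show whether pandas was swapped for a different version.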

neomatrix369 commented 4 years ago

8080labs wrote: "We are currently working on our EDA capabilities. With our last release, we set some architectural foundation by introducing the tab system. With that system, we become more flexible in the future. With the coming releases, you will see more viz goodies :)"

Looking forward to it! Sounds fun and interesting. The new UI and outcome are pretty cool.