Closed neomatrix369 closed 3 years ago
Hi @neomatrix369, thanks for that input. Just saw that Dataiku 7 is out with some interesting EDA features. We'll have a look at those too. Best, Tobias
@tkrabel Looking forward to the features; these are highly desirable, coupled with a few of the others I mentioned during our chat.
Google Data Studio has some new EDA features too, check them out.
@neomatrix369 will do!
Keep me posted while I get this thing to work on my machine. They'll be useful features.
Take a look at Dataiku version 7, it has a lot of visualisation goodies ;)
We are currently working on our EDA capabilities. With our last release, we laid some architectural foundations by introducing the tab system. That system will make us more flexible in the future. With the coming releases, you will see more viz goodies :)
Just to let you know, your bamboolib installation mutates the installed pandas. That may well be needed on your side, but it then breaks other pandas functionality. I have hit this twice, and each time I had to re-install pandas to recover.
Where would you like to raise this as an issue to investigate?
Looking forward to it! Sounds fun and interesting. The new UI and outcome are pretty cool.
Description of Issue
It would be great to be able to see the stats/graphs/charts of the dataset (or of its columns) and also get recommendations/suggestions on what could be done to the dataset or the columns, depending on the nature of the values in them. Tools like Dataiku, e.g., provide this, although there we can only act on the suggestions manually.
Have options, as in Dataiku, to generate summary stats for the whole dataset and also per column (either for specific column name(s) or for all of them).
And then also provide suggestions per column on how to transform them, and suggest the same for the dataset as a whole.
This might be an iterative process after applying each suggestion or the suggestions could be applied as a batch.
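To make the request a bit more concrete, here is a minimal sketch of what a rule-based suggestion builder could look like on top of plain pandas. This is not how bamboolib or Dataiku do it; the function names (`suggest_for_column`, `suggest_for_dataset`), the thresholds, and the sample values are all made up for illustration. Only the column names `id` and `funder` come from the screenshots mentioned in this issue.

```python
import pandas as pd


def suggest_for_column(series: pd.Series) -> list:
    """Return simple rule-based transformation suggestions for one column.

    Hypothetical helper: the rules and thresholds are illustrative assumptions.
    """
    suggestions = []
    n = len(series)
    if n == 0:
        return suggestions

    missing_ratio = series.isna().mean()      # fraction of missing values
    n_unique = series.nunique(dropna=True)    # distinct non-missing values

    # Missing-value rules
    if missing_ratio > 0.5:
        suggestions.append("drop column (mostly missing)")
    elif missing_ratio > 0:
        suggestions.append("impute missing values")

    # Cardinality rules
    if n_unique == n and missing_ratio == 0:
        suggestions.append("looks like an identifier; exclude from modelling")
    elif n_unique <= 1:
        suggestions.append("drop column (constant)")
    elif not pd.api.types.is_numeric_dtype(series) and n_unique <= 10:
        suggestions.append("convert to category")

    return suggestions


def suggest_for_dataset(df: pd.DataFrame) -> dict:
    """Collect the per-column suggestions for every column of the dataset."""
    return {name: suggest_for_column(df[name]) for name in df.columns}


# Made-up sample data, only to show the shape of the output.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "funder": ["Roman", None, "Roman", "Unicef"],
    "amount": [10.0, 20.0, 20.0, 5.0],
})
suggestions = suggest_for_dataset(df)
```

The result is a plain dict keyed by column name, so the suggestions could either be shown one at a time (the iterative flow) or collected and applied together (the batch flow).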
What steps have you taken to resolve this already?
Currently, I use other tools like Dataiku and the like to get my analysis report(s).
Alternatively, we can do this: build up the stats piece by piece with the existing functionality of pandas-profiling. The stats and graphing parts could perhaps be produced just as in Dataiku, but the suggestions, and applying them to the columns or the whole dataset, would need some thought and new functions/features.
Anything else
Screenshot of whole dataset stats: I find the stats quite comprehensive, and they give good room to create a suggestion/recommendation builder.
Screenshot of per column stats (just one column: id):
Screenshot of per column stats (just one column: funder):
In Dataiku, depending on the data type of the selected column, the descriptive stats and the accompanying options are specific to the context the user is in.
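As a rough illustration of that dtype-dependent behaviour, the sketch below picks a different set of descriptive stats depending on whether a column is numeric or not. It uses only standard pandas calls; the helper name `column_stats` and the choice of stats are assumptions of mine, not Dataiku's or bamboolib's actual logic.

```python
import pandas as pd
from pandas.api import types as ptypes


def column_stats(series: pd.Series) -> dict:
    """Return descriptive stats whose content depends on the column dtype.

    Hypothetical helper for illustration only.
    """
    # Stats that make sense for every column type
    stats = {
        "count": int(series.count()),          # non-missing values
        "missing": int(series.isna().sum()),
        "distinct": int(series.nunique()),
    }

    if ptypes.is_numeric_dtype(series):
        # Numeric context: location and range
        stats.update(
            kind="numeric",
            mean=float(series.mean()),
            min=float(series.min()),
            max=float(series.max()),
        )
    else:
        # Non-numeric context: most frequent value
        counts = series.value_counts()         # sorted by frequency, NaN dropped
        stats.update(
            kind="categorical",
            top=counts.index[0] if len(counts) else None,
        )
    return stats


numeric_stats = column_stats(pd.Series([1, 2, 3]))
text_stats = column_stats(pd.Series(["a", "a", None]))
```

A whole-dataset view could then be built with something like `{name: column_stats(df[name]) for name in df.columns}`.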