unpackAI / unpackai

The Unpack.AI library
https://www.unpackai.com
MIT License
20 stars 4 forks

⛈ correlation table for tabular #16

Closed raynardj closed 3 years ago

raynardj commented 3 years ago

@raynardj Could we add a few basic EDA methods for dataset exploration:

  • [ ] correlation matrix
  • [ ] histograms

In order to see:

  1. the relationship between features
  2. whether the dataset is balanced or imbalanced for classification problems
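For illustration, here is a minimal sketch of these checks using plain NumPy. The function names `correlation_matrix` and `class_balance` and the toy data are hypothetical and are not part of unpackai; they only show the kind of output such EDA helpers could return.

```python
import numpy as np


def correlation_matrix(X):
    """Pearson correlation between the columns (features) of X."""
    return np.corrcoef(np.asarray(X, dtype=float), rowvar=False)


def class_balance(y):
    """Share of each class label in y, as a {label: fraction} dict."""
    labels, counts = np.unique(np.asarray(y), return_counts=True)
    total = counts.sum()
    return {label.item(): float(count / total) for label, count in zip(labels, counts)}


# Toy data (assumed for the example): 5 rows, 2 numeric features.
X = np.array([
    [1.0, 2.0],
    [2.0, 4.1],
    [3.0, 6.2],
    [4.0, 7.9],
    [5.0, 10.0],
])
y = ["cat", "dog", "cat", "cat"]

corr = correlation_matrix(X)      # 2 x 2 symmetric matrix, ones on the diagonal
balance = class_balance(y)        # fraction of samples per class

# Histogram of the first feature: bin counts plus bin edges.
counts, edges = np.histogram(X[:, 0], bins=4)
```

A strongly skewed `balance` dict would flag an imbalanced classification target, and large off-diagonal values in `corr` point to features that carry similar information.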

I can't see the correlation thing in the notebook except the dendrogram, is that it?

It's not in the notebook yet. In the previous DL 101, one student built a correlation matrix to get better insight into the data. It helped him understand the dendrogram better and remove redundant features. So when we discussed it with the mentors and John, the conclusion was to add it for better data interpretability.

Originally posted by @faizer1989 in https://github.com/unpackAI/unpackai/issues/14#issuecomment-923699782

vtecftwy commented 3 years ago

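I plan to write the code for the following in a module called eda:

  • get correlation matrix as a MN array or tensor
  • plot correlation matrix as a heatmap, with the option to show only the triangular matrix or only the values bigger than a certain % (in absolute value)
  • add the dendrogram function

Any other idea, let me know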

raynardj commented 3 years ago

I plan to write the code for the following in a module called eda:

  • get correlation matrix as a MN array or tensor
  • plot correlation matrix as a heatmap, with the option to show only the triangular matrix or only the values bigger than a certain % (in absolute value)
  • add the dendrogram function

Any other idea, let me know
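As a rough sketch of the heatmap options in the list above (triangle only, or only values above a threshold in absolute value), the filtering could be done on the matrix itself before plotting. The function name `masked_correlation` and its parameters are assumptions made for this example; this is not the code that ended up in unpackai.

```python
import numpy as np


def masked_correlation(X, lower_triangle_only=True, min_abs=0.0):
    """Correlation matrix of the columns of X, with hidden cells set to NaN.

    lower_triangle_only: keep only cells strictly below the diagonal.
    min_abs: keep only cells whose absolute correlation is >= min_abs.
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    keep = np.abs(corr) >= min_abs
    if lower_triangle_only:
        # k=-1 drops the diagonal, which is always 1 and carries no information.
        keep &= np.tril(np.ones_like(corr, dtype=bool), k=-1)
    return np.where(keep, corr, np.nan)


# Toy data (assumed): feature 1 is almost 2x feature 0, feature 2 is weakly related.
X = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 4.1, 3.0],
    [3.0, 6.2, 4.0],
    [4.0, 7.9, 1.0],
    [5.0, 10.0, 2.0],
])

masked = masked_correlation(X, lower_triangle_only=True, min_abs=0.9)
full = masked_correlation(X, lower_triangle_only=False, min_abs=0.0)
```

The NaN-filled result can then be handed to a plotting call such as `matplotlib.pyplot.imshow`, which leaves NaN cells blank, so the heatmap only shows the pairs that passed the filter.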

vtecftwy commented 3 years ago

That works. Will add it in tabular_data module.

vtecftwy commented 3 years ago

Added the code with 8385e42 (forgot to link the commit with the issue number). Still have to add some additional tests to improve coverage.