lux-org / lux

Automatically visualize your pandas dataframe via a single print! 📊 💡
Apache License 2.0
5.21k stars 370 forks

Extend support for ordinal data type #240

Open dorisjlee opened 3 years ago

dorisjlee commented 3 years ago

Ordinal data are common in rating scales for surveys, as well as in attributes like Age or number of years for X. Ordinal data currently get classified as categorical, especially if the column contains NaN values. The young people survey dataset on Kaggle is a good example, since it contains a lot of rating-scale data.

This issue should extend support for detecting the ordinal data type and provide better visualizations for it. For example, the bars in an ordinal bar chart should follow the natural order of the categories instead of being sorted by the measure values. In addition, correlations involving one or more ordinal attributes would be relevant to show.
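To make the ordering and correlation points concrete, here is a minimal sketch in plain pandas. It is not Lux's implementation; the column names and survey values are made up for illustration. It assumes the scale is declared explicitly as an ordered categorical, and it uses Spearman rank correlation on the category codes as one possible way to correlate ordinal attributes.

```python
import pandas as pd

# Hypothetical rating-scale answers; None stands in for a missing response.
levels = ["low", "medium", "high"]
scale = pd.CategoricalDtype(categories=levels, ordered=True)
ratings = pd.Series(
    ["high", "low", None, "medium", "high", "high", "low"], name="Music"
).astype(scale)

# Default value_counts order: sorted by the measure (count), descending.
by_count = ratings.value_counts()
# Sorting by index instead follows the declared category order.
by_scale = ratings.value_counts().sort_index()

# Two hypothetical ordinal columns without missing values.
survey = pd.DataFrame(
    {
        "Music": ["low", "medium", "high", "high", "low"],
        "Movies": ["low", "medium", "high", "medium", "low"],
    }
).astype(scale)

# Map each category to its integer position, then rank-correlate.
codes = survey.apply(lambda col: col.cat.codes)
spearman = codes.corr(method="spearman")

print(by_count.index.tolist())  # bars ordered by frequency
print(by_scale.index.tolist())  # bars ordered by the scale
print(spearman.loc["Music", "Movies"])
```

The missing response is simply left out of the counts, so a NaN in the column does not prevent it from being treated as ordinal.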

dorisjlee commented 3 years ago

The absenteeism dataset actually has a couple of very interesting columns (e.g., Body mass index, Height) that are quantitative, but because their values are integers with low-to-medium cardinality, they are detected as nominal. I'm wondering if this would be a good use case for the ordinal data type, as an intermediate between nominal and quantitative. In particular, I feel that nominal is especially inappropriate here, since we would ideally want a scatterplot for something like BMI and would not want these columns to appear in Filters with equalities.

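import pandas as pd
import lux  # importing lux adds the intent attribute and widget display to dataframes

df = pd.read_csv("../lux-datasets/data/absenteeism.csv")
df.intent = ["Weight"]
df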

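One way to picture the "intermediate" category is a small heuristic that separates integer-valued, low-cardinality numeric columns from both text columns and continuous ones. The sketch below is an assumption for discussion, not Lux's actual type-detection logic: the function name `guess_data_type`, the threshold of 20, and the sample values are all hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def guess_data_type(series: pd.Series, max_ordinal_cardinality: int = 20) -> str:
    """Hypothetical heuristic: return 'nominal', 'ordinal', or 'quantitative'."""
    values = series.dropna()  # NaN should not change the detected type
    if not is_numeric_dtype(values):
        return "nominal"
    # Integer-valued even if NaN forced the column to a float dtype.
    is_integer_valued = bool((values % 1 == 0).all())
    if is_integer_valued and values.nunique() <= max_ordinal_cardinality:
        return "ordinal"
    return "quantitative"


# Made-up values, not taken from the absenteeism dataset.
bmi = pd.Series([19, 23, 23, 27, 31, None], name="Body mass index")
reason = pd.Series(["flu", "injury", "flu"], name="Reason")
hours = pd.Series([1.5, 2.25, 3.0], name="Hours")

print(guess_data_type(bmi))     # ordinal
print(guess_data_type(reason))  # nominal
print(guess_data_type(hours))   # quantitative
```

A column tagged this way could still be plotted on a numeric axis (for example in a scatterplot) while being kept out of equality filters, which is the behavior asked for above.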