Open dorisjlee opened 3 years ago
The absenteeism dataset has a couple of very interesting columns (e.g., Body mass index, Height) that are quantitative but, because they are integer-valued with low-to-medium cardinality, are detected as nominal. I'm wondering if this would be a good use case for an ordinal data type as an intermediate between nominal and quantitative. In particular, I feel that nominal is especially inappropriate here, since we would ideally want a scatterplot for something like BMI and would not want these columns to appear in Filters as equality conditions.
import pandas as pd
import lux
df = pd.read_csv("../lux-datasets/data/absenteeism.csv")
df.intent = ["Weight"]
df
Ordinal data is common in survey rating scales, as well as in attributes like Age or number of years for X. Ordinal data currently gets classified as categorical, especially if the column contains NaN values. The young people survey dataset on Kaggle is a good example of this, since it contains lots of rating scale data. This issue should extend support for ordinal data type detection, as well as better visualizations for ordinal data. For example, the bars in an ordinal bar chart should follow the natural order of the levels instead of being sorted by the measure values. In addition, correlations involving one or more ordinal attributes would be relevant to show.
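To make the proposal concrete, here is a rough sketch of what the detection heuristic and the bar ordering could look like. It is plain Python and does not reflect how Lux currently infers types; the names `infer_semantic_type` and `ordered_counts`, and the cardinality cutoff of 20, are assumptions made up for illustration.

```python
import math
from collections import Counter
from numbers import Real


def _drop_missing(values):
    """Remove None and NaN entries so they do not affect type inference."""
    return [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]


def infer_semantic_type(values, max_ordinal_cardinality=20):
    """Hypothetical heuristic: return 'ordinal', 'quantitative', or 'nominal'.

    The cutoff of 20 distinct values is an arbitrary assumption.
    """
    present = _drop_missing(values)
    is_numeric = all(
        isinstance(v, Real) and not isinstance(v, bool) for v in present
    )
    if not present or not is_numeric:
        return "nominal"
    # Integer columns containing NaN are usually stored as floats,
    # so check integrality by value rather than by type.
    integer_valued = all(float(v).is_integer() for v in present)
    if integer_valued and len(set(present)) <= max_ordinal_cardinality:
        return "ordinal"
    return "quantitative"


def ordered_counts(values):
    """Count each level, ordered by the level itself rather than by the count."""
    return sorted(Counter(_drop_missing(values)).items())


# A 1-5 rating column with a missing answer, as in survey data.
ratings = [3, 1, float("nan"), 5, 3, 3, 2, 5]
print(infer_semantic_type(ratings))
print(ordered_counts(ratings))
```

With a check like this, a NaN-bearing rating column would no longer fall through to nominal, and the bar chart data would come out in level order (1, 2, 3, 5) regardless of which level is most frequent. For the correlation part, a rank-based measure such as Spearman is probably a better fit for ordinal attributes than Pearson.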