Open iaindillingham opened 2 years ago
Thanks @iaindillingham, would you be able to implement something like the 'top_counts' column in Will's example? It summarizes the counts for a maximum of 4 categories. This would be really helpful to see if there have been any major mistakes.
If that isn't possible then, as Will stated, the number of unique values is still a useful insight.
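For illustration, a 'top_counts'-style summary could look something like the sketch below. This is not taken from Will's example; the function name `top_counts`, the `max_categories` parameter, and the "category: count" output format are all assumptions made here, and missing values are assumed to be represented as `None`.

```python
from collections import Counter


def top_counts(values, max_categories=4):
    """Summarize the most common categories as a compact 'category: count' string.

    Hypothetical helper: the name, signature, and output format are assumptions.
    Missing values (assumed to be None) are excluded from the counts.
    """
    counts = Counter(v for v in values if v is not None)
    # most_common returns (category, count) pairs, largest count first
    return ", ".join(f"{cat}: {n}" for cat, n in counts.most_common(max_categories))


if __name__ == "__main__":
    print(top_counts(["a", "a", "a", "b", "b", "c", None], max_categories=2))
```

Because only the top few categories are shown, the output stays one cell wide regardless of how many categories a column has. The counts would still need redacting before release.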
Not sure if this was clear before, but for categoricals a table of counts, a la cohort-report, is often still really useful.
So both a single-row-per-variable format to have an overview of the entire dataset (split by variable type) and count tabulation for relevant categorical variables would be useful. Could also simplify things by just tabulating all variables with fewer than ~20 unique values, to avoid e.g. STPs or MSOAs being tabulated and to ensure categorical-as-int variables are still included. These tables would live in a separate document.
Obv with redaction!
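A rough sketch of the "tabulate anything with fewer than ~20 unique values, with redaction" idea, under stated assumptions: rows are plain dicts, the cut-off of 20 comes from the suggestion above, and the redaction threshold of 6 and the `"[REDACTED]"` marker are hypothetical placeholders rather than an agreed disclosure rule. The function name `tabulate_low_cardinality` is also made up for this example.

```python
from collections import Counter

MAX_UNIQUE = 20    # assumed cut-off, from the "~20" suggested above
REDACT_BELOW = 6   # hypothetical threshold: counts below this are suppressed


def tabulate_low_cardinality(rows, max_unique=MAX_UNIQUE, redact_below=REDACT_BELOW):
    """Tabulate every column that has fewer than `max_unique` distinct values.

    `rows` is a list of dicts sharing the same keys. Returns
    {column: {value: count or "[REDACTED]"}}. Missing values (None)
    are counted as a category of their own.
    """
    columns = list(rows[0].keys()) if rows else []
    tables = {}
    for col in columns:
        counts = Counter(row[col] for row in rows)
        if len(counts) >= max_unique:
            # High-cardinality column (e.g. an area code): skip tabulation
            continue
        tables[col] = {
            value: (n if n >= redact_below else "[REDACTED]")
            for value, n in counts.items()
        }
    return tables


if __name__ == "__main__":
    rows = [{"sex": "M" if i % 3 == 0 else "F", "msoa": f"E{i:04d}"} for i in range(30)]
    print(tabulate_low_cardinality(rows))
```

Selecting on the number of distinct values rather than on dtype is what lets categorical-as-int columns through while keeping area codes out. Note this only suppresses small cells; it does nothing about values that could be recovered by subtraction from a total.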
From @andrewscolm. Thanks, Colm 🙂
If we implemented #22, then we would struggle to summarize counts for each category, as some categorical columns would have more categories than other categorical columns. However, we could summarize the number of unique values and the number of missing values. As @wjchulme says about the number of unique values:
Do we also need to summarize counts for each category?
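For reference, the fallback mentioned above (number of unique values plus number of missing values per column) could be sketched as follows. The helper name `summarize_column` and the output keys are hypothetical, and missing values are assumed to be `None`.

```python
def summarize_column(values):
    """Return the number of missing values and distinct non-missing values.

    Hypothetical helper: the name and the output keys are assumptions.
    Missing values are assumed to be represented as None.
    """
    present = [v for v in values if v is not None]
    return {
        "n_missing": len(values) - len(present),
        "n_unique": len(set(present)),
    }


if __name__ == "__main__":
    print(summarize_column(["a", None, "b", "a", None]))
```

This gives the same two numbers for every categorical column, however many categories it has, so it fits a single-row-per-variable table.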