Closed by tspier 1 year ago
A fair observation. Similar problems exist for minor spelling deviations (e.g. "pony" versus "Ponylang").
In 2018 I underestimated how much value (and how many entries) the "Other..." option would give. So, for historical reasons, answers were not case sensitive, and I just did some manual work to deduplicate things a bit and then relied on PowerBI to fix case sensitivity.
With the move to a dashboard on the web, the case sensitivity came back to haunt me.
This should be easy to fix (for all questions) by updating the matching of answers and re-running the scripts. I'll definitely make sure I do this before 2022.
Thanks for the suggestion!
Notes to self...
Was trying to do some quick fixes for this, but it's actually not really easy.
If it's done in the sources, I need to do it for every year, and change the data structure entirely, creating a case-insensitive key. If it's done in the dashboard app, it needs to be done cross-year in the multiAnswerReducer and singleAnswerReducer, which is not entirely trivial.
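For reference, a minimal sketch of what the case-insensitive key could look like when applied across years. This is not the dashboard's actual code: the `Counts` shape, `canonicalKey`, and `unifyAcrossYears` are hypothetical names, and the real multiAnswerReducer and singleAnswerReducer are not shown here.

```typescript
// Hypothetical data shape: one object of raw answer counts per survey year.
type Counts = Record<string, number>;

// Case-insensitive key. Trimming whitespace is an extra assumption.
function canonicalKey(answer: string): string {
  return answer.trim().toLowerCase();
}

// Merge answers that differ only by case, using one label table shared by
// all years so that every year ends up with the same spelling.
function unifyAcrossYears(years: Counts[]): Map<string, number>[] {
  const labels = new Map<string, string>(); // canonical key -> display label
  return years.map((counts) => {
    const merged = new Map<string, number>();
    for (const [answer, count] of Object.entries(counts)) {
      const key = canonicalKey(answer);
      // The first spelling encountered wins, which is an arbitrary choice.
      const label = labels.get(key) ?? answer;
      labels.set(key, label);
      merged.set(label, (merged.get(label) ?? 0) + count);
    }
    return merged;
  });
}

// Example with made-up counts for two consecutive years.
const unified = unifyAcrossYears([
  { FORTRAN: 2 },
  { Fortran: 3, FORTRAN: 1 },
]);
console.log(unified.map((m) => Array.from(m)));
```

Note that the label here is simply whichever spelling shows up first ("FORTRAN" in the example), and that variants like "pony" versus "Ponylang" are not caught at all, since they differ by more than case.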
My thinking currently (given my limited amount of time) is to just manually find more edge cases and fix them in the mappings by listing them.
The manual solution also solves the question of how to pick the "right" version when unifying options.
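Roughly what the manual approach amounts to, as a sketch under assumptions: the `aliases` table and `applyAliases` are hypothetical names, and the labels chosen on the right-hand side ("Fortran", "Pony") are only illustrative picks, not necessarily the ones used in the real mappings.

```typescript
// Hypothetical alias table: variant spelling -> label chosen by hand.
const aliases = new Map<string, string>([
  ["FORTRAN", "Fortran"],
  ["pony", "Pony"],
  ["Ponylang", "Pony"],
]);

// Fold listed variants into their chosen label and sum the counts.
// Answers that are not listed pass through unchanged.
function applyAliases(counts: Record<string, number>): Map<string, number> {
  const merged = new Map<string, number>();
  for (const [answer, count] of Object.entries(counts)) {
    const label = aliases.get(answer) ?? answer;
    merged.set(label, (merged.get(label) ?? 0) + count);
  }
  return merged;
}

// Example with made-up counts.
const cleaned = applyAliases({ Fortran: 3, FORTRAN: 2, pony: 1, Ponylang: 4 });
console.log(Array.from(cleaned));
```

Because every variant is listed explicitly, the table handles spelling deviations as well as case differences, at the cost of having to extend it by hand whenever new entries appear.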
🤔
I fixed the case from OP in the 2022 edition (also for older data) manually. An automated solution is indeed tricky, so I'll just stick with the manual labor. It's only once per year anyway.
When I release the 2022 data all this stuff will be in there!
Great work so far! I haven't looked through with a fine-toothed comb, but I was curious why Fortran and FORTRAN are treated as separate languages in the results?