Make "Other..." answers grouping be case insensitive

jeroenheijmans / advent-of-code-surveys

Advent of Code (unofficial) Surveys

Make "Other..." answers grouping be case insensitive #15

Closed: tspier closed this issue 1 year ago

tspier commented 2 years ago

Great work so far! I haven't gone through it with a fine-tooth comb, but I was curious: why are Fortran and FORTRAN treated as separate languages in the results?

jeroenheijmans commented 2 years ago

A fair observation. Similar problems exist for minor spelling deviations (e.g. "pony" versus "Ponylang").

In 2018 I underestimated how much value (and how many entries) the "Other..." option would bring. So, for historical reasons, the grouping of answers never ignored case: I just did some manual work to deduplicate things a bit and then relied on PowerBI to smooth over the case differences.

With the move to a dashboard on the web, the case sensitivity came back to haunt me.

This should be easy to fix (for all questions) by updating the matching of answers and re-running the scripts. I'll definitely make sure I do this before 2022.

Thanks for the suggestion!

jeroenheijmans commented 1 year ago

Notes to self...

I was trying to do some quick fixes for this, but it's actually not that easy.

If it's done in the sources, I need to do it for every year, and change the data structure entirely, creating a case-insensitive key. If it's done in the dashboard app, it needs to be done cross-year in the multiAnswerReducer and singleAnswerReducer, which is not entirely trivial.
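To make the "case-insensitive key" idea concrete, here is a minimal sketch in Python (not the project's actual code; the dashboard's reducers and data format are not shown in this thread). The function name `group_case_insensitive` is hypothetical. It groups free-text answers by a trimmed, case-folded key and, as an assumed rule, labels each group with its most frequent original spelling:

```python
from collections import Counter, defaultdict

def group_case_insensitive(answers):
    """Hypothetical helper: count free-text answers, merging entries that
    differ only by letter case or surrounding whitespace. Each group is
    labelled with its most frequent original spelling; on a tie, the
    spelling encountered first wins (Counter.most_common keeps that order)."""
    counts = Counter()                 # case-folded key -> total count
    spellings = defaultdict(Counter)   # case-folded key -> spelling counts
    for raw in answers:
        answer = raw.strip()
        if not answer:
            continue  # ignore empty answers
        key = answer.casefold()
        counts[key] += 1
        spellings[key][answer] += 1
    # Pick one display label per group.
    return {spellings[key].most_common(1)[0][0]: n for key, n in counts.items()}

result = group_case_insensitive(["Fortran", "FORTRAN", "Fortran ", "pony", "Ponylang"])
print(result)  # {'Fortran': 3, 'pony': 1, 'Ponylang': 1}
```

Note that this only merges case variants: "pony" and "Ponylang" stay separate, and doing it cross-year would still require choosing a single label across all editions, which is exactly where it stops being trivial.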

My current thinking (given my limited time) is to just find more edge cases manually and fix them by listing them in the mappings.

The manual solution also solves the question of how to pick the "right" version when unifying options.
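As a rough illustration of the manual approach, a sketch assuming a simple hand-maintained alias table (the real mapping format in the repository may differ; `LANGUAGE_ALIASES`, `count_with_aliases`, and the choice of "Pony" as the canonical label are all hypothetical):

```python
from collections import Counter

# Hypothetical hand-maintained mapping: variant spelling -> canonical label.
# Listing each variant explicitly also decides which version is the "right" one.
LANGUAGE_ALIASES = {
    "FORTRAN": "Fortran",
    "pony": "Pony",      # hypothetical canonical choice
    "Ponylang": "Pony",
}

def count_with_aliases(answers, aliases=LANGUAGE_ALIASES):
    """Count answers after replacing known variants with their canonical label.
    Lookups are exact, so every variant (including case variants) must be listed."""
    counts = Counter()
    for raw in answers:
        answer = raw.strip()
        if answer:
            counts[aliases.get(answer, answer)] += 1
    return counts

print(count_with_aliases(["Fortran", "FORTRAN", "pony", "Ponylang", "Rust"]))
```

The trade-off is the one described above: nothing is merged automatically (a new "fortran" entry would need its own line), but every merge is an explicit, reviewable decision, including cases like "pony" versus "Ponylang" that case folding alone cannot catch.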

🤔

jeroenheijmans commented 1 year ago

I manually fixed the case from the OP in the 2022 edition (also for older data). Beyond that, an automated solution is indeed tricky, so I'll just stick with the manual labor. It's only once per year anyway.

When I release the 2022 data all this stuff will be in there!