Open FrancescoManfredi opened 4 months ago
Hi @FrancescoManfredi, first of all, thanks for your input and the blog post! Super fascinating. I agree this is an issue that can be fixed relatively easily. We'll probably convert your mapping and integrate it into our Go validator/generator so that we stick to a single language.
I personally don't have much time these days, but if no one picks the issue up by mid-May I'll try to tackle it myself.
Many tags refer to the same concept with different wording, or use different casing or styling for the same words.
It might be a good idea to add a normalization pipeline for each company's tags.
Here is a mapping from original to normalized tags in the form of a Python dict (easily convertible to any other format) that might be useful as a starting point: https://github.com/FrancescoManfredi/AIRV-analysis/blob/main/tags_repl.py
I'm the author of that mapping, and this is an invitation to use it in any way you prefer.
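To make the idea concrete, here is a minimal sketch of what such a normalization step could look like. The entries in `TAG_REPLACEMENTS` are made-up placeholders, not copied from `tags_repl.py`, and the function names are hypothetical. It also assumes lookups are done on a lowercased, whitespace-collapsed key, which may not match how the real mapping is keyed.

```python
# Hypothetical sample entries for illustration only;
# the actual mapping lives in tags_repl.py and may be keyed differently.
TAG_REPLACEMENTS = {
    "machine learning": "machine-learning",
    "ml": "machine-learning",
    "nlp": "natural-language-processing",
}


def normalize_tag(tag, replacements=TAG_REPLACEMENTS):
    """Lowercase, collapse whitespace, then apply the replacement mapping."""
    key = " ".join(tag.strip().lower().split())
    # Tags without a mapping entry fall back to their cleaned-up form.
    return replacements.get(key, key)


def normalize_tags(tags, replacements=TAG_REPLACEMENTS):
    """Normalize one company's tags, dropping empties and duplicates, keeping order."""
    seen = set()
    result = []
    for tag in tags:
        norm = normalize_tag(tag, replacements)
        if norm and norm not in seen:
            seen.add(norm)
            result.append(norm)
    return result


if __name__ == "__main__":
    print(normalize_tags(["Machine Learning", "ML", " nlp ", "Robotics"]))
```

Tags that are missing from the mapping are only lowercased and trimmed, so the mapping can grow incrementally without breaking entries that are already clean.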