Open So-Cool opened 8 years ago
How often does this occur? If there are not too many cases, such mappings can be added manually.
Not too often in the samples that I have, to be honest. Nevertheless, as there are quite a number of possible combinations, this could be quite useful in general. Let's see what happens with the labels once we're at the clustering stage.
Right now the first mapping found, i.e. the longest matching string, is used. To improve labelling, all possible matches need to be considered, and the most probable abbreviation combination, i.e. the one that uses all of the sub-strings, should be chosen. For example, "adload" is currently split into "a" and "dload", with the latter mapped to downloader and the "a" left unmatched. A better split would be "ad" (adware) and "load" (downloader).
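A rough sketch of what I mean, not the actual implementation: enumerate every split of the label that is made up entirely of known abbreviations, then pick one. The `ABBREVIATIONS` table and the function names `full_splits` and `best_split` are made up for illustration, and "fewest pieces" is only an assumed stand-in for "most probable".

```python
from functools import lru_cache

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {
    "ad": "adware",
    "load": "downloader",
    "dload": "downloader",
}


def full_splits(label, table=ABBREVIATIONS):
    """Return every split of `label` into keys of `table` with nothing left over."""

    @lru_cache(maxsize=None)
    def splits_from(start):
        if start == len(label):
            return ((),)  # exactly one way to split the empty remainder
        found = []
        for end in range(start + 1, len(label) + 1):
            piece = label[start:end]
            if piece in table:
                # Keep this piece only if the rest can also be fully covered.
                for rest in splits_from(end):
                    found.append((piece,) + rest)
        return tuple(found)

    return [list(split) for split in splits_from(0)]


def best_split(label, table=ABBREVIATIONS):
    """Pick the full-coverage split with the fewest pieces, or None if none exists."""
    candidates = full_splits(label, table)
    if not candidates:
        return None  # caller can fall back to the current longest-match behaviour
    return min(candidates, key=len)


if __name__ == "__main__":
    pieces = best_split("adload")
    print(pieces, [ABBREVIATIONS[p] for p in pieces])
```

With this table, "adload" can only be covered completely as "ad" + "load", because taking "dload" leaves the "a" unmatched. When no full split exists, `best_split` returns `None`, so the existing longest-match logic could stay as the fallback.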