warenlg closed this issue 5 years ago
With the new threshold, the list of popular names shrank to 1025 entries. The full names cited in this issue are no longer included.
The identity tables also became 5% to 60% shorter, depending on the organization, which makes them more readable.
Currently, when running `match-identities`, we use a pre-compiled list of popular names. This list is very large (55659 names), and it includes names that are obviously not popular: `emanuele caprioli`, `ludovic menthiller`, `thomas flahault`, `bryce cuthriell`, ... etc.

It looks like hyperopt has been run on a huge dataset that is not representative of the real use case, whereas the design document says that the use case of `identity-matching` should be one organization with fewer than 10k devs and repos. Thus, it looks like we have to lower the threshold and recompile the list.
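To illustrate how a threshold controls the size of such a list, here is a minimal sketch. It is not the project's actual compilation code. It assumes a name counts as "popular" when it is shared by at least `min_emails` distinct email addresses. The function `compile_popular_names`, the parameter `min_emails`, and the toy data are hypothetical. The issue does not say how the real threshold is defined, so whether it must be lowered or raised to shrink the list depends on that definition.

```python
from collections import defaultdict


def compile_popular_names(identities, min_emails):
    """Return normalized names shared by at least `min_emails` distinct emails.

    `identities` is an iterable of (name, email) pairs. This criterion is an
    assumption made for illustration, not the project's documented rule.
    """
    emails_by_name = defaultdict(set)
    for name, email in identities:
        # Normalize case and collapse repeated whitespace before counting.
        normalized = " ".join(name.lower().split())
        emails_by_name[normalized].add(email.lower())
    return {
        name
        for name, emails in emails_by_name.items()
        if len(emails) >= min_emails
    }


# Toy data: "admin" is used by three different people, while the full name
# belongs to a single person who committed twice with the same email.
identities = [
    ("admin", "a@example.com"),
    ("Admin", "b@example.com"),
    ("admin", "c@example.org"),
    ("Emanuele Caprioli", "ec@example.com"),
    ("emanuele caprioli", "ec@example.com"),
]

popular = compile_popular_names(identities, min_emails=2)
print(sorted(popular))  # ['admin']
```

Under this assumed criterion, a stricter cutoff keeps only generic names and drops one-off full names, which is the effect the recompiled list is meant to have.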