The list of popular names is too large

src-d / identity-matching

source{d} extension to match Git signatures to real people.

GNU General Public License v3.0


Currently, when running match-identities, we use a precompiled list of popular names.

This list is very large: it contains 55659 names, including names that are obviously not popular, such as emanuele caprioli, ludovic menthiller, thomas flahault, and bryce cuthriell.

It looks like hyperopt was run on a huge dataset that is not representative of the real use case, whereas the design document says that identity-matching targets a single organization with fewer than 10k devs and repos. So it looks like we have to lower the threshold and recompile the list.
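To make the dependence on the threshold concrete, here is a minimal sketch of compiling such a list. It is not the project's actual code. It assumes, purely for illustration, that a name counts as "popular" when at least `min_emails` distinct email addresses use it. The function name `compile_popular_names`, the `min_emails` parameter, and the sample email addresses are all hypothetical.

```python
from collections import defaultdict


def compile_popular_names(signatures, min_emails):
    """Return the sorted names shared by at least `min_emails` distinct emails.

    `signatures` is an iterable of (name, email) pairs. The popularity
    criterion is an assumption made for this sketch only.
    """
    emails_by_name = defaultdict(set)
    for name, email in signatures:
        key = name.strip().lower()  # normalize case and surrounding whitespace
        if key:
            emails_by_name[key].add(email.strip().lower())
    return sorted(
        name for name, emails in emails_by_name.items() if len(emails) >= min_emails
    )


# Made-up sample data; the addresses use reserved .example domains.
signatures = [
    ("admin", "a@x.example"),
    ("Admin", "b@y.example"),
    ("admin", "c@z.example"),
    ("john smith", "john@x.example"),
    ("john smith", "jsmith@y.example"),
    ("emanuele caprioli", "ec@x.example"),
]

for threshold in (1, 2, 3):
    print(threshold, compile_popular_names(signatures, threshold))
```

On this sample the list shrinks from three names to one as the threshold goes from 1 to 3. The same effect suggests that a threshold tuned on a very large corpus may not transfer to a single organization, so the value probably needs to be re-tuned on data of the target size before the list is recompiled.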

The list of popular names is too large #57