src-d / identity-matching

source{d} extension to match Git signatures to real people.
GNU General Public License v3.0

Detect the primary name of an identified person #49

Closed: vmarkovtsev closed this 5 years ago

vmarkovtsev commented 5 years ago

As https://github.com/src-d/eee-identity-matching/issues/28 suggests, for each identity with more than one name we need to count commits per name and detect the primary name from those counts.

We have to generate the second Parquet file:

Please note: the primary name should not be lower-cased. The easiest thing to do is to capitalize the first letter in each word.
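
A minimal sketch of the counting and capitalization steps, not the project's actual implementation. It assumes the input is a sequence of `(identity_id, name)` pairs, one per commit, with lower-cased names. It also assumes the primary name is the one with the most commits, with ties broken alphabetically. The helper names `capitalize_words` and `detect_primary_names` are hypothetical, and writing the result to Parquet is left out.

```python
from collections import Counter, defaultdict


def capitalize_words(name):
    """Upper-case the first letter of each whitespace-separated word."""
    return " ".join(word[:1].upper() + word[1:] for word in name.split())


def detect_primary_names(commits):
    """Map each identity id to its most frequently used name.

    `commits` is an iterable of (identity_id, name) pairs, one per commit.
    """
    counts = defaultdict(Counter)
    for identity_id, name in commits:
        counts[identity_id][name] += 1

    primary = {}
    for identity_id, names in counts.items():
        # Assumption: the name with the most commits wins;
        # ties are broken alphabetically to keep the result deterministic.
        best = min(names.items(), key=lambda item: (-item[1], item[0]))[0]
        primary[identity_id] = capitalize_words(best)
    return primary


commits = [
    (0, "jane doe"),
    (0, "jane doe"),
    (0, "jdoe"),
    (1, "john smith"),
]
print(detect_primary_names(commits))
# {0: 'Jane Doe', 1: 'John Smith'}
```

The capitalization is done by hand rather than with `str.title()`, which would also lower-case the remaining letters and capitalize after apostrophes. Identities with a single name pass through unchanged apart from capitalization.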

vmarkovtsev commented 5 years ago

Assigning myself because @Guillemdb is busy with the commit time series, @irinakhismatullina is on the demo and @r0mainK is doing his school report.